# Pipeline
Orchestrates ephemeral Vast.ai GPU instances end to end: it searches for an offer, creates the instance, syncs the project, trains, downloads `outputs/`, and then destroys the instance automatically. Generator runs also rsync `generator/outputs/` back every 50 epochs while training is still running.
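The key property of this lifecycle is that the instance gets destroyed whether training succeeds or fails, unless you opt out. Below is a minimal sketch of that guarantee using `try`/`finally`. It models only the part after the instance exists. `run_lifecycle` and the step callables are hypothetical placeholders, not the pipeline's actual functions.

```python
from typing import Callable, Dict, List


def run_lifecycle(
    steps: Dict[str, Callable[[], None]],
    destroy: Callable[[], None],
    keep_on_failure: bool = False,
) -> List[str]:
    """Run the named steps in order, then destroy the instance.

    On success the instance is always destroyed. On failure it is destroyed
    too, unless keep_on_failure is set (mirroring --keep-on-failure), and the
    original exception is re-raised.
    """
    completed: List[str] = []
    failed = False
    try:
        for name, step in steps.items():
            step()
            completed.append(name)
    except Exception:
        failed = True
        raise
    finally:
        # Skip cleanup only when a step failed and the caller asked to keep the instance.
        if not (failed and keep_on_failure):
            destroy()
    return completed
```

In this sketch the steps would be something like `{"sync": ..., "train": ..., "download": ...}`. Offer search and instance creation happen before the `try`, because there is nothing to destroy until an instance exists.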
## One-time setup
Create `pipeline/.env`:

```
VAST_API_KEY=your-vast-api-key
# Optional; this path is the default
VAST_SSH_PRIVATE_KEY=/home/you/.ssh/id_ed25519
```
The matching `.pub` file must exist next to the private key (for example `/home/you/.ssh/id_ed25519.pub`). If that public key is not yet registered with your Vast.ai account, the pipeline registers it automatically.
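The setup requirements can be checked before any money is spent. Below is a minimal standard-library sketch of such a check. It assumes a simple `KEY=VALUE` file format with comments on their own lines. The real pipeline may use a dotenv library instead. The helper names `parse_env`, `resolve_ssh_key` and `check_setup` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Default private key location, matching the .env example above.
DEFAULT_KEY = Path.home() / ".ssh" / "id_ed25519"


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def resolve_ssh_key(env: Dict[str, str]) -> Tuple[Path, Path]:
    """Return (private_key, public_key); the public key is <private>.pub."""
    private = Path(env.get("VAST_SSH_PRIVATE_KEY", str(DEFAULT_KEY))).expanduser()
    public = private.with_name(private.name + ".pub")
    return private, public


def check_setup(env: Dict[str, str]) -> List[str]:
    """List human-readable problems; an empty list means the setup looks complete."""
    problems: List[str] = []
    if not env.get("VAST_API_KEY"):
        problems.append("VAST_API_KEY is missing")
    _, public = resolve_ssh_key(env)
    if not public.is_file():
        problems.append(f"public key not found: {public}")
    return problems
```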
## Commands
### `run`: train on a remote GPU and fetch results

```bash
python -m pipeline run <config...> [options]
```
Accepts one or more config paths, or a single directory (every `*.json` file inside it, in sorted order). Duplicate configs, meaning configs whose training settings are identical after resolving `extends` and `shared.json`, are skipped automatically.
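Below is a minimal sketch of the directory expansion and duplicate detection. It assumes `extends` and `shared.json` have already been resolved into one plain dict per config. That resolution step is not shown. It also assumes "identical training settings" means identical resolved JSON. Under that reading, key order does not matter. `expand_configs`, `fingerprint` and `dedupe` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Tuple


def expand_configs(args: List[str]) -> List[Path]:
    """A single directory argument expands to its *.json files, sorted by name."""
    if len(args) == 1 and Path(args[0]).is_dir():
        return sorted(Path(args[0]).glob("*.json"))
    return [Path(a) for a in args]


def fingerprint(resolved: dict) -> str:
    """Stable hash of fully resolved settings, independent of key order."""
    canonical = json.dumps(resolved, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(resolved_configs: List[Tuple[str, dict]]) -> List[str]:
    """Keep the first config for each distinct fingerprint, preserving order."""
    seen = set()
    kept: List[str] = []
    for name, settings in resolved_configs:
        fp = fingerprint(settings)
        if fp in seen:
            continue  # identical training settings: skip
        seen.add(fp)
        kept.append(name)
    return kept
```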
| Flag | Default | Description |
|---|---|---|
| `configs` | (required) | One or more config paths, or a directory of JSON configs |
| `--download-data` | off | Download the DFF dataset via Hugging Face on the remote before training |
| `--send-cropped` | off | Rsync local `cropped/{classifier,generator}/` to the remote (the subdirectory is picked based on the config) |
| `--select-offer` | off | Interactively browse and pick the GPU offer |
| `--sort` | config | Ranking mode: `price`, `performance`, or `dlp_per_dollar` |
| `--region TEXT` | any | Filter by region, e.g. `europe`, `Portugal`, `US` |
| `--price FLOAT` | config | Max hourly price cap in USD |
| `--dry-run` | off | Print matching offers without creating an instance |
| `--keep-on-failure` | off | Do not destroy the instance if training fails |
| `--no-gpu` | off | Disable GPU training on the remote (use CPU instead) |
| `--select-template` | off | Interactively choose a Vast.ai Docker template |
| `--template HASH` | config | Use a specific template hash ID |
| `--pipeline-config PATH` | none | JSON file that overrides `pipeline/defaults/vast.json` |
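Flags whose default is "config" fall back to the pipeline config when omitted. Below is a minimal sketch of that precedence. It assumes `--sort` maps onto `search.sort_mode` and `--price` onto `search.max_dph_total`, as the matching descriptions in `pipeline/defaults/vast.json` suggest. `effective_search` is a hypothetical name.

```python
from typing import Optional


def effective_search(config: dict, sort: Optional[str] = None, price: Optional[float] = None) -> dict:
    """Return the search settings with any explicit CLI values applied on top."""
    search = dict(config.get("search", {}))  # shallow copy; config is not mutated
    if sort is not None:
        search["sort_mode"] = sort
    if price is not None:
        search["max_dph_total"] = price
    return search
```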
Examples:
```bash
# Best-ranked RTX 3090 / 3090 Ti offer in Europe (default ranking: dlp_per_dollar), download data on the remote
python -m pipeline run configs/resnet18.json --region europe --download-data

# Browse offers interactively, sort by price
python -m pipeline run configs/resnet18.json --select-offer --sort price

# Run all configs in a directory sequentially on one instance
python -m pipeline run configs/phase2/ --region europe

# See which offers would be selected without spending money
python -m pipeline run configs/resnet18.json --dry-run --region europe

# Keep the instance alive if something goes wrong (for debugging)
python -m pipeline run configs/resnet18.json --keep-on-failure

# Cap price at $0.12/h
python -m pipeline run configs/resnet18.json --price 0.12
```
### `offers`: inspect available GPU offers

```bash
python -m pipeline offers [options]
```
| Flag | Default | Description |
|---|---|---|
| `--sort` | config | Ranking mode: `price`, `performance`, or `dlp_per_dollar` |
| `--region TEXT` | any | Region filter |
| `--price FLOAT` | config | Max hourly price cap in USD |
| `--select-offer` | off | Interactive offer picker (prints the selected offer as JSON) |
| `--list-regions` | off | Print a count of available offers per region and exit |
| `--limit-output INT` | 10 | How many offers to print |
| `--pipeline-config PATH` | none | Pipeline config override |
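The three ranking modes differ only in the sort key applied after filtering. Below is a minimal sketch of filtering and ranking using the `search` keys from `pipeline/defaults/vast.json`. Several parts of it are assumptions. The `Offer` fields are assumed, not a verified copy of the Vast.ai offer schema. "performance" is taken to mean a higher DLPerf score, and `dlp_per_dollar` to mean DLPerf divided by hourly price. The `europe` alias table is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    # Hypothetical subset of offer fields; names are assumptions.
    id: int
    gpu_name: str
    dph_total: float   # total price in USD per hour
    dlperf: float      # deep-learning performance score
    geolocation: str   # e.g. "Portugal, PT"
    reliability: float

    @property
    def dlp_per_dollar(self) -> float:
        return self.dlperf / self.dph_total


# sorted() is ascending, so "higher is better" scores are negated.
SORT_KEYS = {
    "price": lambda o: o.dph_total,
    "performance": lambda o: -o.dlperf,
    "dlp_per_dollar": lambda o: -o.dlp_per_dollar,
}

# Hypothetical continent alias; the real region mapping may differ.
REGION_ALIASES = {"europe": {"PT", "ES", "FR", "DE", "NL", "PL", "SE", "IT", "GB"}}


def in_region(geolocation: str, region: Optional[str]) -> bool:
    """Match a continent alias by country code, otherwise a whole comma-separated part."""
    if region is None:
        return True
    parts = [p.strip().lower() for p in geolocation.split(",")]
    alias = REGION_ALIASES.get(region.lower())
    if alias is not None:
        return parts[-1].upper() in alias
    return region.lower() in parts


def rank_offers(offers, search: dict, region: Optional[str] = None, limit: int = 10):
    """Filter by the `search` config section, then rank by its sort_mode."""
    matches = [
        o for o in offers
        if o.gpu_name in search["gpu_names"]
        and o.dph_total <= search["max_dph_total"]
        and o.reliability >= search["min_reliability"]
        and in_region(o.geolocation, region)
    ]
    return sorted(matches, key=SORT_KEYS[search["sort_mode"]])[:limit]
```

Region matching uses whole parts of the location string rather than substrings. This way `US` does not accidentally match a location such as `"Russia, RU"`.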
Examples:
```bash
# See the 20 best-value offers under $0.15/h in Europe
python -m pipeline offers --region europe --price 0.15 --limit-output 20

# List which regions have matching GPUs
python -m pipeline offers --list-regions

# Interactive picker, useful before committing to a run
python -m pipeline offers --select-offer --sort price
```
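`--list-regions` prints how many offers are available per region. One plausible way to bucket offers is a `Counter` over the country part of each offer's location. This assumes locations look like `"Portugal, PT"`. That format and the helper name `offers_per_country` are assumptions.

```python
from collections import Counter
from typing import Iterable


def offers_per_country(geolocations: Iterable[str]) -> Counter:
    """Count offers by the last comma-separated part, e.g. "PT" in "Portugal, PT"."""
    return Counter(g.rsplit(",", 1)[-1].strip() for g in geolocations if g)


counts = offers_per_country(["Portugal, PT", "Spain, ES", "Portugal, PT"])
for country, n in counts.most_common():
    print(f"{country}: {n}")
```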
### `up`: create an instance without training
Spins up an instance and prints SSH connection details. Useful for manual experiments or debugging.
```bash
python -m pipeline up [options]
```
| Flag | Default | Description |
|---|---|---|
| `--label TEXT` | auto | Optional label for the instance |
| `--select-template` | off | Interactively choose a Vast.ai Docker template |
| `--template HASH` | config | Use a specific template hash ID |
| `--pipeline-config PATH` | none | Pipeline config override |
Examples:

```bash
python -m pipeline up
python -m pipeline up --label my-debug-session
```
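With an instance from `up`, you connect and copy files by hand using the SSH details it prints. Below is a minimal sketch that builds the `ssh` and `rsync` argument lists for that. It assumes the host, port and key path come from that output. It also assumes you log in as `root`, which is common on Vast.ai images but is an assumption here. Project files are assumed to live under the default `remote.workspace_dir` (`/workspace/DRL_PROJ`). `ssh_argv` and `rsync_pull_argv` are hypothetical names.

```python
import shlex
from typing import List


def ssh_argv(host: str, port: int, key_path: str, user: str = "root") -> List[str]:
    """argv for an interactive shell on the instance."""
    return ["ssh", "-i", key_path, "-p", str(port), f"{user}@{host}"]


def rsync_pull_argv(host: str, port: int, key_path: str, remote_dir: str,
                    local_dir: str, user: str = "root") -> List[str]:
    """argv that copies the contents of remote_dir into local_dir."""
    # rsync's -e option takes the remote shell as a single string, so quote the key path.
    transport = f"ssh -i {shlex.quote(key_path)} -p {port}"
    # Trailing slashes copy the directory's contents instead of nesting the directory.
    return ["rsync", "-az", "-e", transport,
            f"{user}@{host}:{remote_dir.rstrip('/')}/", f"{local_dir.rstrip('/')}/"]
```

Either list can be passed directly to `subprocess.run(..., check=True)`. This avoids a second layer of shell quoting.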
### `status`: show instance details

```bash
python -m pipeline status <instance_id> [--pipeline-config PATH]
```
### `down`: destroy an instance

```bash
python -m pipeline down <instance_id> [--pipeline-config PATH]
```
## Pipeline config overrides
Pass `--pipeline-config my_overrides.json` to override any field of `pipeline/defaults/vast.json`. The file is deep-merged into the defaults. Only the fields you specify change, and everything else keeps its default value. This is useful for switching GPU types or raising the price cap for a single run without editing the defaults.
Example allowing RTX 4090 with a higher price cap:
```json
{
  "search": {
    "gpu_names": ["RTX 4090"],
    "max_dph_total": 0.45
  }
}
```
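Below is a minimal sketch of a deep merge that behaves as described. Nested objects are merged key by key, and any other value replaces the default. It assumes lists are replaced rather than appended, which is common deep-merge behavior but is not confirmed here. Under that assumption, the example above would accept only RTX 4090. To keep the 3090 models as well, list all of them. `deep_merge` is a hypothetical name.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: nested dicts merge recursively, other values are replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)  # lists and scalars replace the default
    return merged
```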
Key fields in `pipeline/defaults/vast.json`:

| Section | Key | Default | Meaning |
|---|---|---|---|
| `search` | `gpu_names` | `["RTX 3090", "RTX 3090 Ti"]` | Accepted GPU models |
| `search` | `max_dph_total` | `0.40` | Max price per hour (USD) |
| `search` | `sort_mode` | `"dlp_per_dollar"` | Default ranking (`price`, `performance`, or `dlp_per_dollar`) |
| `search` | `min_reliability` | `0.98` | Minimum host reliability score |
| `instance` | `disk_gb` | `48` | Disk size provisioned on the instance, in GB |
| `instance` | `image` | `"vastai/pytorch:latest"` | Docker image |
| `remote` | `workspace_dir` | `"/workspace/DRL_PROJ"` | Remote working directory |
| `remote` | `ssh_timeout_seconds` | `900` | How long to wait for SSH to become available, in seconds |
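`remote.ssh_timeout_seconds` bounds how long the pipeline waits for a fresh instance to accept SSH. Below is a minimal sketch of such a wait loop, which polls until a TCP connection to the SSH port succeeds or the deadline passes. An open port does not prove that `sshd` will accept your key yet. A stricter check might run `ssh ... true` instead. `wait_for_ssh` and the poll interval are assumptions, not the pipeline's actual implementation.

```python
import socket
import time


def wait_for_ssh(host: str, port: int, timeout_seconds: float = 900, poll_seconds: float = 5) -> bool:
    """Return True once host:port accepts a TCP connection, False after the timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            with socket.create_connection((host, port), timeout=poll_seconds):
                return True
        except OSError:  # refused, unreachable or timed out: try again later
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(poll_seconds, remaining))
```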
## Full workflow example
```bash
# 1. Check what's available and how much it costs
python -m pipeline offers --region europe --list-regions
python -m pipeline offers --region europe --sort price --limit-output 20

# 2. Run training (auto-selects the best offer, downloads the dataset on the remote)
python -m pipeline run configs/resnet18.json --region europe --download-data

# 3. Results land in classifier/outputs/ automatically
```