Concepts¶
dstack
allows YAML-defined ML workflows to be run locally or remotely in any configured cloud accounts via CLI.
Remotes¶
By default, workflows run locally. To run workflows remotely, you need to first configure a remote using the dstack
config
command. Once a remote is configured, use the --remote
flag with the dstack run
command to run a workflow in
the remote.
NOTE:
When running a workflow remotely, dstack
automatically creates and
destroys cloud instances based on resource requirements and cost strategy, such as using spot instances.
Remotes facilitate collaboration as they allow multiple team members to access the same runs.
Workflows¶
Workflows can be scripts for data preparation or model training, web apps like Streamlit or Gradio, or development environments like JupyterLab or VS Code.
Here's an example from the Quick start:
workflows:
- name: mnist-data
provider: bash
commands:
- pip install torchvision
- python mnist/mnist_data.py
artifacts:
- path: ./data
- name: train-mnist
provider: bash
deps:
- workflow: mnist-data
commands:
- pip install torchvision pytorch-lightning tensorboard
- python mnist/train_mnist.py
artifacts:
- path: ./lightning_logs
YAML-defined workflows eliminate the need to modify code in your scripts, giving you the freedom to choose frameworks, experiment trackers, and cloud providers.
NOTE:
Workflows run in containers with pre-configured Conda environments, and CUDA drivers.
Artifacts¶
When running a workflow locally, the artifacts are stored in ~/.dstack/artifacts
and can only be reused from workflows
that also run locally. To reuse the artifacts remotely, you must push them using the dstack push
command.
When running a workflow remotely, the resulting artifacts are automatically stored remotely. If you want to access the
artifacts of a remote workflow locally, you can use the dstack pull
command.
To conveniently refer to the artifacts of a particular run, you can assign a tag to it using
the dstack tags
command.
CLI¶
The dstack CLI provides various functionalities such as running workflows, accessing logs, artifacts, and stopping runs, among others.
$ dstack
Usage: dstack [-v] [-h] COMMAND ...
Positional Arguments:
COMMAND
config Configure the remote backend
cp Copy artifact files to a local target path
init Authorize dstack to access the current Git repo
logs Show logs
ls List artifacts
ps List runs
pull Pull artifacts of a remote run
push Push artifacts of a local run
rm Remove run(s)
run Run a workflow
secrets Manage secrets
stop Stop run(s)
tags Manage tags
Optional Arguments:
-v, --version Show dstack version
-h, --help Show this help message and exit
Run dstack COMMAND --help for more information on a particular command
Why dstack?¶
dstack
enables you to define ML workflows declaratively and run them effortlessly
from your preferred IDE either locally or remotely on any cloud.
Unlike end-to-end MLOps platforms, dstack
is lightweight, developer-friendly, and designed to facilitate collaboration
without imposing any particular approach.