Workflows
About workflows
Workflows allow you to run machine learning tasks in your cloud account via the CLI. These tasks can include preparing data, training models, and running applications. When you run a workflow, dstack automatically provisions the required infrastructure and dependencies, and tracks output artifacts.
Running workflows
To run a workflow, you can either pass all its arguments via the CLI directly, or define them in the .dstack/workflows.yaml file and run the workflow by name.
Example
dstack run python train.py -r requirements.txt -a model \
--gpu 4 --gpu-name K80
Make sure to use the CLI from the project repository directory.
NOTE:
As long as your project is under Git, you don't have to commit local changes before using the run command. dstack tracks local changes automatically and allows you to see them in the user interface for every run.
Also, you can define your workflow in the .dstack/workflows.yaml file and run it by name:
workflows:
  - name: train
    provider: python
    file: "train.py"
    requirements: "requirements.txt"
    artifacts: ["model"]
    resources:
      gpu:
        name: "K80"
        count: 4
dstack run train
Providers
The provider argument defines how the workflow is executed.
dstack offers a variety of built-in providers that allow you to run any machine learning task, deploy an application, or launch a dev environment.
Every provider may have its own arguments. For example, with the python provider, we can pass file (the file to run), requirements (the file with requirements), artifacts (the folders to save as output artifacts), and resources (the hardware required to run the workflow, e.g. GPU, memory, etc.).
To learn more about the run command, check out the Providers Reference.
Workflows file syntax
Let's walk through the syntax of the .dstack/workflows.yaml file:
workflows:
  - name: download
    help: "Downloads the training data"
    provider: python
    file: "download.py"
    artifacts: ["data"]

  - name: train
    help: "Trains a model and saves the checkpoints"
    depends-on:
      - download:latest
    provider: python
    file: "train.py"
    artifacts: ["model"]
    resources:
      gpu: 1
Dependencies
In the example above, you can notice that the train workflow has a depends-on argument. This argument defines dependencies on other workflows. For example, if you want dstack to run the download workflow before the train workflow, you can use the following syntax:
depends-on:
  - download
If you run the train workflow, dstack will run both the download and the train workflows. The output artifacts of the download workflow will be passed to the train workflow.
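Inside train.py, the artifacts of the download workflow are simply available as local folders in the working directory. A minimal sketch, assuming download saved its files under a data folder (the folder name comes from the artifacts declaration above):

```python
import os

# The `data` folder produced by the `download` workflow is available
# in the working directory when `train` runs.
data_dir = "data"
files = sorted(os.listdir(data_dir)) if os.path.isdir(data_dir) else []
print(f"Training on {len(files)} input files from {data_dir}/")
```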
In case you don't want to run the download workflow each time you run the train workflow, and instead would like to reuse the output artifacts of a particular run of the download workflow, you can refer to that run via a tag:
depends-on:
  - download:<tag-name>
Tags can be assigned to finished runs via the CLI or the user interface. Tags allow you to version output artifacts for later reuse.
Logs
The output of running workflows is tracked in real-time and can be accessed through the user interface or the CLI.
To access the output through the CLI, use the following command:
dstack logs <run-name>
If you'd like to see the output in real-time through the CLI, add the -f (or --follow) argument:
dstack logs <run-name> -f
NOTE:
Make sure you don't print experiment metrics to the output.
Instead, it's recommended that you use specialized tools such as WandB, Comet, Neptune, etc.
Artifacts
By default, the output artifacts are tracked in real-time and can be accessed either via the user interface or the CLI.
To browse artifacts through the CLI, use the following command:
dstack artifacts list <run-name>
To download artifacts locally, use the following command:
dstack artifacts download <run-name>
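On the workflow side, output artifacts are just files that the script writes into the folders declared under artifacts. A minimal sketch of saving a checkpoint into the model folder (the file name and checkpoint format are illustrative):

```python
import json
import os

# Everything written under the `model` folder (declared via `artifacts`)
# is collected by dstack as an output artifact of the run.
os.makedirs("model", exist_ok=True)

checkpoint = {"epoch": 1, "loss": 0.42}  # placeholder training state
with open(os.path.join("model", "checkpoint.json"), "w") as f:
    json.dump(checkpoint, f)
```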
Secrets
If you plan to use third-party services from your workflows, you can use dstack's secrets to securely pass passwords and tokens.
Secrets can be configured on the Settings page in the user interface.
The configured secrets are passed to the workflows as environment variables.
Here's an example of how you can access them from Python:
import os

# Secrets configured in dstack are exposed as environment variables.
wandb_api_key = os.environ.get("WANDB_API_KEY")
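Since os.environ.get returns None when a variable is missing, it can help to fail fast with a clear message if a secret was never configured. A small helper sketch (the function and its message are illustrative, not part of dstack):

```python
import os

def get_secret(name: str) -> str:
    """Return a secret injected by dstack as an environment variable,
    failing fast with a clear message if it was not configured."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; configure it on the Settings page")
    return value
```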