Workflows

About workflows

Workflows allow you to run machine learning tasks in your cloud account via the CLI. These tasks can include preparing data, training models, and running applications. When you run a workflow, dstack automatically provisions the required infrastructure, sets up dependencies, and tracks output artifacts.

Running workflows

To run a workflow, you can either pass all of its arguments directly via the CLI, or define them in the .dstack/workflows.yaml file and run the workflow by name.

Example

dstack run python train.py -r requirements.txt -a model \
  --gpu 4 --gpu-name K80 

Make sure to use the CLI from the project repository directory.

NOTE:

As long as your project is under Git, you don't have to commit local changes before using the run command. dstack tracks local changes automatically and allows you to see them in the user interface for every run.

Also, you can define your workflow in the .dstack/workflows.yaml file and run it by name:

workflows:
  - name: train
    provider: python
    file: "train.py"
    requirements: "requirements.txt"
    artifacts: ["model"]
    resources:
      gpu:
        name: "K80"
        count: 4
dstack run train 

Providers

The provider argument defines how the workflow is executed.

dstack offers a variety of built-in providers that allow you to run any machine learning task, deploy an application, or launch a dev environment.

Every provider may have its own arguments. For example, with the python provider, we can pass file (the file to run), requirements (the file listing Python requirements), artifacts (which folders to save as output artifacts), and resources (which hardware resources are required to run the workflow, e.g. GPU or memory).

To learn more about the built-in providers and their arguments, check out the Providers Reference.

Workflows file syntax

Let's walk through the syntax of the .dstack/workflows.yaml file:

workflows:
  - name: download
    help: "Downloads the training data" 
    provider: python
    file: "download.py"
    artifacts: ["data"]

  - name: train
    help: "Trains a model and saves the checkpoints"
    depends-on:
      - download:latest
    provider: python
    file: "train.py"
    artifacts: ["model"]
    resources:
      gpu: 1

Dependencies

In the example above, notice that the train workflow has a depends-on argument. This argument defines dependencies on other workflows.

For example, if you want dstack to run the download workflow before the train workflow, you can use the following syntax:

depends-on:
  - download 

If you run the train workflow, dstack will run both the download and the train workflows. The output artifacts of the download workflow will be passed to the train workflow.

If you don't want to run the download workflow every time you run the train workflow, and instead would like to reuse the output artifacts of a particular run of the download workflow, you can refer to that run via a tag:

depends-on:
  - download:<tag-name>

Tags can be assigned to finished runs via the CLI or the user interface. Tags allow you to version output artifacts for later reuse.

Logs

The output of running workflows is tracked in real-time and can be accessed through the user interface or the CLI.

To access the output through the CLI, use the following command:

dstack logs <run-name>

If you'd like to see the output in real-time through the CLI, add the -f (or --follow) argument:

dstack logs <run-name> -f

NOTE:

Make sure you don't print experiment metrics to the output.

Instead, it's recommended that you use specialized experiment trackers such as WandB, Comet, or Neptune.

Artifacts

By default, the output artifacts are tracked in real-time and can be accessed either via the user interface or the CLI.

To browse artifacts through the CLI, use the following command:

dstack artifacts list <run-name>

To download artifacts locally, use the following command:

dstack artifacts download <run-name>

Secrets

If you plan to use third-party services from your workflows, you can use dstack's secrets to securely pass passwords and tokens.

Secrets can be configured on the Settings page in the user interface.

The configured secrets are passed to the workflows as environment variables.

Here's an example of how you can access them from Python:

import os

wandb_api_key = os.environ.get("WANDB_API_KEY")
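Since a missing secret only surfaces as a None value, it can help to fail fast with a clear message. A small sketch of that pattern (the helper name is ours, not part of dstack):

```python
import os

def require_secret(name):
    """Return the secret from the environment, or fail with a clear message."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; configure it on the Settings page")
    return value

# e.g. wandb_api_key = require_secret("WANDB_API_KEY")
```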