What is dstack?
dstack is a modern CI/CD solution built for training models.
dstack allows you to automate training workflows, and to version and reuse data and models, using the cloud vendor of your choice.
Principles
Infrastructure as code
Data and training workflows typically deal with processing huge amounts of data, and they involve piping together numerous tasks that may have different hardware requirements.
dstack allows you to define workflows and infrastructure requirements as code using declarative configuration files. When you run a workflow, dstack provisions the required infrastructure on-demand.
When defining a workflow, you can either use the built-in providers (which cover common use cases) or create custom providers for more specific needs using dstack's Python API.
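As a sketch of what this looks like, the workflow below references both kinds of providers in .dstack/workflows.yaml; the python provider is the built-in one used throughout this tour, while my-team/data-sync is a purely hypothetical custom provider name used only for illustration:

workflows:
  - name: download
    provider: python              # built-in provider
    script: download.py
  - name: sync
    provider: my-team/data-sync   # hypothetical custom provider built with the Python API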
Made for continuous training
Training models doesn't end when you ship your model to production; it only starts there. Once your model is deployed, it's critical to observe it, trace issues back to specific steps of the training pipeline, fix those issues, re-train on new data, validate, and re-deploy the model.
dstack allows you to build a pipeline that can run on a regular basis.
Designed for collaboration and reuse
dstack allows you to collaborate in multiple ways. On the one hand, the outputs of workflows, such as data and models, can be tagged and reused in other workflows, within your team or across teams. On the other hand, it's possible to reuse providers built by other teams or by the community.
Technology-agnostic
With dstack, you can use any language (Python, R, Scala, or any other), any framework (including distributed frameworks such as Dask, Ray, Spark, TensorFlow, and PyTorch), any experiment tracker, and any computing vendor or your own hardware.
Quick tour
Workflows
Configuration files
Workflows are defined in the .dstack/workflows.yaml file within your project. If you plan to pass variables to your workflows when you run them, you have to describe these variables in the .dstack/variables.yaml file, next to workflows.
# .dstack/workflows.yaml
workflows:
  - name: prepare
    provider: python
    script: prepare.py
    artifacts:
      - data
    resources:
      gpu: ${{ pgpu }}

# .dstack/variables.yaml
variables:
  prepare:
    pgpu: 1
Command-line interface
To run this workflow, use the following dstack CLI command:
dstack run prepare --pgpu 4
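A reasonable assumption, given the .dstack/variables.yaml file above, is that variable values defined there act as defaults when the corresponding flag is omitted:

# Assumed behavior: falls back to the default from .dstack/variables.yaml (pgpu: 1)
dstack run prepare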
Once you do that, you'll see this run in the user interface. Shortly afterwards, dstack will assign it either to one of the available runners or to a runner provisioned from the computing vendor configured for your account.
Tags
When the run is completed, you can assign a tag to it, e.g. latest. If you do that, you can later refer to this tagged workflow from other workflows:
# .dstack/workflows.yaml
workflows:
  - name: prepare
    provider: python
    script: prepare.py
    artifacts:
      - data
    resources:
      gpu: ${{ pgpu }}

  - name: train
    provider: python
    script: train.py
    artifacts:
      - checkpoint
    depends-on:
      - prepare:latest
    resources:
      gpu: ${{ tgpu }}

# .dstack/variables.yaml
variables:
  prepare:
    pgpu: 1
  train:
    tgpu: 1
When you run the train workflow, dstack will mount to it the data folder produced by prepare:latest.
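By analogy with the earlier command (the --tgpu flag mirrors the tgpu variable, an assumption based on how --pgpu maps to pgpu above), running the dependent workflow looks like this:

# The data artifact from prepare:latest is mounted automatically before train.py starts
dstack run train --tgpu 4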
Runners
There are two ways to provision infrastructure: with on-demand runners or with self-hosted runners.
On-demand runners
To use on-demand runners, go to Settings, then AWS, provide your credentials, and configure limits.
Once you configure these limits, runners will be provisioned automatically for the duration of the run.
Self-hosted runners
As an alternative to on-demand runners, you can use your own hardware to run workflows.
To use your own server with dstack, you need to install the dstack-runner daemon there:
curl -fsSL https://get.dstack.ai/runner -o get-dstack-runner.sh
sudo sh get-dstack-runner.sh
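After installation, the daemon presumably needs to be linked to your dstack account before it can pick up runs. The subcommands below are an assumption rather than confirmed dstack-runner syntax; check the installer's output for the exact steps:

# Hypothetical registration and start-up steps; exact subcommands and flags may differ
dstack-runner config --token <your-account-token>
dstack-runner start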