Quick start

This tutorial will guide you through using dstack locally and remotely, step by step.

NOTE:

The source code for this tutorial can be found at github.com/dstack-examples.

1. Install the CLI

Use pip to install dstack:

$ pip install dstack --upgrade
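
If the installation succeeded, the dstack CLI will be available in your shell. You can verify this by printing its version:

$ dstack --version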

2. Create a repo

To use dstack, your project must be managed by Git and have at least one remote branch configured. Your repository can be hosted on GitHub, GitLab, Bitbucket, or any other platform.

Set up a remote branch

If you haven't set up a remote branch in your repo yet, here's how you can do it:

$ echo "# Quick start" >> README.md
$ git init
$ git add README.md
$ git commit -m "first commit"
$ git branch -M main
$ git remote add origin "<your remote repo URL>"
$ git push -u origin main
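
To double-check that the remote is configured, list the remotes:

$ git remote -v
origin  <your remote repo URL> (fetch)
origin  <your remote repo URL> (push)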

Init the repo

Once you've set up a remote branch in your repo, go ahead and run this command:

$ dstack init

It will set up the repo to work with dstack.

Now that everything is in place, you can use dstack with your project.

3. Prepare data

Let us begin by creating a Python script that will prepare the data for our training script.

Create a Python script

Let us create the following Python script as tutorials/mnist/mnist_data.py (the path the workflow below refers to):

from torchvision.datasets import MNIST

if __name__ == '__main__':
    # Download train data
    MNIST("./data", train=True, download=True)
    # Download test data
    MNIST("./data", train=False, download=True)

This script downloads the MNIST dataset and saves it locally to the data folder.

To run the script via dstack, it must be defined as a workflow in a YAML file in the .dstack/workflows folder within the repo.

Create a workflow YAML file

Define the mnist-data workflow by creating the following YAML file:

workflows:
  - name: mnist-data
    provider: bash
    commands:
      - pip install torchvision
      - python tutorials/mnist/mnist_data.py
    artifacts:
      - path: ./data

NOTE:

In order for the files to be available in a workflow, they have to be tracked by Git. To ensure Git tracks the files, run:

$ git add .dstack tutorials

After that, dstack will keep track of the file changes automatically, so you don't have to run git add on every change.

Run the workflow locally

Now you can run the defined workflow using the dstack run command:

$ dstack run mnist-data

RUN             WORKFLOW    SUBMITTED  STATUS     TAG  BACKEND
zebra-1         mnist-data  now        Submitted       local

Provisioning... It may take up to a minute. ✓

To interrupt, press Ctrl+C.

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
---> 100%

$

NOTE:

By default, dstack runs workflows locally, which requires having either Docker or NVIDIA Docker installed locally.
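
If you're unsure whether Docker is set up on your machine, a quick way to check is:

$ docker --version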

NOTE:

By default, dstack uses the same Python version to run workflows as your local Python version. If you use Python 3.11, the mnist-data workflow will fail since it's not yet supported by torchvision. To solve such problems, dstack allows you to specify a Python version for the workflow with the python parameter in the YAML file, e.g. python: 3.9.
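
For example, here's how the mnist-data workflow could pin Python to 3.9 (a sketch; the python parameter sits alongside the other workflow properties):

workflows:
  - name: mnist-data
    provider: bash
    python: 3.9
    commands:
      - pip install torchvision
      - python tutorials/mnist/mnist_data.py
    artifacts:
      - path: ./data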

Check status

To check the status of recent runs, use the dstack ps command:

$ dstack ps

RUN      WORKFLOW    SUBMITTED  STATUS     TAG  BACKEND
zebra-1  mnist-data  now        Submitted       local

This command displays either the currently running workflows or the last completed run. To see all runs, use the dstack ps -a command.

List artifacts

Once a run is finished, its artifacts are saved and can be reused.

You can list artifacts of any run using the dstack ls command:

$ dstack ls zebra-1

PATH  FILE                                  SIZE
data  MNIST/raw/t10k-images-idx3-ubyte      7.5MiB
      MNIST/raw/t10k-images-idx3-ubyte.gz   1.6MiB
      MNIST/raw/t10k-labels-idx1-ubyte      9.8KiB
      MNIST/raw/t10k-labels-idx1-ubyte.gz   4.4KiB
      MNIST/raw/train-images-idx3-ubyte     44.9MiB
      MNIST/raw/train-images-idx3-ubyte.gz  9.5MiB
      MNIST/raw/train-labels-idx1-ubyte     58.6KiB
      MNIST/raw/train-labels-idx1-ubyte.gz  28.2KiB

This will display all the files and their sizes.

4. Train a model

Now that the data is prepared, let's create a Python script to train a model.

Create a Python script

Let us create the following training script as tutorials/mnist/train_mnist.py:

import torch
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks.progress import TQDMProgressBar
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST


class MNISTModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_nb):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)


BATCH_SIZE = 256 if torch.cuda.is_available() else 64

if __name__ == "__main__":
    # Init our model
    mnist_model = MNISTModel()

    # Init DataLoader from MNIST Dataset
    train_ds = MNIST("./data", train=True, download=False, transform=transforms.ToTensor())
    train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE)

    # Initialize a trainer
    trainer = Trainer(
        accelerator="auto",
        devices=1 if torch.cuda.is_available() else None,  # use a single device when a GPU is available
        max_epochs=3,
        callbacks=[TQDMProgressBar(refresh_rate=20)],
    )

    # Train the model ⚡
    trainer.fit(mnist_model, train_loader)

This script trains a model using the MNIST dataset from the local data folder. Note that it loads the dataset with download=False, since the data is expected to already be in place.

Update the workflow YAML file

Add the train-mnist workflow to the workflow YAML file:

workflows:
  - name: mnist-data
    provider: bash
    commands:
      - pip install torchvision
      - python tutorials/mnist/mnist_data.py
    artifacts:
      - path: ./data

  - name: train-mnist
    provider: bash
    deps:
      - workflow: mnist-data 
    commands:
      - pip install torchvision pytorch-lightning tensorboard
      - python tutorials/mnist/train_mnist.py
    artifacts:
      - path: ./lightning_logs

To reuse data across workflows, we made the train-mnist workflow dependent on the mnist-data workflow. When we run train-mnist, dstack will automatically put the data from the last mnist-data run in the data folder.

Run the workflow locally

Now you can run the defined workflow using the dstack run command:

$ dstack run train-mnist

RUN        WORKFLOW     SUBMITTED  STATUS     TAG  BACKEND
mangust-2  train-mnist  now        Submitted       local

Provisioning... It may take up to a minute. ✓

To interrupt, press Ctrl+C.

Epoch 1: [00:03<00:00, 280.17it/s, loss=1.35, v_num=0]
---> 100%

$
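
Since the train-mnist workflow installs tensorboard and saves the ./lightning_logs folder as an artifact, you can inspect the training metrics with TensorBoard once you've located the logs on disk (for local runs, artifacts are stored under ~/.dstack/artifacts, as described below):

$ tensorboard --logdir <path to the lightning_logs artifact>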

5. Configure the remote

When you run a workflow locally, artifacts are stored in ~/.dstack/artifacts and can only be reused by workflows that also run locally.

To run workflows remotely or enable artifact reuse outside of your machine, you can configure a remote using the dstack config command.
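
For example:

$ dstack config

The command walks you through choosing and configuring a remote.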

See Installation to learn more about supported remote types and how to configure them.

6. Push artifacts

To reuse the artifacts of the mnist-data workflow outside your machine, you can use the dstack push command to upload them to the configured remote (e.g. the cloud).

$ dstack push zebra-1

NOTE:

When you run a workflow remotely, its artifacts are pushed automatically, which is much faster than pushing the artifacts of a local run.

Therefore, if your goal is to reuse the mnist-data artifacts remotely, it is more convenient to run the mnist-data workflow remotely in the first place.

7. Train a model remotely

NOTE:

Before running the train-mnist workflow remotely, we have to ensure that the mnist-data artifacts are available remotely.

Either follow the previous step of pushing the artifacts, or run the mnist-data workflow remotely:

$ dstack run mnist-data --remote

Now we can run the train-mnist workflow remotely (e.g. in the configured cloud):

$ dstack run train-mnist --remote

When you run a workflow remotely, dstack automatically creates the necessary infrastructure within the configured cloud account, runs the workflow, stores the artifacts, and destroys the infrastructure upon completion.

NOTE:

You can specify hardware resource requirements (like GPU, memory, interruptible instances, etc.) for each remote workflow using resources.
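
For example, here's a sketch of how the train-mnist workflow could request a single GPU and an interruptible (spot) instance for remote runs. The exact resources fields shown here are an assumption; check the resources documentation for the options your version supports:

workflows:
  - name: train-mnist
    provider: bash
    deps:
      - workflow: mnist-data
    commands:
      - pip install torchvision pytorch-lightning tensorboard
      - python tutorials/mnist/train_mnist.py
    artifacts:
      - path: ./lightning_logs
    resources:
      gpu: 1               # request one GPU (assumed shorthand)
      interruptible: true  # allow spot/interruptible instances (assumed field)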

And that's a wrap! If you'd like to refer back to it, the source code for this tutorial can be found in our GitHub repo.