Artifacts

NOTE:

The source code for the examples below can be found on GitHub.

Define artifacts

Create the following workflow YAML file:

workflows:
  - name: hello-txt
    provider: bash
    commands:
      - echo "Hello world" > output/hello.txt
    artifacts:
      - path: ./output

Run it locally using dstack run:

$ dstack run hello-txt

NOTE:

Artifacts are saved when the workflow finishes. They are not saved if the workflow is aborted (e.g. via dstack stop -x).

List artifacts

To see the artifacts of a run, use the dstack ls command followed by the name of the run:

$ dstack ls grumpy-zebra-1

PATH  FILE                                  SIZE
data  MNIST/raw/t10k-images-idx3-ubyte      7.5MiB
      MNIST/raw/t10k-images-idx3-ubyte.gz   1.6MiB
      MNIST/raw/t10k-labels-idx1-ubyte      9.8KiB
      MNIST/raw/t10k-labels-idx1-ubyte.gz   4.4KiB
      MNIST/raw/train-images-idx3-ubyte     44.9MiB
      MNIST/raw/train-images-idx3-ubyte.gz  9.5MiB
      MNIST/raw/train-labels-idx1-ubyte     58.6KiB
      MNIST/raw/train-labels-idx1-ubyte.gz  28.2KiB

Push artifacts

When you run a workflow locally, its artifacts are stored in ~/.dstack/artifacts and can be reused only by other workflows that also run locally.

To reuse the artifacts outside your machine, push them using the dstack push command:

$ dstack push grumpy-zebra-1

NOTE:

If you run a workflow remotely, artifacts are pushed automatically, which is typically much faster than pushing the artifacts of a local run.

Pull artifacts

When you run a workflow remotely (for example, by passing --remote to the dstack run command), the resulting artifacts are stored remotely.

To access these artifacts locally, use the dstack pull command:

$ dstack pull grumpy-zebra-1

This command downloads the artifacts to ~/.dstack/artifacts and enables their reuse in your other local workflows.
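After a pull, it can be handy to check what actually landed on disk. The following is a minimal sketch, not part of dstack: list_local_artifacts is a hypothetical helper name, and because this guide does not describe the directory layout below ~/.dstack/artifacts, the function simply walks whatever is there.

```shell
# Hypothetical helper (not a dstack command): list every file under the
# local artifacts directory. The default path comes from this guide; the
# layout below it is an assumption, so the function makes no use of it.
list_local_artifacts() (
    root="${1:-$HOME/.dstack/artifacts}"
    if [ ! -d "$root" ]; then
        echo "No artifacts directory at $root" >&2
        exit 1
    fi
    # Print file paths relative to the root, in a stable order
    cd "$root" && find . -type f | sort
)
```

Calling list_local_artifacts with no argument inspects ~/.dstack/artifacts; passing a path inspects that directory instead.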

Add tags

If you wish to reuse the artifacts of a specific run, you can assign a tag (via the dstack tags command) to it and use the tag to reference the artifacts.

Here's how to add a tag to a run:

$ dstack tags add grumpy-zebra-1 awesome-tag

Even if you delete the grumpy-zebra-1 run, you can still access its artifacts using the awesome-tag tag name.

You can reference a tag in the dstack push, dstack pull, and dstack ls commands, as well as in Deps.

Real-time artifacts

If you run your workflow remotely, and want to save artifacts in real time (as you write files to the disk), you can set the mount property to true for a particular artifact.

Let's create the following bash script and save it as artifacts/hello.sh:

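#!/bin/bash
# Write one numbered file per second so that artifacts appear gradually.
# Zero-padded brace expansion requires bash 4 or later.
for i in {000..100}
do
    sleep 1
    echo "$i" > "output/${i}.txt"
    echo "Wrote output/${i}.txt"
done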

Now, create the following workflow YAML file:

workflows:
  - name: hello-sh
    provider: bash
    commands:
      - bash artifacts/hello.sh
    artifacts:
      - path: ./output
        mount: true

Go ahead and run this workflow remotely:

$ dstack run hello-sh --remote

NOTE:

Every read or write operation within the mounted artifact directory triggers an HTTP request to the storage.

The mount option can be used to save and restore checkpoint files if the workflow uses interruptible instances.
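To make the checkpoint idea concrete, here is a minimal sketch of a resumable variant of the loop above. It assumes that files written by an earlier, interrupted run are visible again in the mounted directory when the workflow restarts; this guide does not spell out that behavior, so treat it as an assumption. The names next_step and run_steps are hypothetical helpers, not part of dstack.

```shell
# Sketch only: resume a numbered sequence of steps from the first one
# that has no output file yet. Assumes earlier files reappear in the
# mounted directory after a restart (not confirmed by this guide).

# Print the first step number (starting at 0) whose file is missing.
next_step() (
    dir="$1"
    last="$2"
    i=0
    while [ "$i" -le "$last" ]; do
        name=$(printf '%03d' "$i")
        if [ ! -f "$dir/$name.txt" ]; then
            break
        fi
        i=$((i + 1))
    done
    echo "$i"
)

# Run the remaining steps, writing one zero-padded file per step.
run_steps() (
    dir="$1"
    last="$2"
    mkdir -p "$dir"
    i=$(next_step "$dir" "$last")
    while [ "$i" -le "$last" ]; do
        name=$(printf '%03d' "$i")
        echo "$name" > "$dir/$name.txt"
        echo "Wrote $dir/$name.txt"
        i=$((i + 1))
    done
)
```

A script could call run_steps output 100 in place of the original loop. Since every file check in a mounted directory results in a request to the storage, a real workflow would more likely track progress in a single checkpoint file than probe each step.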