Artifacts
NOTE:
The source code for the examples below can be found on GitHub.
Define artifacts
Create the following workflow YAML file:
workflows:
  - name: hello-txt
    provider: bash
    commands:
      - echo "Hello world" > output/hello.txt
    artifacts:
      - path: ./output
Run it locally using dstack run:
$ dstack run hello-txt
NOTE:
Artifacts are saved at the end of the workflow.
They are not saved if the workflow is aborted (e.g. via dstack stop -x).
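To preview what the ./output artifact would contain, the workflow's command can be dry-run outside dstack. The sketch below is only an illustration: the scratch directory and the explicit mkdir are assumptions, since this page does not say whether dstack creates the artifact directory before the commands run.

```shell
# Scratch directory so the dry run does not touch your project (assumption).
workdir="$(mktemp -d)"

# Assumption: create the artifact directory ourselves before writing to it.
mkdir -p "$workdir/output"

# Same command as in the hello-txt workflow, run from the scratch directory.
(cd "$workdir" && echo "Hello world" > output/hello.txt)

# List the files that the ./output artifact path would cover.
(cd "$workdir" && find output -type f)
```

Everything listed under output is what the artifacts entry points at once the workflow finishes.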
List artifacts
To see the artifacts of a run, use the dstack ls command followed by the name of the run.
$ dstack ls grumpy-zebra-1
PATH  FILE                                  SIZE
data  MNIST/raw/t10k-images-idx3-ubyte      7.5MiB
      MNIST/raw/t10k-images-idx3-ubyte.gz   1.6MiB
      MNIST/raw/t10k-labels-idx1-ubyte      9.8KiB
      MNIST/raw/t10k-labels-idx1-ubyte.gz   4.4KiB
      MNIST/raw/train-images-idx3-ubyte     44.9MiB
      MNIST/raw/train-images-idx3-ubyte.gz  9.5MiB
      MNIST/raw/train-labels-idx1-ubyte     58.6KiB
      MNIST/raw/train-labels-idx1-ubyte.gz  28.2KiB
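It can be handy to know the total size of a run's artifacts before moving them around. The following is a rough sketch, not part of dstack: total_mib is a hypothetical helper that sums the last column of a listing shaped like the sample above. It only understands the KiB and MiB suffixes shown there, and it assumes the real dstack ls output can be piped as plain text in the same layout.

```shell
# Sample listing copied from the output above; in practice you might pipe
# `dstack ls <run-name>` in instead (assumes the same plain-text layout).
listing='PATH  FILE                                  SIZE
data  MNIST/raw/t10k-images-idx3-ubyte      7.5MiB
      MNIST/raw/t10k-images-idx3-ubyte.gz   1.6MiB
      MNIST/raw/t10k-labels-idx1-ubyte      9.8KiB
      MNIST/raw/t10k-labels-idx1-ubyte.gz   4.4KiB
      MNIST/raw/train-images-idx3-ubyte     44.9MiB
      MNIST/raw/train-images-idx3-ubyte.gz  9.5MiB
      MNIST/raw/train-labels-idx1-ubyte     58.6KiB
      MNIST/raw/train-labels-idx1-ubyte.gz  28.2KiB'

# Hypothetical helper: sum the SIZE column (last field) and print the total in MiB.
total_mib() {
  LC_ALL=C awk '
    NR > 1 {                      # skip the header row
      size = $NF                  # SIZE is always the last field
      if (size ~ /KiB$/)      total += (size + 0) / 1024
      else if (size ~ /MiB$/) total += size + 0
    }
    END { printf "%.1f\n", total }
  '
}

printf '%s\n' "$listing" | total_mib
```

For the sample listing this prints 63.6, that is, roughly 63.6 MiB in total.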
Push artifacts
When you run a workflow locally, artifacts are stored in ~/.dstack/artifacts and can be reused only by workflows that also run locally.
To reuse the artifacts outside your machine, push them using the dstack push command:
$ dstack push grumpy-zebra-1
NOTE:
If you run a workflow remotely, artifacts are pushed automatically, which is typically much faster than pushing the artifacts of a local run.
Pull artifacts
When you run a workflow remotely (for example, by passing --remote to the dstack run command), the resulting artifacts are stored remotely.
To access these artifacts locally, use the dstack pull command:
$ dstack pull grumpy-zebra-1
This command downloads the artifacts to ~/.dstack/artifacts and enables their reuse in your other local workflows.
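The direction of push and pull is easy to mix up, so here is a small sketch that encodes the rule from the two sections above. sync_command is a hypothetical helper, not a dstack feature; it only prints the command to run and never calls dstack itself.

```shell
# Hypothetical helper: print the dstack command that makes a run's artifacts
# available on the "other side", given where the run was executed.
sync_command() {
  local where="$1" run_name="$2"
  case "$where" in
    local)
      # Local artifacts live in ~/.dstack/artifacts; push them to reuse them elsewhere.
      echo "dstack push $run_name"
      ;;
    remote)
      # Remote artifacts are stored remotely; pull them to reuse them locally.
      echo "dstack pull $run_name"
      ;;
    *)
      echo "unknown location: $where" >&2
      return 1
      ;;
  esac
}

sync_command local grumpy-zebra-1
sync_command remote grumpy-zebra-1
```

The first call prints dstack push grumpy-zebra-1 and the second prints dstack pull grumpy-zebra-1.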
Add tags
If you wish to reuse the artifacts of a specific run, you can assign a tag to it (via the dstack tags command) and use the tag to reference the artifacts.
Here's how to add a tag to a run:
$ dstack tags add grumpy-zebra-1 awesome-tag
Even if you delete the grumpy-zebra-1 run, you can still access its artifacts using the awesome-tag tag name.
You can reference a tag in the dstack push, dstack pull, and dstack ls commands, as well as in Deps.
Real-time artifacts
If you run your workflow remotely and want to save artifacts in real time (as you write files to the disk), you can set the mount property to true for a particular artifact.
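Let's create the following bash script and save it as artifacts/hello.sh (the path the workflow below runs):
# Write one numbered file per second into the artifact directory.
for i in {000..100}
do
  sleep 1
  echo "$i" > "output/${i}.txt"
  echo "Wrote output/${i}.txt"
done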
Now, create the following workflow YAML file:
workflows:
  - name: hello-sh
    provider: bash
    commands:
      - bash artifacts/hello.sh
    artifacts:
      - path: ./output
        mount: true
Go ahead and run this workflow remotely:
$ dstack run hello-sh --remote
NOTE:
Every read or write operation within the mounted artifact directory will create an HTTP request to the storage.
The mount option can be used to save and restore checkpoint files if the workflow uses interruptible instances.
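To illustrate the checkpoint idea, here is a minimal sketch of a resumable variant of the counting script. It assumes that, after an interruption, the mounted artifact directory still contains the files written so far; run_with_resume and its arguments are hypothetical names chosen for this example, and the one-second sleep is left out to keep it short.

```shell
# Hypothetical helper: write numbered files 000.txt, 001.txt, ... up to last_step,
# continuing after the highest-numbered file already present in out_dir.
run_with_resume() {
  local out_dir="$1" last_step="$2"
  local latest step name start=0 i

  mkdir -p "$out_dir"

  # The highest-numbered NNN.txt file marks the last completed step.
  latest="$(find "$out_dir" -maxdepth 1 -name '[0-9][0-9][0-9].txt' | sort | tail -n 1)"
  if [ -n "$latest" ]; then
    # Strip leading zeros so that names like 008 are not read as octal numbers.
    step="$(basename "$latest" .txt | sed 's/^0*//')"
    start=$(( ${step:-0} + 1 ))
  fi

  i="$start"
  while [ "$i" -le "$last_step" ]; do
    name="$(printf '%03d' "$i")"
    echo "$name" > "$out_dir/$name.txt"
    echo "Wrote $out_dir/$name.txt"
    i=$((i + 1))
  done
}
```

Calling run_with_resume output 100 from the workflow's command would then skip steps that already have a file in the mounted directory. Keep in mind that each of those reads and writes becomes an HTTP request to the storage, as noted above.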