
Custom providers

A provider is a program that defines how a workflow materializes into actual jobs that process and output data according to the workflow parameters.

While dstack ships with built-in providers, you can also define and use your own custom providers. Read on to learn how to build them.

Define providers

Providers must be defined in the .dstack/providers.yaml file.

Syntax

The root element of the .dstack/providers.yaml file is always providers.

It's an array, where each item represents a Provider and may have the following parameters:

| Name     | Required | Description                                         |
|----------|----------|-----------------------------------------------------|
| name     | Yes      | The name of the provider                            |
| image    | Yes      | The name of the Docker image that runs the provider |
| commands | Yes      | The list of commands that start the provider        |

Here's an example:

providers:
    - name: curl
      image: python:3.9
      commands:
        - pip3 install -r providers/curl/requirements.txt
        - PYTHONPATH=providers python3 providers/curl/main.py
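
The image and commands above assume that the provider's source code lives in the same repository. A possible layout, given the paths referenced in this example (and the schema and workflows files discussed below):

.dstack/
    providers.yaml
    workflows.yaml
providers/
    curl/
        main.py
        requirements.txt
        schema.yaml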

Build providers

dstack offers a Python API to build custom providers.

Here's an example:

from typing import List

from dstack import Provider, Job


class CurlProvider(Provider):
    def __init__(self):
        super().__init__(schema="providers/curl/schema.yaml")
        self.url = self.workflow.data["url"]
        self.output = self.workflow.data["output"]
        self.artifacts = self.workflow.data["artifacts"]

    def create_jobs(self) -> List[Job]:
        return [Job(
            image="python:3.9",
            commands=[
                f"curl {self.url} -o {self.output}"
            ],
            artifacts=self.artifacts
        )]


if __name__ == '__main__':
    provider = CurlProvider()
    provider.start()

Define a schema YAML file (optional)

A provider may have any number of parameters, which users set in their .dstack/workflows.yaml file when using the provider. For example, the curl provider from above has three parameters: url, output, and artifacts.

If you want the provider to validate whether its parameters are filled in correctly, you can define a provider schema, e.g. in the following way:

type: object
additionalProperties: false
properties:
  url:
    type: string
  output:
    type: string
  artifacts:
    type: array
    items:
      type: string
required:
  - url
  - output
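
The Provider base class uses this schema to validate the workflow parameters, so you don't need to check them by hand. Purely as an illustration of what the schema above accepts and rejects, here's a standalone sketch that uses the third-party jsonschema package (not part of dstack):

import yaml
from jsonschema import ValidationError, validate

with open("providers/curl/schema.yaml") as f:
    schema = yaml.safe_load(f)

# Passes: both required parameters (url and output) are present
validate({"url": "https://example.com/input.txt", "output": "data/input.txt"}, schema)

# Fails: output is required by the schema
try:
    validate({"url": "https://example.com/input.txt"}, schema)
except ValidationError as e:
    print(e.message)  # prints why validation failed (output is missing)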

Define a provider class

The next step is to define a Python class for your provider. It must inherit from the dstack.Provider class.

Define the __init__ function that initializes the provider and reads its parameters:

a. Call the __init__ function of the superclass. If you defined a schema in the previous step, pass its path via the schema argument.

b. Read the parameters of your provider from the self.workflow.data dictionary.

def __init__(self):
    super().__init__(schema="providers/curl/schema.yaml")
    self.url = self.workflow.data["url"]
    self.output = self.workflow.data["output"]
    self.artifacts = self.workflow.data["artifacts"]
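
Note that the schema above does not list artifacts under required. For optional parameters like that, you may prefer reading them with a default instead of indexing the dictionary directly (a small sketch, assuming self.workflow.data behaves like a regular Python dict):

self.artifacts = self.workflow.data.get("artifacts", [])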

Implement the create_jobs function that creates the actual jobs. Use the dstack.Job class to create instances of jobs.

def create_jobs(self) -> List[Job]:
    return [Job(
        image="python:3.9",
        commands=[
            f"curl {self.url} -o {self.output}"
        ],
        artifacts=self.artifacts
    )]

The dstack.Job class has the following arguments:

| Name        | Type                        | Required | Description                                                                            |
|-------------|-----------------------------|----------|----------------------------------------------------------------------------------------|
| image       | str                         | Yes      | The name of the Docker image of the Job                                                |
| commands    | List[str]                   | Yes      | The list of commands that start the container of the Job                               |
| ports       | List[int]                   | No       | The list of ports exposed by the Job container                                         |
| working_dir | str                         | No       | The working directory of the Job container                                             |
| artifacts   | List[str]                   | No       | The list of folders inside the Job container that should be stored as output artifacts |
| resources   | dstack.ResourceRequirements | No       | The resources required by the Job, incl. CPU, memory, and GPU                          |
| depends_on  | List[dstack.Job]            | No       | The list of other Jobs this Job depends on                                             |
| master      | dstack.Job                  | No       | The master Job (in case the current Job must communicate with the master Job)          |
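
If your provider needs more than one job, the depends_on argument lets you chain them. Below is a minimal sketch (not taken from dstack's documentation; process.py is a hypothetical script) of a create_jobs implementation where a processing job waits for a download job:

def create_jobs(self) -> List[Job]:
    # First job: download the file and keep it as an artifact
    download = Job(
        image="python:3.9",
        commands=[
            f"curl {self.url} -o {self.output}"
        ],
        artifacts=[self.output]
    )
    # Second job: starts only after the download job it depends on
    process = Job(
        image="python:3.9",
        commands=[
            f"python3 process.py {self.output}"  # process.py is hypothetical
        ],
        artifacts=self.artifacts,
        depends_on=[download]
    )
    return [download, process]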

Test providers

To test your provider, define a workflow that uses it in the same project repository.

Here's an example:

workflows:
  - name: tinyshakespeare
    provider: curl
    url: https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
    output: data/input.txt
    artifacts:
      - data

And then, run it:

dstack run tinyshakespeare

Once your run is assigned to a runner and starts running, you'll see the output of your provider in the logs of your run.

Use providers

If you want to use a provider from another repository, set both repo and name under the provider property.

Here's an example:

workflows:
  - name: tinyshakespeare
    provider:
      repo: https://github.com/<github user>/<github repository>
      name: curl
    url: https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
    output: data/input.txt
    artifacts:
      - data