dstack applications often work with large datasets and ML models, and unnecessary heavy computations can degrade their performance. To avoid this, dstack offers a built-in caching system.
Here's a very simple example of how it works:

    import dstack as ds
    import pandas as pd

    # Cache the returned DataFrame so the CSV is downloaded only once
    @ds.cache()
    def get_data():
        return pd.read_csv("https://www.dropbox.com/s/cat8vm6lchlu5tp/data.csv?dl=1", index_col=0)

Now, if the function get_data() is called from an application, dstack executes it only once and re-uses the returned value whenever it is called again.
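
The call-once behaviour can be illustrated without dstack. The following is a minimal in-memory sketch, not dstack's actual implementation; `simple_cache`, `load_numbers` and `call_log` are hypothetical names used only for illustration.

```python
import functools

def simple_cache(func):
    """Memoize a zero-argument function in memory (illustrative stand-in only)."""
    sentinel = object()  # marks "no value stored yet"
    stored = sentinel

    @functools.wraps(func)
    def wrapper():
        nonlocal stored
        if stored is sentinel:
            stored = func()  # first call: run the real function
        return stored        # later calls: return the stored value
    return wrapper

call_log = []  # one entry is appended each time the function body really runs

@simple_cache
def load_numbers():
    call_log.append("ran")
    return [1, 2, 3]

first = load_numbers()
second = load_numbers()
print(len(call_log))    # number of real executions
print(first is second)  # whether both calls returned the same object
```

The second call never reaches the function body, which is the property that makes caching worthwhile for slow operations such as downloading a CSV file.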
dstack.cache() can be used on any function regardless of how many arguments the function has:

    import dstack as ds
    import pandas as pd

    # One cached result is kept per distinct value of url
    @ds.cache()
    def get_data(url):
        return pd.read_csv(url, index_col=0)

In the example above, for every unique value of url, dstack will re-use the correct cached version.
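
Per-argument caching behaves much like memoization keyed by the argument values. As an analogy only, the sketch below uses `functools.lru_cache` from the Python standard library, which is not dstack's cache; `fake_read`, `executed` and the example URLs are hypothetical.

```python
import functools

executed = []  # records every argument for which the body really ran

@functools.lru_cache(maxsize=None)
def fake_read(url):
    executed.append(url)
    return f"contents of {url}"

fake_read("https://example.com/a.csv")  # miss: the body runs
fake_read("https://example.com/b.csv")  # miss: different argument
fake_read("https://example.com/a.csv")  # hit: the stored result is re-used

info = fake_read.cache_info()
print(executed)                 # the two distinct URLs, in call order
print(info.hits, info.misses)   # 1 2
```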
dstack offers a way to customize the behavior of dstack.cache(): pass it a hash_func, a function that returns the object from which the hash of the function arguments is calculated.
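
To make the idea of a key-producing function concrete, here is a stand-alone sketch that assumes a plain dictionary as the store. It is not how dstack implements `hash_func`; `cache_by`, `normalized`, `describe` and `runs` are hypothetical names.

```python
import functools

def cache_by(key_func):
    """Cache results under key_func(*args); an illustrative stand-in only."""
    def decorator(func):
        store = {}

        @functools.wraps(func)
        def wrapper(*args):
            key = key_func(*args)         # object the lookup is based on
            if key not in store:
                store[key] = func(*args)  # miss: compute and remember
            return store[key]             # hit: re-use the stored result
        return wrapper
    return decorator

runs = []  # records each real execution

def normalized(name):
    # Arguments that differ only in case or padding map to the same key
    return name.strip().lower()

@cache_by(normalized)
def describe(name):
    runs.append(name)
    return f"dataset:{normalized(name)}"

describe("Sales")    # miss
describe(" sales ")  # hit: same key as "Sales"
describe("costs")    # miss: new key
print(runs)
```

Whether two calls share a cached value is decided by the key function rather than by the raw arguments, so changing what the key function returns changes when the cache is re-used.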
Imagine you'd like to change the get_data(url) example so that the cache is invalidated each day:
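
    import dstack as ds
    import pandas as pd
    from datetime import datetime

    # The returned tuple changes every day, so yesterday's entries no longer match
    def my_hash_func(url):
        return (url, datetime.today().date())

    @ds.cache(hash_func=my_hash_func)
    def get_data(url):
        return pd.read_csv(url, index_col=0)
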
Here, we define the function my_hash_func(), which takes the same arguments as get_data() and returns a tuple with the url and today's date. Now, when get_data() is called, dstack invokes my_hash_func() on the given arguments and then calculates a hash of the result. If there's a cached value associated with the calculated hash, dstack returns it and does not make an unnecessary call of