This is extremely useful for scientific investigations in which partial results should be stored between runs — hyperparameter sweeps being a good example.
If you complete a hyperparameter sweep and later decide to extend its range or granularity, the previously computed settings are loaded from the cache almost instantly, and only the new hyperparameter values cost any extra computation time.
Jobs can be cancelled (or crash) part-way through, and re-running the script will effectively skip the completed parts and resume where you left off.
Because the cache keys are definition-sensitive, changing a function's definition changes the hash used in the key path, so stale results are recomputed on demand rather than returned by mistake. This greatly reduces the risk of silently getting a wrong result.
Lastly, because results are simply stored on disk in a transparent serialization format, it is easy to read, interpret, filter, modify, save, back up, and move cached values around.
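The sweep-extension behaviour described above can be illustrated without the package itself. The following is a minimal sketch using only the standard library; `evaluate`, `cached_evaluate`, and `sweep` are hypothetical names chosen for illustration, and the JSON-file-per-setting layout is an assumption, not the format this package uses.

```python
import itertools
import json
import tempfile
from pathlib import Path

cache_dir = Path(tempfile.mkdtemp())  # stand-in for a persistent cache directory
computed = []  # records which settings were actually evaluated


def evaluate(lr, depth):
    """Hypothetical expensive experiment; logs each real evaluation."""
    computed.append((lr, depth))
    return lr * depth


def cached_evaluate(lr, depth):
    """Return the stored result for (lr, depth), computing it only if missing."""
    path = cache_dir / f"lr={lr}_depth={depth}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = evaluate(lr, depth)
    path.write_text(json.dumps(result))
    return result


def sweep(lrs, depths):
    """Evaluate every combination of the given hyperparameter values."""
    return {(lr, d): cached_evaluate(lr, d) for lr, d in itertools.product(lrs, depths)}


first = sweep([0.1, 0.01], [2, 4])             # every setting is new
extended = sweep([0.1, 0.01, 0.001], [2, 4])   # only the lr=0.001 settings are new
print("settings actually computed:", computed)
```

Because each result is written as soon as it is produced, a run interrupted part-way through the sweep would leave the finished settings on disk, and a re-run would pick up from the first missing one.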
This package allows the results of expensive functions to be cached to disk by decorating those functions with the @memorious decorator. When a decorated function is called again with an identical input, the cached result is loaded from disk and returned.
This is achieved by pickling the result and the input and writing them to disk, using a unique path that identifies the particular version of the function being cached (based on a hash that depends on its definition, thanks to the dill package), as well as a key that depends on the hashed arguments of the function.
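To make the two-part key concrete, here is a minimal sketch of how a definition hash and an argument hash could be derived. It is an illustration under stated assumptions, not this package's implementation: it hashes the compiled code object with `hashlib` instead of using dill, it does not follow dependencies on other functions, and `definition_hash` and `arguments_hash` are hypothetical helper names.

```python
import hashlib


def definition_hash(func) -> str:
    """Hash the compiled body of `func` (bytecode, constants, referenced names)."""
    code = func.__code__
    payload = repr((code.co_code, code.co_consts, code.co_names))
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def arguments_hash(*args, **kwargs) -> str:
    """Hash the call arguments; assumes their repr() is stable and unambiguous."""
    payload = repr((args, sorted(kwargs.items())))
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def double(x):
    return x * 2


def double_copy(x):
    return x * 2  # identical body to `double`


def double_edited(x):
    return x * 2 + 1  # edited body


# Compare a function against an identical copy and against an edited version.
print("copy matches:  ", definition_hash(double) == definition_hash(double_copy))
print("edit matches:  ", definition_hash(double) == definition_hash(double_edited))
# Compare the keys produced by two different inputs.
print("inputs collide:", arguments_hash(3) == arguments_hash(4))
```

Combining the two digests into a path means an edited function writes to a fresh directory, while repeated calls with the same arguments map to the same file.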
Be warned that memorious functions should not contain recursive (or mutually recursive) definitions, or the stack will overflow. Their hash values will change if the functions they depend on are changed. However, these dependencies are only followed if they remain within the current module.
The pickled results will be cached under cache/funcname/hexhash/<HASH>.dill. These files can be deleted manually to clear the cache, or transported between computers.
Each pickled computation contains a dictionary with the following keys:
input: ordered dict of the function's arguments and their provided (or default) values
output: whatever the function produced
time: time in seconds the function took to run
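The path layout and record format described above can be sketched as a small decorator. This is a hedged illustration, not the package's code: it uses the standard-library `pickle` module instead of dill, writes `.pkl` files into a temporary directory, and the names `disk_cached`, `_digest`, and `CACHE_ROOT` are hypothetical.

```python
import functools
import hashlib
import inspect
import pickle
import tempfile
import time
from collections import OrderedDict
from pathlib import Path

CACHE_ROOT = Path(tempfile.mkdtemp()) / "cache"  # throwaway location for the demo


def _digest(obj) -> str:
    """Short hex digest of an object's repr (assumes the repr is stable)."""
    return hashlib.sha256(repr(obj).encode()).hexdigest()[:16]


def disk_cached(func):
    """Cache results under CACHE_ROOT/<funcname>/<definition hash>/<argument hash>.pkl."""
    code = func.__code__
    func_hash = _digest((code.co_code, code.co_consts, code.co_names))
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()  # defaults become part of the recorded input
        inputs = OrderedDict(bound.arguments)
        arg_hash = _digest(list(inputs.items()))
        path = CACHE_ROOT / func.__name__ / func_hash / f"{arg_hash}.pkl"
        if path.exists():
            with path.open("rb") as fh:
                return pickle.load(fh)["output"]
        start = time.perf_counter()
        output = func(*args, **kwargs)
        record = {"input": inputs, "output": output, "time": time.perf_counter() - start}
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(record, fh)
        return output

    return wrapper


calls = []  # tracks how often the body really runs


@disk_cached
def double(x, offset=0):
    calls.append(x)
    return x * 2 + offset


double(3)            # computed and written to disk
double(3)            # loaded from disk
double(3, offset=0)  # same input once defaults are filled in, so also loaded
print("body executed for:", calls)
```

Filling in default values before hashing is a design choice in this sketch: it makes `double(3)` and `double(3, offset=0)` share one cache entry, which matches the "provided (or default) values" wording of the input field.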
A global_seed keyword argument can be provided to any memorious function to set the global random seed prior to running the function. This allows controlled stochasticity to be used while still maintaining the benefits of caching.
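The seeding idea on its own can be sketched as follows. This is a minimal illustration that covers only the seeding step, not the caching; it assumes the standard-library `random` generator is the source of randomness, and `seedable` and `noisy_double` are hypothetical names. Which generators the package itself seeds is not stated here.

```python
import functools
import random


def seedable(func):
    """Accept an optional `global_seed` keyword and seed `random` before the call."""

    @functools.wraps(func)
    def wrapper(*args, global_seed=None, **kwargs):
        if global_seed is not None:
            random.seed(global_seed)  # reset the global generator for this call
        return func(*args, **kwargs)

    return wrapper


@seedable
def noisy_double(x):
    """A stochastic function: doubles x and adds uniform noise."""
    return 2 * x + random.random()


a = noisy_double(1, global_seed=0)
b = noisy_double(1, global_seed=0)  # same seed as `a`
c = noisy_double(1, global_seed=1)  # different seed
print(a, b, c)
```

Since a fixed seed makes the output a deterministic function of the arguments and the seed, a stochastic function called this way becomes safe to cache.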
You can install funes by running:

    pip install funes

The following script demonstrates the basic idea behind funes:

    from funes import memorious, load_cached_results
    from time import sleep

    @memorious
    def double(x):
        print('doubling', x)
        sleep(0.5)
        return x * 2

    print("uncached (will be slow)")
    for i in range(5):
        double(i)

    print("cached (will be fast)")
    for i in range(5):
        double(i)

    print("uncached (will be slow, unique global seed)")
    for seed in range(5):
        double(0, global_seed=seed)

    print("cached (will be fast, reuse global seed)")
    for seed in range(5):
        double(0, global_seed=seed)

    print("all cached values")
    print(list(load_cached_results(double)))