Projects / funes

Funes lets you decorate functions in such a way that their computations are memorized permanently to disk. Funes is named after the BorgesWikipedia short story Funes the Memoriesen.wikipedia.org.

Motivation

This is extremely useful for scientific investigations in which partial results should be stored between runs — hyperparameter sweeps being a good example.

  • If you complete a hyperparameter sweep, and then later decide to extend the range or granularity of the sweep, the previous computations will complete instantly, and only the novel hyperparameters will result in any extra computation time.

  • Jobs can be cancelled (or crash) part-way through, and re-running the script will effectively skip the completed parts and resume where you left off.

  • Because the cache keys are definition-sensitive, changing a function definition will cause the hash value used in the key path to change, which will result in all stale results being recomputed on demand — there is no risk you will ever get a wrong result!

  • Lastly, because results are simply stored on disk, using a transparent serialization format, it is easy to read, interpret, filter, modify, save, backup, and move cached values around.

Details

This package allows the results of expensive functions be cached to disk, by decorating these functions with the @memorious decorator. When that function is called again on an identical input, the cached result will be loaded from disk and returned.

This is achieved by pickling the result and input and writing it to disk, using a unique path that identifies the particular version of the function that is being cached (based on a hash that depends on its definition, thanks to the dill package), as well as a key that depends on the hashed arguments of the function.

Be warned that Memorious functions should not contain (possibly mutually) recursive definitions, or the stack will overflow. Their hash values will change if functions that depend on are changed. However, these dependencies are only followed if they remain within the current module.

The pickled results will be cached under cache/funcname/hexhash/<HASH>.dill. These files can be deleted manually to clear the cache, or transported between computers.

Each pickled computation contains a dictionary with the following keys:

  • input: ordered dict of function arguments + their provided (or defaulting) values
  • output: whatever the function produced
  • time: time in seconds the function took to run

A special global_seed keyword argument can be provided to any memorious function to set the global random seed prior to running the function. This allows controlled stochasticity to be used, while still maintaining the benefits of caching.

Installing

You can install funes by running:

pip install funes

Examples

The following script demonstrates the basic idea behind funes:

from funes import memorious, load_cached_results
from time import sleep

@memorious
def double(x):
    print('doubling', x)
    sleep(0.5)
    return x * 2

print("uncached (will be slow)")
for i in range(5):
    double(i)

print("cached (will be fast)")
for i in range(5):
    double(i)

print("uncached (will be slow, unique global seed)")
for seed in range(5):
    double(0, global_seed=seed)

print("cached (will be fast, reuse global seed)")
for seed in range(5):
    double(0, global_seed=seed)

print("all cached values")
print(list(load_cached_results(double)))