This folder contains benchmarks used by the dirty_cat maintainers to:
- Experiment on new algorithms
- Validate decisions based on empirical evidence
- Fine-tune (hyper)parameters in the library

These benchmarks are not meant to replace the tests within dirty_cat.
A mini-framework consisting of a few functions is available under `utils/`.
Check out the existing benchmarks to see how they are used; a rough illustration follows below.
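
As a hedged illustration only, a benchmark built on such helpers could look like the sketch below. The `monitor` decorator defined here is a hypothetical stand-in, not the actual API; refer to `utils/` for the real helper names.

```python
# Hypothetical sketch of a benchmark using a utils-style helper.
# `monitor` below is an illustrative stand-in; see utils/ for the real API.
from time import perf_counter


def monitor(func):
    """Toy stand-in for a utils helper: times each call to the benchmark."""
    def wrapper(*args, **kwargs):
        start = perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {perf_counter() - start:.3f}s")
        return result
    return wrapper


@monitor
def benchmark_case(n_rows: int) -> int:
    # A real benchmark would fit dirty_cat estimators here.
    return sum(range(n_rows))


if __name__ == "__main__":
    benchmark_case(1_000_000)
```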
As a user, you usually won't need to launch the benchmarks yourself: they are long and expensive to run, and are kept here mainly for reproducibility.
Each benchmark implements a standard command-line interface with two flags, `--run` and `--plot`.
For instance, to run a benchmark:

```
python tablevectorizer_tuning.py --run
```
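
Such an interface can be implemented with nothing more than the standard library's `argparse`; the sketch below is a minimal illustration under that assumption (the actual benchmarks may wire up their parsers differently):

```python
# Minimal sketch of a --run/--plot interface, assuming plain argparse.
# The real benchmark scripts may build their parsers differently.
import argparse


def run() -> None:
    print("running the benchmark and saving results...")


def plot() -> None:
    print("loading saved results and plotting...")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Example benchmark CLI")
    parser.add_argument("--run", action="store_true", help="run the benchmark")
    parser.add_argument("--plot", action="store_true", help="plot saved results")
    args = parser.parse_args()
    if args.run:
        run()
    if args.plot:
        plot()
```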
The package requirements for the benchmarks are listed in the project's setup and can thus be installed with:

```
pip install -e .[benchmarks]
```

Note that Python >= 3.9 has been reported as required.
The results of the benchmarks run by the maintainers are pushed to the `results/` folder.
As mentioned earlier, each benchmark implements a `--plot` flag used to display its results visually. Passing `--plot` without `--run` plots the saved results without re-running the benchmark, as shown below.
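
For example, to plot previously computed results for the benchmark shown earlier:

```
python tablevectorizer_tuning.py --plot
```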