virtual

A booster 💪 for your Parquet file sizes.

🛠 Build

pip3 install virtual-parquet

or

pip3 install .

🔗 Examples

A demo can be found at examples/demo.ipynb.

🗜️ Compress

import pandas as pd
import virtual

df = pd.read_csv('file.csv')

...

virtual.to_parquet(df, 'file_virtual.parquet')

% Virtualization finished: Check out 'file_virtual.parquet'.
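To give a feel for why a correlation-aware layout can shrink a file, here is a minimal, library-independent sketch. It assumes a hypothetical table in which `total` is close to a linear function of `price`. The column names, the function `2 * price + 5`, and the helpers `encode` and `decode` are illustrative assumptions, not part of virtual's API or a description of its file format.

```python
# Illustrative only: this is NOT virtual's implementation or on-disk format.
# Assumption: `total` is approximately 2 * price + 5 (hypothetical columns).

def encode(reference, target, slope, intercept):
    """Replace `target` by its offsets from slope * reference + intercept."""
    return [t - (slope * r + intercept) for r, t in zip(reference, target)]

def decode(reference, offsets, slope, intercept):
    """Rebuild the target column exactly from the reference column and offsets."""
    return [slope * r + intercept + o for r, o in zip(reference, offsets)]

price = [10, 20, 30, 40, 50]
total = [25, 45, 66, 85, 105]  # follows 2 * price + 5, except for one row

offsets = encode(price, total, slope=2, intercept=5)
restored = decode(price, offsets, slope=2, intercept=5)

print(offsets)            # [0, 0, 1, 0, 0]
print(restored == total)  # True
```

The offsets are mostly zero, so they tend to encode far more compactly than the original column, and the round trip stays lossless because every deviation from the function is kept.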

🥢 Read

import virtual

df = virtual.from_parquet('file_virtual.parquet')

📊 Query

import virtual

virtual.query(
  'select avg(price) from read_parquet("file_virtual.parquet") where year >= 2024',
  engine = 'duckdb'
)

Additional Features

🔍 Discover the Functions Found

import pandas as pd
import virtual

df = pd.read_csv('file.csv')

functions = virtual.train(df)

% Functions saved under functions.json.
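As a rough intuition for what a discovered function between columns might look like, the sketch below fits a straight line between two hypothetical columns using ordinary least squares. This is an assumption-laden illustration: `fit_line`, `quantity`, and `shipping_cost` are made-up names, and neither the fitting method nor the layout of `functions.json` is taken from virtual itself.

```python
# Illustrative only: a plain least-squares fit, not virtual's training routine.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical columns: shipping_cost is exactly 3 * quantity + 4.
quantity = [1, 2, 3, 4, 5]
shipping_cost = [7, 10, 13, 16, 19]

slope, intercept = fit_line(quantity, shipping_cost)
print(f"shipping_cost = {slope} * quantity + {intercept}")
```

When the fit is this tight, the dependent column can be described by the two coefficients plus small corrections, which is the kind of relationship a correlation-aware compressor can exploit.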

📚 Citation

Please do cite our (very) cool work if you use virtual in your work.

@inproceedings{
  virtual,
  title={{Lightweight Correlation-Aware Table Compression}},
  author={Mihail Stoian and Alexander van Renen and Jan Kobiolka and Ping-Lin Kuo and Josif Grabocka and Andreas Kipf},
  booktitle={NeurIPS 2024 Third Table Representation Learning Workshop},
  year={2024},
  url={https://openreview.net/forum?id=z7eIn3aShi}
}
