an anywidget for data that talks like a duck
quak is a scalable data profiler for quickly scanning large tables.
- interactive 🖱️: mouse over column summaries, cross-filter, sort, and slice rows.
- fast ⚡: built with Mosaic; views are expressed as SQL queries lazily executed by DuckDB.
- flexible 🔄: supports many data types and formats via Apache Arrow and the dataframe interchange protocol.
- reproducible 📓: a UI for building complex SQL queries; materialize views in the kernel for further analysis.
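To make "views are expressed as SQL queries" more concrete, here is a minimal sketch of how a set of UI selections could be composed into a single query string. This is an illustration only, not quak's or Mosaic's actual implementation: the `build_view_query` helper is hypothetical, and the `state`, `latitude`, and `name` column names are assumed to match the airports dataset used later in this README.

```python
def build_view_query(table, filters=None, order_by=None, limit=None):
    """Compose a SELECT statement from simple view settings.

    Hypothetical helper for illustration; filters are raw SQL fragments
    and are not sanitized, so do not use this with untrusted input.
    """
    sql = f'SELECT * FROM "{table}"'
    if filters:
        # Each active filter narrows the view; combine them with AND.
        sql += " WHERE " + " AND ".join(filters)
    if order_by:
        sql += f" ORDER BY {order_by}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


# Assumed column names; nothing is executed until the string is sent to a database.
query = build_view_query(
    "airports",
    filters=["state = 'CA'", "latitude > 35"],
    order_by="name",
    limit=10,
)
print(query)
```

Because the view is only a string until it is handed to a database engine, no rows are scanned while the query is being built, which is the sense in which execution can be deferred.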
Warning
quak is a prototype exploring a high-performance data profiler based on anywidget. It is not production-ready. Expect bugs. Open-sourced for SciPy 2024.
pip install quak
The easiest way to get started with quak is to load it as an IPython extension:
%load_ext quak
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/vega/vega-datasets/main/data/airports.csv")
df
Any cell that returns an object implementing the Python dataframe interchange protocol (i.e., a dataframe-like "thing") will be rendered using quak.Widget, rather than the default renderer.
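As a rough guide to what "implementing the protocol" means: the interchange protocol is exposed through a `__dataframe__` method on the object. The sketch below checks for that method. The `supports_interchange` helper and the `FakeFrame` class are hypothetical names used for illustration, and quak's own detection logic may differ.

```python
def supports_interchange(obj):
    """Return True if obj exposes a callable __dataframe__ method."""
    return callable(getattr(obj, "__dataframe__", None))


class FakeFrame:
    """Hypothetical stand-in for a dataframe-like object."""

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # A real library would return an interchange object here.
        raise NotImplementedError("placeholder for illustration")


print(supports_interchange(FakeFrame()))  # True
print(supports_interchange([1, 2, 3]))    # False
```

A plain list has no `__dataframe__` method, so under this check it would fall through to the default renderer.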
Alternatively, you can use quak.Widget directly:
import polars as pl
import quak
df = pl.read_csv("https://raw.githubusercontent.com/vega/vega-datasets/main/data/airports.csv")
quak.Widget(df)
Contributors welcome! Check the Contributors Guide to get started. Note: I'm wrapping up my PhD, so I might be slow to respond. Please open an issue before contributing a new feature.