Stars
Sources for the book "Machine Learning in Production"
Python-based GBDT implementation on GPU. Efficient multioutput (multiclass/multilabel/multitask) training
🐵 Preswald is a full-stack platform for building, deploying, and managing interactive data applications. It brings ingestion, storage, transformation, and visualization into a simple SDK, minimizin…
A collection of research papers on decision, classification and regression trees with implementations.
A Python interface to OC1 and other oblique decision tree implementations
Scikit-learn compatible decision trees beyond those offered in scikit-learn
An R package for modern methods for non-probability samples
An R library of pseudo-random number generators written in C++
A distributed agent orchestration framework for market agents
Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
Materials for the Analyzing Time Series at Scale with Cluster Analysis in R Workshop
Python interactive dashboards for learning data science
Precinct shapes (and vote results) for US elections past, present, and future
Free MLOps course from DataTalks.Club
Hacking & Cybersecurity class materials - Scott J. Shapiro & Sean O'Brien
Demo Project for Open Source MDS
Datasets derived from US census data
Polars extension for general data science use cases
A curated list of Polars talks, tools, examples & articles. Contributions welcome!
Community developed Quarto extension to enable interactive Python code cells in HTML documents using Pyodide
NHANES-GCP: Leveraging the Google Cloud Platform and BigQuery ML for reproducible machine learning with data from the National Health and Nutrition Examination Survey
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)