- The Woodlands, Texas
- https://szilard.github.io/aboutme/
Adaptive and automatic gradient boosting computations.
Most recent/important talks given at conferences/meetups
A curated list of gradient boosting machines (GBM) resources
Advanced workshop on XGBoost with Tianqi Chen in Santa Monica, June 2, 2016
Szilard Pafka's short bio (to go with conference talk abstracts)
Kaggle scripts: R vs pydata + most popular R and Python packages for Machine Learning
Tuning GBMs (hyperparameter tuning) and impact on out-of-sample predictions
Code (and other materials) for an introductory talk/workshop on GBMs (developed originally for an R-Ladies Meetup)
Machine Learning #1 and #2 courses at CEU Master of Science in Business Analytics
Some thoughts on how to use machine learning in production
Materials for STATS 418 - Tools in Data Science course taught in the Master of Applied Statistics at UCLA
Compare the scoring speed of several open source machine learning libraries.
Performance of various open source GBM implementations
GBM multicore scaling: h2o, xgboost and lightgbm on multicore and multi-socket systems
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016.org/tutorials/10.html
Inspired by David Donoho's "50 Years of Data Science" (2015) paper, I'm releasing here a draft course proposal I wrote in 2009 for a possible course on "data science".
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning al…
Materials for a short introductory/intermediate Data Science course taught in the MSc in Business Analytics program at the Central European University
A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).
Size of datasets used for analytics based on 10 years of surveys by KDnuggets.
Latency numbers every data scientist should know (aka the pyramid of analytical tasks) - the order of magnitude of computational time for the most common analytical tasks (SQL-like data munging, li…
Quick informal survey at the Los Angeles Machine Learning meetup about tools used for machine learning.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow