Table of Contents
- Intro to Data Science
- Web Scraping
- Theano
- Data Visualization
- Semi-supervised Learning
- Dealing with Temporal Clinical Data
- RNNs and Hyperparameters
- Bayesian Methods
- Distributed Learning
- Techniques for Dimensionality Reduction
- Modeling Sensor Data
- Introduction to Markov Decision Processes
- Perception as Analysis by Synthesis
- Operationalizing Data Science Output
- GPU Accelerated Learning
- High Dimensional Function Learning
- Basketball Analytics Using Player Tracking Data
- TensorFlow in Practice
- Virtual Currency Trading
- NSFW Modeling with ConvNets
- Structured Attention Networks
- Automated Machine Learning
- Grounding Natural Language with Autonomous Interaction
- Imran Malek is a Solutions Architect at DataXu. His workshop introduced pandas and matplotlib. Slides | Notebook
- Marcus Way is an SDE at Amazon and was previously a Software Engineer at Wanderu, a company that helps people find the lowest bus fares. This workshop took us through the process of acquiring data from the web before building a model to predict whether an article's title originated from Gawker or the Wall Street Journal. Notebook
- Alec Radford is the Head of Research at indico. His talk introduced Theano and convolutional networks. Video | Code
- Lane Harrison is an Assistant Professor of Computer Science at WPI and was previously a Postdoc in the Visual Analytics Lab at Tufts. His workshop introduced data visualization with d3.js. Slides | Code
- Eli Brown is an Assistant Professor of Computer Science at DePaul. His talk focused on using interactive visualizations to help users leverage learning algorithms. Slides | Paper
- Marzyeh Ghassemi is a PhD Student at MIT CSAIL in the Clinical Decision Making Group. Her session introduced both Latent Dirichlet Allocation and Gaussian Processes before walking us through her recent paper entitled "A Multivariate Timeseries Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data." Paper | Slides
- Alec Radford - Using Passage to Train RNNs. Slides | Code | Video
- David Duvenaud - Gradient-Based Learning of Hyperparameters. Paper | Slides | Code
- Allen Downey is a Professor of Computer Science at Olin College. His talk focused on an application of Bayesian statistics from World War II. Slides | Video
- José Miguel Hernández Lobato is a Postdoc at Harvard's Intelligent Probabilistic Systems Lab and presented on Bayesian optimization and information-based approaches. Slides
- Arno Candel is the Chief Architect at H2O. His talk focused on the implementation and application of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting, and Deep Neural Networks. Slides
- Dan Steinburg is a PhD student in intelligent systems at the University of Pittsburgh. His talk introduced various techniques for dimensionality reduction including PCA, multidimensional scaling, Isomap, locally linear embedding, and Laplacian eigenmaps. Slides
- Hank Roark is a Data Scientist at H2O, where he works on building data products within the domains of machine prognostics, health management, and agriculture. His workshop focused on the challenges faced when modeling streaming sensor data. Slides | Notebook
- Alborz Geramifard is a Research Scientist at Amazon and led an introductory workshop on MDPs with RLPy. Paper | Code | Slides
- Tejas Kulkarni is a PhD Student at MIT in Josh Tenenbaum's lab and spent last summer working at Google DeepMind in London. His talk focused on his recent paper entitled "Picture: A Probabilistic Programming Language for Scene Perception." Paper | Slides
- Tom LaGatta is a Senior Data Scientist & Analytics Architect at Splunk. His session focused on aligning data science output with operational workflows. Slides
- Bob Crovella joined NVIDIA in 1998 and leads a technical team that is responsible for supporting GPU Computing Products. His talk began with an introduction to why GPUs are helpful when training deep neural networks. He then walked through demos of cuDNN and DIGITS from the perspective of how they fit together with frameworks like Caffe, Torch, and Theano. Slides | Video
- Jason Klusowski is a PhD student at Yale and presented on the computational and theoretical aspects of approximating d-dimensional functions. Slides | Video
- Alexander D'Amour is an Assistant Professor in Statistics at UCB, and recently completed his PhD at Harvard. His talk introduced applications of 24-FPS spatial data in the direction of answering fundamental questions related to the game of basketball. Slides | Video
- Nathan Lintz is a research scientist at indico Data Solutions where he is responsible for developing machine learning systems in the domains of language detection, text summarization, and emotion recognition. His session focused on the first principles of TensorFlow, building all the way up to generative modeling with recurrent networks. Slides | Code | Video
- Anders Brownworth is a principal engineer at Circle and was previously an instructor at the MIT Media Lab. His talk focused on building the intuition about the blockchain and bitcoin needed to develop successful trading strategies. Slides | Video | Hacker News
- Ryan Compton is a data scientist at Clarifai. His talk used the problem of nudity detection to illustrate the workflow involved with training and evaluating convolutional neural networks. He also discussed deconvolution and demonstrated how it can be used to visualize intermediate feature layers. Slides | Video
- Yoon Kim is a PhD student in computer science at Harvard. This session gave an overview of attention mechanisms and structured prediction before introducing a method for combining the two ideas by way of graphical models. Slides | Code
- Nicolo Fusi is a research scientist at Microsoft Research, working at the intersection of machine learning, computational biology and medicine. He received his PhD in Computer Science from the University of Sheffield under Neil Lawrence. His talk focused on the process of selecting and tuning pipelines consisting of data preprocessing methods and machine learning models. Slides | Paper
- Karthik Narasimhan is a PhD candidate at CSAIL working on natural language understanding and deep reinforcement learning. His talk focused on task-optimized representations to reduce dependence on annotation. The session built up to a demonstration of how reinforcement learning can enhance traditional NLP systems in low-resource scenarios. In particular, he described an autonomous agent that can learn to acquire and integrate external information to improve information extraction. Slides