A repository of papers discussed at nPlan's Machine Learning Paper Club.
Paper Club is now remote, with an in-person session approximately every four weeks. As always, sessions are on Thursdays at 12:30 London time, either via webinar or in our office in Whitechapel. During the session, feel free to ask and answer questions or comment on the paper: this is a discussion rather than a presentation. Bear in mind that these meetings may be recorded for dissemination purposes.
- [02/13/2025] Arvid presents: Playing Atari with Deep Reinforcement Learning by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
- [02/20/2025] Peter presents: MCMC using Hamiltonian dynamics by Radford M. Neal, focusing on Langevin dynamics applied to Markov Chain Monte Carlo
- [03/13/2025] Inneke presents: TBD
- [03/27/2025] Sophie presents: TBD
- [04/10/2025] Naomi presents: TBD
IF YOU WOULD LIKE TO PRESENT AND WOULD LIKE SOME GUIDANCE, HERE IS A TEMPLATE ON HOW TO STRUCTURE YOUR PRESENTATION
FOR IN-PERSON SESSIONS: IF YOU ARE ATTENDING IN PERSON PLEASE RSVP ON OUR MEETUP PAGE SO WE CAN GET A HEADCOUNT FOR FOOD AND DRINKS. THE ADDRESS FOR THE EVENT WILL BE ON THE MEETUP PAGE.
For those new to machine learning, here is some recommended reading material:
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- Hamilton, W. L. (2020). Graph Representation Learning.
- Wu, L., Cui, P., Pei, J., Zhao, L., & Song, L. (2022). Graph Neural Networks.
- Provost, F., & Fawcett, T. (2013). Data Science for Business. Sebastopol: O'Reilly.
- Goldberg, Y. (2016). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57, 345-420.
Transformer-related resources:
- Examples of BERT: sentiment analysis and feature extraction from BERT
The wide and deep model implementation that Carlos presented can be found here https://github.com/caledezma/wide_deep_model. Why not download it, play with it, and let us know your findings at paper club?
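If you would rather sketch the idea from scratch before diving into that repo, here is a minimal wide & deep model in Keras. It is an illustrative sketch only, not the code from the linked repository; the layer sizes and the hypothetical `wide_features`/`deep_features` inputs are arbitrary assumptions.

```python
# Minimal wide & deep sketch in Keras (illustrative; not the linked repo's code).
import numpy as np
import tensorflow as tf

n_wide, n_deep = 8, 16  # hypothetical feature counts

wide_in = tf.keras.Input(shape=(n_wide,), name="wide_features")
deep_in = tf.keras.Input(shape=(n_deep,), name="deep_features")

# Deep path: a small MLP that learns higher-order feature interactions.
deep = tf.keras.layers.Dense(32, activation="relu")(deep_in)
deep = tf.keras.layers.Dense(16, activation="relu")(deep)

# Wide path: the raw (typically crossed/one-hot) features feed the output directly,
# so the model can memorise simple co-occurrences alongside the deep path.
merged = tf.keras.layers.concatenate([wide_in, deep])
output = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=[wide_in, deep_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Tiny synthetic run just to check the plumbing.
X_wide = np.random.rand(256, n_wide)
X_deep = np.random.rand(256, n_deep)
y = np.random.randint(0, 2, size=(256, 1))
model.fit([X_wide, X_deep], y, epochs=2, batch_size=32, verbose=0)
```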
The demo for Platt scaling in calibration can be found here https://github.com/caledezma/calibration_scaling_demo. Feel free to contribute to it; we might even push a Platt scaling layer to TensorFlow!
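As a reminder of what the demo is doing: Platt scaling amounts to fitting a one-dimensional logistic regression, p = sigmoid(a*s + b), on held-out uncalibrated scores s. A minimal sketch, using scikit-learn and synthetic scores/labels as stand-ins rather than the TensorFlow code in the demo repo:

```python
# Minimal Platt scaling sketch (illustrative; synthetic data, not the demo repo's code).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for uncalibrated scores (e.g. logits or SVM margins) and true labels
# from a held-out calibration set.
scores = rng.normal(size=1000)
labels = (scores + rng.normal(scale=1.5, size=1000) > 0).astype(int)

# Platt scaling: fit p = sigmoid(a * score + b) with a 1-D logistic regression.
platt = LogisticRegression()
platt.fit(scores.reshape(-1, 1), labels)

calibrated = platt.predict_proba(scores.reshape(-1, 1))[:, 1]
print("learned a, b:", platt.coef_[0, 0], platt.intercept_[0])
print("first calibrated probabilities:", calibrated[:5].round(3))
```

The learned pair (a, b) is exactly what a Platt scaling layer in TensorFlow would have to parameterise.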
We regularly record the presentations made during the Meetup (subject to the presenter's approval). These videos are then uploaded to our YouTube channel so that those who can't attend can still benefit from the presentations. If you'd like to stay up to date, just hit the subscribe button!
Past papers discussed in Paper Club meetings:
-
[06/02/2025] Gerard presents: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning by DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z.F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jiawei Wang, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, J.L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R.J. Chen, R.L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S.S. Li , Shuang Zhou, Shaoqing Wu, Shengfeng Ye, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wanjia Zhao, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W.L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X.Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y.K. Li, Y.Q. Wang, Y.X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y.X. Zhu, Yanhong Xu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z.Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, Zhen Zhang
-
[30/01/2025] Peter presents: Fisher Information as it relates to the paper Bayesian Learning via Stochastic Gradient Langevin Dynamics RECORDING
-
[23/01/2025] Damian presents Training Large Language Models to Reason in a Continuous Latent Space by Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian RECORDING
-
[16/01/2025] Peter presents A Stochastic Approximation Method by Herbert Robbins and Sutton Monro RECORDING
-
[09/01/2025] Peter presents Bayesian Learning via Stochastic Gradient Langevin Dynamics by Max Welling, Yee Whye Teh RECORDING
-
[02/01/2025] Ben presents: CryoDRGN: Reconstruction of heterogeneous cryo-EM structures using neural networks by Ellen D. Zhong, Tristan Bepler, Bonnie Berger, Joseph H. Davis RECORDING
-
[12/18/2024] Peter presents: Towards Understanding Evolving Patterns in Sequential Data (ONE OF NEURIPS 2024 SPOTLIGHT PAPERS) by Qiuhao Zeng, Long-Kai Huang, Qi CHEN, Charles Ling, Boyu Wang RECORDING
-
[12/11/2024] Ben presents: Flow Matching for Generative Modeling by Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le RECORDING
-
[05/12/2024] Peter presents: Binarized Neural Networks by Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio RECORDING
-
[28/11/2024] Inneke presents: Open-Endedness is Essential for Artificial Superhuman Intelligence by Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktaschel RECORDING
-
[21/11/2024] Gerard presents: The Surprising Effectiveness of Test-Time Training for Abstract Reasoning by Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas RECORDING
-
[14/11/2024] Peter presents: Practical Markov Chain Monte Carlo Part 2 by Charles J. Geyer RECORDING
-
[07/11/2024] Peter presents: Practical Markov Chain Monte Carlo Part 1 by Charles J. Geyer RECORDING
-
[31/10/2024] Arvid presents: Multi-Agent Learning using a Variable Learning Rate by Michael Bowling and Manuela Veloso RECORDING
-
[24/10/2024] IN-PERSON PAPER CLUB, PLEASE BRING SOMETHING TO WRITE ON AND SOMETHING TO WRITE WITH. Peter presents: A Practical Guide to the `einsum` function and Einstein summation notation. WHY SHOULD YOU BE INTERESTED? `einsum` is a function that can save a lot of memory and compute when performing the linear tensor operations necessary for modern neural networks. It is becoming more and more prevalent in machine learning papers and could one day be widely adopted. RECORDING
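To give a flavour of why `einsum` is worth knowing, here is a small NumPy illustration; the attention-style shapes and variable names are chosen for the example and are not taken from the guide above.

```python
# A small NumPy illustration of einsum, assuming batched attention-style shapes.
import numpy as np

batch, seq, d = 4, 10, 8
Q = np.random.rand(batch, seq, d)
K = np.random.rand(batch, seq, d)

# Batched matrix multiply Q K^T written as an Einstein summation: sum over the
# shared feature index d, keeping the batch and both sequence indices.
scores = np.einsum("bqd,bkd->bqk", Q, K)

# Equivalent to the explicit version below, but the index string makes the
# contraction explicit and lets libraries fuse or reorder it efficiently.
assert np.allclose(scores, Q @ K.transpose(0, 2, 1))
```
-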
[17/10/2024] Gerard presents: Were RNNs All We Needed? by Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadeghi RECORDING
-
[10/10/2024] Damian presents: ReFT: Representation Finetuning for Language Models by Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts RECORDING
-
[03/10/2024] Ben presents: Self-Consistency Improves Chain of Thought Reasoning in Language Models by Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou RECORDING
-
[26/09/2024] Sophie presents: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs by Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Mohammad Shoeybi, Bryan Catanzaro RECORDING
-
[19/09/2024] Inneke presents: LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs by Chansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang RECORDING
-
[12/09/2024] Peter presents: Long-Form Factuality in Large Language Models by Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le Recording
-
[05/09/2024] Vahan presents: Small Molecule Optimization with Large Language Models by Philipp Guevorguian, Menua Bedrosian, Tigran Fahradyan, Gayane Chilingaryan, Hrant Khachatrian, Armen Aghajanyan RECORDING
-
[29/08/2024] Tanya presents Constitutional AI: Harmlessness from AI Feedback by Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Jared Kaplan RECORDING
-
[22/08/2024] Gerard presents ColPali: Efficient Document Retrieval with Vision Language Models by Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo RECORDING
-
[15/08/2024] Tanya presents Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo by Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse RECORDING
-
[08/08/2024] Peter presents Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization by Ian Gemp, Luke Marris, Georgios Piliouras Recording
-
[01/08/2024] Sophie presents From Local to Global: A Graph RAG Approach to Query-Focused Summarization by Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Jonathan Larson Recording
-
[25/07/2024] Peter presents Position: Levels of AGI for Operationalizing Progress on the Path to AGI by Meredith Ringel Morris, Jascha Sohl-Dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg Recording
-
[18/07/2024] Tanya presents Detecting hallucinations in large language models using semantic entropy by Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn & Yarin Gal Recording
-
[11/07/2024] Peter presents Robust agents learn causal world models by Jonathan Richens, Tom Everitt Recording
-
[04/07/2024] Tanya presents: Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality by Tri Dao, Albert Gu
-
[20/06/2024] Dwane presents: What's the Magic Word? A Control Theory of LLM Prompting by Aman Bhargava, Cameron Witkowski, Manav Shah, Matt Thomson
-
[13/06/2024] Peter presents: Meta Continual Learning Revisited: Implicitly Enhancing Online Hessian Approximation via Variance Reduction by Yichen Wu, Long-Kai Huang, Renzhen Wang, Deyu Meng, Ying Wei
-
[06/06/2024] Gerard presents: Is Cosine-Similarity of Embeddings Really About Similarity? by Harald Steck, Chaitanya Ekanadham, Nathan Kallus
-
[30/05/2024] Damian presents: KAN: Kolmogorov-Arnold Networks by Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, Max Tegmark RECORDING
-
[23/05/2024] Vahan presents: PROVING TEST SET CONTAMINATION IN BLACK BOX LANGUAGE MODELS by Yonatan Oren, Nicole Meister, Niladri Chatterji, Faisal Ladhak, Tatsunori B. Hashimoto
-
[09/05/2024] Peter presents: The Curse of Recursion: Training on Generated Data Makes Models Forget by Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson
-
[02/05/2024] Arvid presents: TacticAI: an AI assistant for football tactics by Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis & Karl Tuyls RECORDING
-
[25/04/2024] Vahan presents: Reverse Engineering Self-Supervised Learning by Ido Ben-Shaul, Ravid Shwartz-Ziv, Tomer Galanti, Shai Dekel, Yann LeCun
-
[11/04/2024] Ben presents: Zero Shot Molecular Generation via Similarity Kernels by Rokas Elijošius, Fabian Zills, Ilyes Batatia, Sam Walton Norwood, Dávid Péter Kovács, Christian Holm, Gábor Csányi RECORDING
-
[04/04/2024] Peter presents: Continual Learning in the Presence of Spurious Correlation by Donggyu Lee, Sangwon Jung, Taesup Moon RECORDING
-
[28/03/2024] Gerard presents: RAFT: Adapting Language Model to Domain Specific RAG by Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez
-
[21/03/2024] Vahan presents: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking by Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman RECORDING
-
[14/03/2024] IN-PERSON, AFTERWORK, NOT AT OUR OFFICE PAPER CLUB. Please see MEETUP PAGE for details. Peter presents: Alignment of Large Language Models (No pre-reading required)
-
[07/03/2024] Vahan presents: World Model on Million-Length Video And Language With RingAttention by Hao Liu, Wilson Yan, Matei Zaharia and Pieter Abbeel RECORDING
-
[29/02/2024] Gerard presents: KTO: Model Alignment as Prospect Theoretic Optimization by Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela RECORDING
-
[22/02/2024] Ben presents Tweedie Moment Projected Diffusions For Inverse Problems by Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, O. Deniz Akyildiz
-
[15/02/2024] Peter presents ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation by Sungduk Yu · Walter Hannah · Liran Peng · Jerry Lin · Mohamed Aziz Bhouri · Ritwik Gupta · Björn Lütjens · Justus C. Will · Gunnar Behrens · Julius Busecke · Nora Loose · Charles Stern · Tom Beucler · Bryce Harrop · Benjamin Hillman · Andrea Jenney · Savannah L. Ferretti · Nana Liu · Animashree Anandkumar · Noah Brenowitz · Veronika Eyring · Nicholas Geneva · Pierre Gentine · Stephan Mandt · Jaideep Pathak · Akshay Subramaniam · Carl Vondrick · Rose Yu · Laure Zanna · Tian Zheng · Ryan Abernathey · Fiaz Ahmed · David Bader · Pierre Baldi · Elizabeth Barnes · Christopher Bretherton · Peter Caldwell · Wayne Chuang · Yilun Han · YU HUANG · Fernando Iglesias-Suarez · Sanket Jantre · Karthik Kashinath · Marat Khairoutdinov · Thorsten Kurth · Nicholas Lutsko · Po-Lun Ma · Griffin Mooers · J. David Neelin · David Randall · Sara Shamekh · Mark Taylor · Nathan Urban · Janni Yuval · Guang Zhang · Mike Pritchard RECORDING
-
[08/02/2024] Vahan presents: Are Emergent Abilities of Large Language Models a Mirage? by Rylan Schaeffer, Brando Miranda, Sanmi Koyejo RECORDING
-
[01/02/2024] Peter presents: Solving olympiad geometry without human demonstrations by Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He & Thang Luong RECORDING
-
[25/01/2024] Gerard presents: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training By Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez RECORDING
-
[18/01/2024] Ben presents: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding by Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi RECORDING
-
[11/01/2024] Damian presents: Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
-
[04/01/2024] Vahan presents: Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu, Tri Dao
-
[21/12/2023] Vahan presents: Cooperative Graph Neural Networks by Ben Finkelshtein, Xingyue Huang, Michael Bronstein, İsmail İlkan Ceylan
-
[14/12/2023] Peter presents: Scaling deep learning for materials discovery by Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon & Ekin Dogus Cubuk
-
[30/11/2023] Gerard presents: Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability by Arush Tagade, Jessica Rumbelow
-
[23/11/2023] Paper Club Social - Peter presents: Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements by Peter A. Zachares, Vahan Hovhannisyan, Alan Mosca, Yarin Gal and Max presents: Human Feedback is not Gold Standard by Tom Hosking, Phil Blunsom, Max Bartolo RECORDING
-
[09/11/2023] Vahan presents: YaRN: Efficient Context Window Extension of Large Language Models by Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole RECORDING
-
[02/11/2023] Peter presents: Efficient Streaming Language Models with Attention Sinks by Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis RECORDING
-
[17.10.2023] Peter presents: Disentanglement with Biological Constraints: A Theory of Functional Cell Types by James C. R. Whittington, Will Dorrell, Surya Ganguli, Timothy Behrens RECORDING
-
[12.10.2023] Vahan presents: Quantum machine learning by Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe and Seth Lloyd RECORDING
-
[05.10.2023] Gerard presents: ImageBind: One Embedding Space To Bind Them All by Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra RECORDING
-
[28.09.2023] Peter presents: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control by Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, and Brianna Zitkovich
-
[14.09.2023] Inneke presents Graph of Thoughts: Solving Elaborate Problems with Large Language Models by Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., Gajda, J., Lehmann, T., Podstawski, M., Niewiadomski, H., Nyczyk, P. and Hoefler, T. Recording
-
[07.09.2023] Vahan presents Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning by Zeyuan Allen-Zhu, Yuanzhi Li. Recording
-
[31.08.2023] Peter presents Bayesian Design Principles for Frequentist Sequential Learning by Yunbei Xu, Assaf Zeevi Recording
-
[24.08.2023] Inneke presents Tree of Thoughts: Deliberate Problem Solving with Large Language Models by S Yao, D Yu, J Zhao, I Shafran, T Griffiths, Y Cao, K Narasimhan Recording
-
[17.08.2023] Gerard presents QLoRA: Efficient Finetuning of Quantized LLMs by Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer Recording
-
[10/08/2023] Ben Steer and Naomi Arnold from Pometry present how Pometry uses Temporal Graph Motifs to study Bitcoin darkweb marketplaces and NFT wash trading.
-
[03/08/2023] Vahan Presents Continual Pre-training of Language Models by Zixuan Ke, Yijia Shao, Haowei Lin, Tatsuya Konishi, Gyuhak Kim, Bing Liu
-
[27/07/2023] Peter presents FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness By Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
-
[20/07/2023] Arvid presents TrueSkill: A Bayesian skill rating system By Ralf Herbrich, Tom Minka, Thore Graepel
-
[13/07/2023] Arvid presents Expectation Propagation for Approximate Bayesian Inference By Thomas P Minka
-
[06/07/2023] Arvid presents Factor Graphs and the Sum-Product Algorithm By Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger
-
[29/06/2023] Peter presents Faster sorting algorithms discovered using deep reinforcement learning By Daniel J. Mankowitz, Andrea Michi, Anton Zhernov, Marco Gelmi, Marco Selvi, Cosmin Paduraru, Edouard Leurent, Shariq Iqbal, Jean-Baptiste Lespiau, Alex Ahern, Thomas Köppe, Kevin Millikin, Stephen Gaffney, Sophie Elster, Jackson Broshear, Chris Gamble, Kieran Milan, Robert Tung, Minjae Hwang, Taylan Cemgil, Mohammadamin Barekatain, Yujia Li, Amol Mandhane, Thomas Hubert, David Silver
-
[23/06/2023] Vahan presents DensePose From WiFi
-
[15/06/2023] Gerard presents Bytes Are All You Need: Transformers Operating Directly On File Bytes
-
[01/06/2023] Peter presents Improving language models by retrieving from trillions of tokens
-
[25/05/2023] Ben presents Generative Diffusion Models on Graphs: Methods and Applications
-
[11/05/2023] Vahan presents an outstanding paper award winner from ICLR 2023 Rethinking the Expressive Power of GNNs via Graph Biconnectivity
-
[04/05/2023] Gerard presents an outstanding paper award winner from ICLR 2023. Emergence of Maps in the Memories of Blind Navigation Agents
-
[20/04/2023] Peter presents Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick
-
[13/04/2023] Vahan presents: Knowledge and topology: A two layer spatially dependent graph neural networks to identify urban functions with time-series street view image by Yan Zhang, Pengyuan Liu, Filip Biljecki
-
[06/04/2023] Gerard presents Anomaly Detection in Multiplex Dynamic Networks: from Blockchain Security to Brain Disease Prediction By Ali Behrouz, Margo Seltzer
-
[23/03/2023] Peter presents Graph Neural Networks for Link Prediction with Subgraph Sketching By Benjamin Paul Chamberlain, Sergey Shirobokov, Emanuele Rossi, Fabrizio Frasca, Thomas Markovich, Nils Hammerla, Michael M. Bronstein, Max Hansmire
-
[16/03/2023] Vahan presents Hierarchical Text-Conditional Image Generation with CLIP Latents By Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
-
[09/03/2023] Peter presents ZeRO: Memory Optimizations Toward Training Trillion Parameter Models By Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
-
[23/02/2023] Inneke presents Temporal Cycle-Consistency Learning by Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
- [09/02/2023] Peter Presents Zero-shot Causal Learning By Hamed Nilforoshan, Michael Moor, Yusuf Roohani, Yining Chen, Anja Šurina, Michihiro Yasunaga, Sara Oblak, Jure Leskovec
-
[26/01/2023] Dan presenting Mad Max: Affine Spline Insights into Deep Learning
-
[12/01/2023] Peter will present Gradient Descent: The Ultimate Optimizer by Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer
-
[05/01/2023] Peter will present Flamingo: a Visual Language Model for Few-Shot Learning by Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan Recording
-
[17/12/2022] Vahan presents Expander Graph Propagation by Andreea Deac, Marc Lackenby and Petar Veličković.
-
[08/12/2022] Hosted by Data Spartan. James will present Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
-
[01/12/2022] Peter presents Lucas Maystre, Daniel Russo Temporally-Consistent Survival Analysis
-
[24/11/2022] Arvid presents: Javier Fernández, Luke Bornn SoccerMap: A Deep Learning Architecture for Visually-Interpretable Analysis in Soccer
-
[17/11/2022] Dirk presents: Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
-
[10/11/2022] Peter presents: Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis & Pushmeet Kohli Discovering faster matrix multiplication algorithms with reinforcement learning
-
[03/11/2022] Nayef presents: Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, Animashree Anandkumar FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators
-
[27/10/2022] Peter presents: Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida Spectral Normalization for Generative Adversarial Networks
-
[20/10/2022] Vahan presents: Francesco Di Giovanni, James Rowbottom, Benjamin P. Chamberlain, Thomas Markovich, Michael M. Bronstein Graph Neural Networks as Gradient Flows: understanding graph convolutions via energy
-
[13/10/2022] Vahan presents: Ziang Chen, Jialin Liu, Xinshang Wang, Jianfeng Lu, Wotao Yin On Representing Linear Programs by Graph Neural Networks
-
[06/10/2022] Ben presents: Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole Score-Based Generative Modeling through Stochastic Differential Equations
-
[29/09/2022] Gerard presents: Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant Analyzing Transformers in Embedding Space
-
[15/09/2022] Nayef presents: Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli Deep Unsupervised Learning using Nonequilibrium Thermodynamics
-
[08/09/2022] Vahan presents: Li Jing, Pascal Vincent, Yann LeCun, Yuandong Tian UNDERSTANDING DIMENSIONAL COLLAPSE IN CONTRASTIVE SELF-SUPERVISED LEARNING
-
[01/09/2022] Tara presents: Qiao, W., Zhao, Y., Xu, Y., Lei, Y., Wang, Y., Yu, S. and Li, H. Deep learning-based pixel-level rock fragment recognition during tunnel excavation using instance segmentation model. If you have any requests about the paper, please email vahanATnplan.io
-
[25/08/2022] Peter presents: Thomas Muller, Alex Evans, Christoph Schied, Alexander Keller Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
-
[18/08/2022] Vahan presents: A deep graph neural network architecture for modelling spatio-temporal dynamics in resting-state functional MRI data
-
[11/08/2022] Dirk presents: S. Scott, A. Blocker, F. Bonassi, H. Chipman, E. George, and R. McCulloch Bayes and Big Data: The Consensus Monte Carlo Algorithm
-
[04/08/2022] Peter presents: Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta Understanding Dataset Difficulty with V-Usable Information (One of ICML's 2022 outstanding papers)
-
[28/07/2022] Vahan presents: Stéphane D'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, Francois Charton Deep Symbolic Regression for Recurrent Sequences
-
[21/07/2022] Nayef presents: Moshe Eliasof, Eldad Haber, Eran Treister PDE-GCN: Novel Architectures for Graph Neural Networks Motivated by Partial Differential Equations
-
[14/07/2022] Peter presents: Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, Hongyuan Zha DyRep: Learning Representations over Dynamic Graphs
-
[07/07/2022] Vahan presents: Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R Devon Hjelm Deep Graph Infomax
-
[29/06/2022] Peter presents: Melih Kandemir, Abdullah Akgül, Manuel Haussmann, Gozde Unal Evidential Turing Processes
-
[22/06/2022] Vahan presents: Carlos Fernandez-Loria, Foster Provost Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters
-
[16/06/2022] Peter presents: Albert Gu, Karan Goel, and Christopher Ré Efficiently Modeling Long Sequences with Structured State Spaces
-
[09/06/2022] Vahan presents Newton vs the machine: solving the chaotic three-body problem using deep neural networks
-
[26/05/2022] Arvid Presents: Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric P. Xing DAGs with NO TEARS: Continuous Optimization for Structure Learning
-
[19/05/2022] Peter Presents: Rebekka Burkholz, Nilanjana Laha, Rajarshi Mukherjee, Alkis Gotovos On the Existence of Universal Lottery Tickets
-
[12/05/2022] Vahan Presents: Bertrand Charpentier, Simon Kibler, Stephan Günnemann Differentiable DAG Sampling
-
[05/05/2022] Peter Presents: Shengjia Zhao, Abhishek Sinha, Yutong He, Aidan Perreault, Jiaming Song, Stefano Ermon Comparing Distributions by Measuring Differences that Affect Decision Making
-
[04/28/2022] Ben Presents: Viacheslav Borovitskiy, Iskander Azangulov, Alexander Terenin, Peter Mostowsky, Marc Deisenroth, Nicolas Durrande Matern Gaussian Processes on Graphs
-
[04/07/2022] Peter leads a discussion on what is AI and what are its social and economic impacts. Please watch the following video by Jerry Kaplan Humans need not apply and read the following paper by John Searle The Chinese Room
-
[04/07/2022] Vahan presents: Yasaman Razeghi, Robert L. Logan IV, Matt Gardner, Sameer Singh (2022) Impact of Pretraining Term Frequencies on Few-Shot Reasoning
-
[04/07/2022] Vahan presents: Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang (2021) GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation
-
[03/31/2022] Peter presents: Chao Ma, Cheng Zhang (2021) Identifiable Generative Models for Missing Not at Random Data Imputation
-
[03/24/2022] Arvid presents: Judea Pearl On Measurement Bias in Causal Inference
-
[03/17/2022] Peter presents: Brian D. Ziebart, Andrew Maas, J.Andrew Bagnell, and Anind K. Dey Maximum Entropy Inverse Reinforcement Learning
-
[03/10/2022] Inneke presents: David Cohn, Les Atlas, Richard Ladner Improving Generalization with Active Learning
-
[03/03/2022] Vahan presents: Beatrice Bevilacqua, Fabrizio Frasca, Derek Lim, Balasubramaniam Srinivasan, Chen Cai, Gopinath Balamurugan, Michael M. Bronstein, Haggai Maron Equivariant Subgraph Aggregation Networks
-
[24/02/2022] Inneke presents: Jishnu Mukhoti, Andreas Kirsch, Joost van Amersfoort, Philip H.S. Torr, Yarin Gal Deep Deterministic Uncertainty: A Simple Baseline
-
[17/02/2022] Peter presents: Chelsea Finn, Pieter Abbeel, Sergey Levine (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.
-
[10/02/2022] Peter presents: Sebastian Thrun (1995). Is Learning The n-th Thing Any Easier Than Learning The First?
-
[03/02/2022] Peter presents: Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio (2021). Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation.
-
[27/01/2022] Arvid presents: Marius Muja, David G. Lowe (2014). Scalable Nearest Neighbor Algorithms for High Dimensional Data.
-
[20/01/2022] Dwane presents: Paul J. Blazek & Milo M. Lin (2021). Explainable neural networks that simulate reasoning.
-
[13/01/2022] Sagar presents: Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman (2021). Open-Set Recognition: A Good Closed-Set Classifier is All You Need.
-
[06/01/2022] Peter presents: Deng-Bao Wang, Lei Feng, Min-Ling Zhang (2021). Rethinking Calibration of Deep Neural Networks: Do Not Be Afraid of Overconfidence.
-
[09/12/2021] Peter presents: Gregory Clark (2021). Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess.
-
[02/12/2021] Vahan presents: Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver (2020). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model.
-
[25/11/2021] Joao presents: Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka (2020). How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks.
-
[18/11/2021] Peter presents: Rico Jonschkowski, Divyam Rastogi, Oliver Brock (2018). Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors.
-
[11/11/2021] Vahan presents: Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, Jure Leskovec (2019). GNNExplainer: Generating Explanations for Graph Neural Networks.
-
[04/11/2021] Inneke presents: Sören Mindermann, Muhammed Razzak, Winnie Xu, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal (2021). Prioritized training on points that are learnable, worth learning, and not yet learned.
-
[28/10/2021] Joao presents: Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira (2021). Perceiver IO: A General Architecture for Structured Inputs & Outputs.
-
[21/10/2021] Peter presents: L Liu, M Hughes, S Hassoun, L Liu (2021). Stochastic Iterative Graph Matching.
-
[14/10/2021] Arvid presents: J Ma, B Chang, X Zhang, Q Mei (2021). CopulaGNN: Towards Integrating Representational and Correlational Roles of Graphs in Graph Neural Networks.
-
[07/10/2021] Vahan presents: X Chen, X Han, J Hu, F Ruiz, L Liu (2021). Order Matters: Probabilistic Modeling of Node Sequence for Graph Generation.
-
[02/09/2021] Joao presents: James Thorne et al. (2021). Database Reasoning Over Text. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 3091–3104, August 1–6, 2021.
-
[26/08/2021] Ben presents: Emilien Dupont, Arnaud Doucet, Yee Whye Teh (2019). Augmented Neural ODEs. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
-
[19/08/2021] Arvid presents: Tan, M. & Le, Q.. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6105-6114.
-
[12/08/2021] Jiameng presents: Nanxin Chen et al. (2020). WaveGrad: Estimating Gradients for Waveform Generation. arXiv preprint arXiv:2009.00713. AND Yang Song, Stefano Ermon (2019). Generative Modeling by Estimating Gradients of the Data Distribution. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
-
[05/08/2021] Inneke presents: Wanyu Lin, Hao Lan, Baochun Li (2021). Generative Causal Explanations for Graph Neural Networks arXiv preprint arXiv:2104.06643.
-
[29/07/2021] Vahan presents: Li, G., Müller, M., Ghanem, B., & Koltun, V. (2021). Training Graph Neural Networks with 1000 Layers. arXiv preprint arXiv:2106.07476.
-
[22/07/2021] Peter presents: Jesson, A., Mindermann, S., Gal, Y., & Shalit, U. (2021). Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding. arXiv preprint arXiv:2103.04850.
-
[15/07/2021] Joao presents: Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, Yarin Gal (2021). Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning arXiv preprint arXiv:2106.02584.
-
[08/07/2021] Peter presents: Ghifary, M., Kleijn, W. B., Zhang, M., Balduzzi, D., & Li, W. (2016, October). Deep reconstruction-classification networks for unsupervised domain adaptation. In European conference on computer vision (pp. 597-613). Springer, Cham.
-
[01/07/2021] Peter presents: Louizos, C., Shalit, U., Mooij, J., Sontag, D., Zemel, R., & Welling, M. (2017). Causal effect inference with deep latent-variable models. arXiv preprint arXiv:1705.08821.
-
[24/06/2021] Inneke presents: Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., ... & Sutskever, I. (2021). Zero-shot text-to-image generation. arXiv preprint arXiv:2102.12092.
-
[17/06/2021] Arvid presents: Arik, S. O., & Pfister, T. (2019). Tabnet: Attentive interpretable tabular learning. arXiv preprint arXiv:1908.07442.
-
[10/06/2021] Carlos presents: Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., ... & Mordatch, I. (2021). Decision Transformer: Reinforcement Learning via Sequence Modeling. arXiv preprint arXiv:2106.01345.
-
[03/06/2021] Vahan presents: Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou (2021). ResMLP: Feedforward networks for image classification with data-efficient training. arXiv preprint arXiv:2105.03404.
-
[27/05/2021] Joao presents: Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang (2019). Confident Learning: Estimating Uncertainty in Dataset Labels arXiv preprint arXiv:1911.00068.
-
[20/05/2021] Peter presents: Tolstikhin, I., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). MLP-Mixer: An all-MLP Architecture for Vision. arXiv preprint arXiv:2105.01601.
-
[13/05/2021] Inneke presents: Oord, A. V. D., Vinyals, O., & Kavukcuoglu, K. (2017). Neural discrete representation learning. arXiv preprint arXiv:1711.00937.
-
[06/05/2021] Peter presents: Sahoo, S., Lampert, C., & Martius, G. (2018, July). Learning equations for extrapolation and control. In International Conference on Machine Learning (pp. 4442-4450). PMLR.
-
[29/04/2021] Peter presents: Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., ... & Zaremba, W. (2017). Hindsight experience replay. arXiv preprint arXiv:1707.01495.
-
[22/04/2021] Jiameng presents: Bai, S., Kolter, J. Z., & Koltun, V. (2019). Deep equilibrium models. arXiv preprint arXiv:1909.01377.
-
[15/04/2021] Peter presents: Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., & Bengio, Y. (2021). Toward Causal Representation Learning. Proceedings of the IEEE.
-
[08/04/2021] Carlos presents: Jiang, Y., Chang, S., & Wang, Z. (2021). Transgan: Two transformers can make one strong gan. arXiv preprint arXiv:2102.07074.
-
[01/04/2021] Inneke presents: Jaegle, A., Gimeno, F., Brock, A., Zisserman, A., Vinyals, O., & Carreira, J. (2021). Perceiver: General Perception with Iterative Attention. arXiv preprint arXiv:2103.03206.
-
[25/03/2021] Vahan presents: He, P., Liu, X., Gao, J., & Chen, W. (2020). Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654.
-
[18/03/2021] Alexandre presents: Khrulkov, V., Mirvakhabova, L., Ustinova, E., Oseledets, I., & Lempitsky, V. (2020). Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6418-6428).
-
[11/03/2021] Arvid presents: Yeh, C. C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H. A., ... & Keogh, E. (2016, December). Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM) (pp. 1317-1322). Ieee.
-
[04/03/2021] Dwane presents: Chen, Z., Bei, Y., & Rudin, C. (2020). Concept whitening for interpretable image recognition. Nature Machine Intelligence, 2(12), 772-782.
-
[25/02/2021] João presents: Pruthi, G., Liu, F., Sundararajan, M., & Kale, S. (2020). Estimating Training Data Influence by Tracing Gradient Descent. arXiv preprint arXiv:2002.08484.
-
[18/02/2021] Amin presents: Xiong, Y., Zeng, Z., Chakraborty, R., Tan, M., Fung, G., Li, Y., & Singh, V. (2021). Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention. arXiv preprint arXiv:2102.03902.
-
[11/02/2021] Inneke presents: Zhang, M., & He, Y. (2020). Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping. arXiv preprint arXiv:2010.13369.
-
[04/02/2021] Carlos presents: Brown, N., Bakhtin, A., Lerer, A., & Gong, Q. (2020). Combining deep reinforcement learning and search for imperfect-information games. arXiv preprint arXiv:2007.13544.
-
[28/01/2021] Amin presents: Kong, L., d'Autume, C. D. M., Ling, W., Yu, L., Dai, Z., & Yogatama, D. (2019). A mutual information maximization perspective of language representation learning. arXiv preprint arXiv:1910.08350.
-
[21/01/2021] João presents: Haidar, M. A., & Rezagholizadeh, M. (2019, May). Textkd-gan: Text generation using knowledge distillation and generative adversarial networks. In Canadian Conference on Artificial Intelligence (pp. 107-118). Springer, Cham.
-
[14/01/2021] Dwane presents: Bartolo, M., Roberts, A., Welbl, J., Riedel, S., & Stenetorp, P. (2020). Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension. arXiv preprint arXiv:2002.00293.
-
[10/12/2020] Carlos presents: Huang, Q., He, H., Singh, A., Lim, S. N., & Benson, A. R. (2020). Combining Label Propagation and Simple Models Out-performs Graph Neural Networks. arXiv preprint arXiv:2010.13993.
-
[03/12/2020] Arvid presents: Lim, B., Arik, S. O., Loeff, N., & Pfister, T. (2019). Temporal fusion transformers for interpretable multi-horizon time series forecasting. arXiv preprint arXiv:1912.09363.
-
[19/11/2020] Dan presents: Zhang, J., Shi, X., Xie, J., Ma, H., King, I., & Yeung, D. Y. (2018). Gaan: Gated attention networks for learning on large and spatiotemporal graphs. arXiv preprint arXiv:1803.07294.
-
[12/11/2020] Amin presents: Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
-
[05/11/2020] Vahan presents: Rong, Y., Bian, Y., Xu, T., Xie, W., Wei, Y., Huang, W., & Huang, J. (2020). GROVER: Self-supervised Message Passing Transformer on Large-scale Molecular Data. arXiv preprint arXiv:2007.02835.
-
[29/10/2020] Carlos presents: Paper under double-blind review. Lambda networks: modeling long-range interactions without attention. ICLR 2021.
-
[22/10/2020] Dan presents: Doersch, C., Gupta, A., & Zisserman, A. (2020). CrossTransformers: spatially-aware few-shot transfer. arXiv preprint arXiv:2007.11498.
-
[15/10/2020] Vahan presents: Vyas, A., Katharopoulos, A., & Fleuret, F. (2020). Fast Transformers with Clustered Attention. arXiv preprint arXiv:2007.04825.
- Blog post and code for the paper
- Further reading on efficient transformers: The Reformer
-
[09/10/2020] Joao presents: Under double-blind review. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
-
[01/10/2020] João presents: Swayamdipta, S. et al. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics. arXiv preprint 2009.10795.
-
[25/09/2020] Amin presents: Cordonnier, J. B., Loukas, A., & Jaggi, M. (2019). On the relationship between self-attention and convolutional layers. arXiv preprint arXiv:1911.03584.
-
[17/07/2020] Arvid presents: Ratner, A., Bach, S. H., Ehrenberg, H., Fries, J., Wu, S., & Ré, C. (2017, November). Snorkel: Rapid training data creation with weak supervision. In Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases (Vol. 11, No. 3, p. 269). NIH Public Access.
-
[10/09/2020] Carlos presents: Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
-
[03/09/2020] Vahan presents: Lee, H., Hwang, S. J., & Shin, J. Self-supervised Label Augmentation via Input Transformations. Supporting material: Yann LeCun speaks about self supervised learning
-
[27/08/2020] Dan presents: Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le (2019). Unsupervised Data Augmentation for Consistency Training. arXiv preprint arXiv:1904.12848.
-
[20/08/2020] Inneke presents: Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton (2020). Big Self-Supervised Models are Strong Semi-Supervised Learners. arXiv preprint arXiv:2006.10029.
-
[13/08/2020] Joao presents: Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning (2020). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In International Conference on Learning Representations 2020.
-
[06/08/2020] Amin presents: Yun, S., Jeong, M., Kim, R., Kang, J., & Kim, H. J. (2019). Graph transformer networks. In Advances in Neural Information Processing Systems (pp. 11983-11993).
-
[30/07/2020] Slides Krisztina presents: Kohl, S., Romera-Paredes, B., Meyer, C., De Fauw, J., Ledsam, J. R., Maier-Hein, K., ... & Ronneberger, O. (2018). A probabilistic u-net for segmentation of ambiguous images. In Advances in Neural Information Processing Systems (pp. 6965-6975).
-
[23/07/2020] Vahan presents: Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020). Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. arXiv preprint arXiv:2006.16236.
-
[16/07/2020] Dan presents: Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
-
[09/07/2020] Amy presents: Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention (pp. 234-241). Springer, Cham.
-
[02/07/2020] Joao presents: Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).
-
[25/06/2020] Carlos presents: Wang, X., Huang, T. E., Darrell, T., Gonzalez, J. E., & Yu, F. (2020). Frustratingly Simple Few-Shot Object Detection. arXiv preprint arXiv:2003.06957.
-
[18/06/2020] Vahan presents: Zhang, J., Kailkhura, B., & Han, T. (2020). Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning. arXiv preprint arXiv:2003.07329.
-
[11/06/2020] Slides Arvid presents: Schoenholz, S. S., Gilmer, J., Ganguli, S., & Sohl-Dickstein, J. (2016). Deep information propagation. arXiv preprint arXiv:1611.01232.
-
[05/06/2020] Dan presents: Snoek, J., Ovadia, Y., Fertig, E., Lakshminarayanan, B., Nowozin, S., Sculley, D., ... & Nado, Z. (2019). Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. In Advances in Neural Information Processing Systems (pp. 13969-13980).
-
[28/05/2020] Amy presents: Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., & Lillicrap, T. (2017). A simple neural network module for relational reasoning. In Advances in neural information processing systems (pp. 4967-4976).
-
[21/05/2020] Joao presents: Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky (2020). Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One In International Conference on Learning Representations 2020.
-
[14/05/2020] Carlos presents: Malinin, A., & Gales, M. (2018). Predictive uncertainty estimation via prior networks. In Advances in Neural Information Processing Systems (pp. 7047-7058).
-
[30/04/2020] Dan presents: Garnelo, M., Schwarz, J., Rosenbaum, D., Viola, F., Rezende, D. J., Eslami, S. M., & Teh, Y. W. (2018). Neural processes. arXiv preprint arXiv:1807.01622.
-
[23/04/2020] Amy presents: Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., ... & Amodei, D. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.
-
[15/04/2020] Joao presents: Huang, G., Li, Y., Pleiss, G., Liu, Z., Hopcroft, J. E., & Weinberger, K. Q. (2017). Snapshot ensembles: Train 1, get m for free. arXiv preprint arXiv:1704.00109.
-
[09/04/2020] Carlos presents: Ashukha, A., Lyzhov, A., Molchanov, D., & Vetrov, D. (2020). Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning. arXiv preprint arXiv:2002.06470.
-
[05/03/2020] Vahan presents: Haber, E., Ruthotto, L., Holtham, E., & Jun, S. H. (2018, April). Learning Across Scales---Multiscale Methods for Convolution Neural Networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
-
[27/02/2020] Arvid presents: Wilson, A. G., Hu, Z., Salakhutdinov, R., & Xing, E. P. (2016, May). Deep kernel learning. In Artificial Intelligence and Statistics (pp. 370-378).
-
[20/02/2020] Arvid presents: Wilson, A., & Nickisch, H. (2015, June). Kernel interpolation for scalable structured Gaussian processes (KISS-GP). In International Conference on Machine Learning (pp. 1775-1784).
-
[13/02/2020] Arvid presents: Wilson, A. G., Knowles, D. A., & Ghahramani, Z. (2011). Gaussian process regression networks. arXiv preprint arXiv:1110.4411.
-
[06/02/2020] Joao presents: Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in neural information processing systems (pp. 3856-3866).
-
[30/01/2020] Carlos presents: Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Advances in neural information processing systems (pp. 4765-4774).
-
[23/01/2020] Carlos presents: Ribeiro, M. T., Singh, S., & Guestrin, C. (2018, April). Anchors: High-precision model-agnostic explanations. In Thirty-Second AAAI Conference on Artificial Intelligence.
-
[16/01/2020] Joao presents: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144). ACM.
-
[12/12/2019] Vahan presents: Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., & Raffel, C. (2019). Mixmatch: A holistic approach to semi-supervised learning. arXiv preprint arXiv:1905.02249.
-
[05/12/2019] Gary presents: Dozat, T. (2016). Incorporating Nesterov momentum into Adam.
-
[28/11/2019] Joao presents: Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017, August). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1321-1330). JMLR. org.
-
[21/11/2019] Carlos presents: Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3), 61-74.
-
[14/11/2019] Vahan presents: Kendall, A., & Cipolla, R. (2016, May). Modelling uncertainty in deep learning for camera relocalization. In 2016 IEEE international conference on Robotics and Automation (ICRA) (pp. 4762-4769). IEEE.
-
[07/11/2019] Gary presents: Cobb, A. D., Roberts, S. J., & Gal, Y. (2018). Loss-calibrated approximate inference in Bayesian neural networks. arXiv preprint arXiv:1805.03901.
-
[31/10/2019] Arvid presents (slides): Chapelle, Olivier, and Lihong Li. "An empirical evaluation of thompson sampling." Advances in neural information processing systems. 2011.
-
[24/10/2019] Ivan presents: Chelombiev, I., Houghton, C., & O'Donnell, C. (2019). Adaptive estimators show information compression in deep neural networks. arXiv preprint arXiv:1902.09037.
-
[10/10/2019] Ivan presents: Saxe, A. M., Bansal, Y., Dapello, J., Advani, M., Kolchinsky, A., Tracey, B. D., & Cox, D. D. (2018). On the information bottleneck theory of deep learning.
-
[03/10/2019] Ivan presents: Shwartz-Ziv, R., & Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810.
-
[26/09/2019] Carlos presents (with demo): Cheng, H. T., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhye, H., ... & Anil, R. (2016, September). Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems (pp. 7-10). ACM.
-
[19/09/2019] Alan presents: Mosca, A., & Magoulas, G. D. (2018). Distillation of deep learning ensembles as a regularisation method. In Advances in Hybridization of Intelligent Methods (pp. 97-118). Springer, Cham.
-
[12/09/2019] Carlos presents: Papernot, N., McDaniel, P., Wu, X., Jha, S., & Swami, A. (2016, May). Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP) (pp. 582-597). IEEE.
-
[05/09/2019] Carlos presents: Frosst, N., & Hinton, G. (2017). Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784.
-
[29/08/2019] Alan presents: Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
-
[22/08/2019] Gary presents: Lee, J., Lee, I., & Kang, J. (2019). Self-Attention Graph Pooling. arXiv preprint arXiv:1904.08082.
-
[15/08/2019] Vahan presents: Yao, L., Mao, C., & Luo, Y. (2019, July). Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 7370-7377).
-
[08/08/2019] Carlos presents: Wu, F., Zhang, T., Souza Jr, A. H. D., Fifty, C., Yu, T., & Weinberger, K. Q. (2019). Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153.
-
[25/07/2019] Arvid presents: Enßlin, T. A., Frommert, M., & Kitaura, F. S. (2009). Information field theory for cosmological perturbation reconstruction and nonlinear signal analysis. Physical Review D, 80(10), 105005.
-
[18/07/2019] Gary presents: Zhang, G., Wang, C., Xu, B., & Grosse, R. (2018). Three mechanisms of weight decay regularization. arXiv preprint arXiv:1810.12281.
-
[11/07/2019] Auke presents: Oord, A. V. D., Li, Y., & Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.
-
[04/07/2019] François presents: Kool, W., van Hoof, H., & Welling, M. (2019). Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement. arXiv preprint arXiv:1903.06059.
-
[20/06/2019] Vahan presents: Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
-
[13/06/2019] Alessio presents: Dobriban, E., & Liu, S. (2018). A new theory for sketching in linear regression. arXiv preprint arXiv:1810.06089.
-
[06/06/2019] François presents: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
-
[30/05/2019] Arvid presents: Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural computation, 10(5), 1299-1319.
-
[23/05/2019] Auke presents: Alaa, A. M., & van der Schaar, M. (2018). Autoprognosis: Automated clinical prognostic modeling via bayesian optimization with structured kernel learning. arXiv preprint arXiv:1802.07207.
-
[16/05/2019] Carlos presents: Dhamija, A. R., Günther, M., & Boult, T. (2018). Reducing Network Agnostophobia. In Advances in Neural Information Processing Systems (pp. 9175-9186).
-
[09/05/2019] Naman presents: Geifman, Y., & El-Yaniv, R. (2017). Selective classification for deep neural networks. In Advances in neural information processing systems (pp. 4878-4887).
-
[02/05/2019] Gary presents: Gal, Y., & Ghahramani, Z. (2016, June). Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning (pp. 1050-1059).
-
[25/04/2019] Vahan presents: Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (pp. 6402-6413).
-
[18/04/2019] Vahan presents: Vyas, A., Jammalamadaka, N., Zhu, X., Das, D., Kaul, B., & Willke, T. L. (2018). Out-of-distribution detection using an ensemble of self supervised leave-out classifiers. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 550-564).
-
[11/04/2019] Carlos presents: Bendale, A., & Boult, T. E. (2016). Towards open set deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1563-1572).
-
[04/04/2019] Arvid presents: Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., ... & Sabeti, P. C. (2011). Detecting novel associations in large data sets. science, 334(6062), 1518-1524.
-
[28/03/2019] Joao presents: Chen, B., Medini, T., & Shrivastava, A. (2019). SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems. arXiv preprint arXiv:1903.03129.
-
[21/03/2019] Joao presents: Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... & Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv preprint.
-
[14/03/2019] Vahan presents: Wright, J., Ganesh, A., Rao, S., Peng, Y., & Ma, Y. (2009). Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Advances in neural information processing systems (pp. 2080-2088).
-
[07/03/2019] Vahan presents: Candes, E. J., Romberg, J. K., & Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8), 1207-1223.
-
[28/02/2019] Arvid presents: Dietterich, T. G., & Bakiri, G. (1994). Solving multiclass learning problems via error-correcting output codes. Journal of artificial intelligence research, 2, 263-286.
-
[21/02/2019] Gary presents: Mnih, A., & Kavukcuoglu, K. (2013). Learning word embeddings efficiently with noise-contrastive estimation. In Advances in neural information processing systems (pp. 2265-2273).
-
[14/02/2019] Carlos presents: Ziko, I., Granger, E., & Ayed, I. B. (2018). Scalable Laplacian K-modes. In Advances in Neural Information Processing Systems (pp. 10062-10072).
-
[07/02/2019] Carlos presents: Wang, W., & Carreira-Perpinán, M. A. (2014). The Laplacian K-modes algorithm for clustering. arXiv.
-
[31/01/2019] Gary presents: Hoffer, E., Hubara, I., & Soudry, D. (2017). Train longer, generalize better: closing the generalization gap in large batch training of neural networks. In Advances in Neural Information Processing Systems (pp. 1731-1741).
-
[24/01/2019] Alessio presents: McInnes, L., & Healy, J. (2018). Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
-
[17/01/2019] Chris presents: Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research.
-
[10/01/2019] Carlos presents: Chen, T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural Ordinary Differential Equations. arXiv:1806.07366.
-
[20/12/2018] Gary presents: Wilson, A. C., Roelofs, R., Stern, M., Srebro, N., & Recht, B. (2017). The marginal value of adaptive gradient methods in machine learning. In Advances in Neural Information Processing Systems.
-
[13/12/2018] Carlos presents: Lin, H., & Jegelka, S. (2018). ResNet with one-neuron hidden layers is a Universal Approximator. In Advances in Neural Information Processing Systems.
-
[06/12/2018] Auke presents: Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2018). Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9446-9454).
-
[29/11/2018] Vahan presents: Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2016). Understanding deep learning requires rethinking generalization. arXiv:1611.03530.
-
[22/11/2018] Gary presents: Smith, S. L., Kindermans, P. J., Ying, C., & Le, Q. V. (2017). Don't decay the learning rate, increase the batch size. arXiv:1711.00489.
-
[15/11/2018] Joao presents: Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271.
-
[01/11/2018] Vahan presents: Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences.
-
[18/10/2018] Carlos presents: Howard, J., & Ruder, S. (2018). Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.
-
[11/10/2018] dos Santos, C., & Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers.