Here you can find educational/explanatory material written by me (Chris Wymant), pitched at various levels from student to researcher. My publications are here, my home page is here, and I occasionally post on Bluesky here.
There's lots of material below organised by topic. One thing to highlight is my advice for writing papers in academic science. I think that improvement there is low-hanging fruit: communicating our work is an important and undervalued part of our work.
Here are video recordings of talks I've given on various topics:
- pathogen sequence analysis here,
- our discovery of a highly virulent variant of HIV here (or with an introduction in French here),
- our proposal and evaluation of digital contact tracing for SARS-CoV-2 here,
- the interplay of proximity and duration of exposure for SARS-CoV-2 transmission here.
Here are audio recordings:
- discussing our discovery of a highly virulent variant of HIV with BBC World Service here, starting 1 min 11 secs into the programme
- discussing infectious disease epidemiology and statistical modelling on the Learn Bayesian Statistics podcast, in a live edition recorded at StanCon 24, here (see other podcast listening options here)
Below, materials are organised by topic.
- A quiz testing the topics covered, and the answers
- One set of slides covering most of the material (the following slides go into a bit more detail)
- Basic manipulation of numbers - the slides for 2015 and prose-style notes from 2014
- Inequalities, functions and units - slides for 2015, solutions to the problems, and prose-style notes from 2014
- Probability parts one and two
- A bit of supplementary material about calculus and matrices
I wrote the above (for a bachelor's course, a master's course and 'the short course' for professionals) while employed by Imperial College London, who claim copyright in such circumstances; this material should therefore not be re-used.
- Statistical Modelling (A very short introduction), here. A 2023 lecture for the University of Oxford Centre for Doctoral Training in Health Data Sciences. Code as a worked solution to the two parts of the practical is here and here
- For my 2025 lecture on that same course I focussed on what mathematical modelling and statistical modelling are at a high level, the basic laws of probability, a little about counterfactuals, and using random-effects (hierarchical/multi-level) models to describe CD4 cell decline during HIV infection. Lecture slides here, practical here, worked solutions to practical exercises are here and here for 1, here and here for 2 using this WHO data, here and here for 3. (For completeness, the 2024 slides are here but the 2025 ones are a slight improvement.)
- Inferring things from quantitative data, or why it's better to think less about doing things to data, here
- The need for hierarchical models to infer things from naturally grouped data: here. Code for the example is the hierarchical model in the Stan section below.
- Some basic basics about causal inference here
- Kind of inference, though mostly an exercise in R plotting code: how to plot survival curves for Kaplan-Meier estimators and the Cox proportional hazards model here (the underlying R markdown file is here).
- Why I prefer Bayesianism to Frequentism for inference here
- Gelman proposed a useful folk theorem of statistical computing. A minor proposed addition to this: if your statistical model is failing to infer the known parameter values used to simulate data, it's 50:50 that the problem is with your simulation rather than your statistical model. (I usually forget this and focus on debugging my inference code, 50% of the time unnecessarily.)
- If you are using a Bayesian statistical model to explore some parameters numerically, while also analytically marginalising over some parameters (usually for computational efficiency), and you use a posterior predictive check for how well your model fits the data, a subtle point you can easily get wrong is described in detail here (the underlying R markdown file is here).
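The addition to the folk theorem above concerns a simulate-then-recover check. Here is a minimal Python sketch of such a check, assuming a toy linear model fitted by least squares as a stand-in for a full statistical model. The function names `simulate` and `fit` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate(n, intercept, slope, sigma, rng):
    """Simulate n points from y = intercept + slope * x + Normal(0, sigma) noise."""
    x = rng.uniform(0.0, 10.0, size=n)
    y = intercept + slope * x + rng.normal(0.0, sigma, size=n)
    return x, y

def fit(x, y):
    """Least-squares estimates of (intercept, slope); a stand-in for real inference code."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(1)
true_intercept, true_slope = 2.0, 0.5  # illustrative values, not from any real analysis

x, y = simulate(5000, true_intercept, true_slope, 1.0, rng)
est_intercept, est_slope = fit(x, y)

print(f"intercept: true {true_intercept}, estimated {est_intercept:.3f}")
print(f"slope:     true {true_slope}, estimated {est_slope:.3f}")
```

If the estimates landed far from the true values, both `simulate` and `fit` would be equally reasonable places to start looking for the bug.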
The Stan language for probabilistic programming (especially Bayesian inference):
- A list of places I know of for learning about Stan, both generally and in the context of infectious disease epidemiology, here
- A decision tree for which block you should declare parameters in here
- Censored data: a simple example of inference involving a likelihood with both probability density and probability mass, R code here and Stan code here
- Censored data: a more involved example of estimating the generating process for two correlated variables, either or both of which may be censored. The likelihood contains a mixture of probability density functions, and one- and two-dimensional integrals of these. Worked example here (the underlying R markdown file is here).
- When specifying uniform/range priors with the boundary of one prior depending on another parameter, e.g. $x\sim\text{Uniform}(0,1)$ and $y|x\sim\text{Uniform}(0,x)$, the way Stan behaves by default is probably not what you intended: explanation here (the underlying R markdown file is here).
- Hierarchical/multi-level modelling: a simple example with R code here and Stan code here
- A similar example to the previous one, showing how a multi-level model for describing the differences between many groups does better than sequentially testing whether each group differs from the rest, here (the underlying R markdown file is here).
- Probabilistic classification: a simple example of classifying observations as having come from either one process or another ('signal' or 'noise') using a mixture model. R code here and Stan code here
- Ragged panel data: an example of fitting simple linear slopes to panel data (longitudinal data for each of several units) that is ragged (a different number of observations per unit), with unit-specific random effects that we analytically marginalise over, here (the underlying R markdown file is here).
- The last bullet point in the 'Inference' section above has Stan code examples incorporated in it
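The first censored-data bullet above mentions a likelihood with both probability density and probability mass. As a rough illustration of that idea, here is a Python sketch assuming normally distributed data that are right-censored at a known limit, with the standard deviation treated as known and the mean estimated by a simple grid search. These are assumptions made for this sketch; the linked R and Stan code may set the problem up differently, and every name and value below is hypothetical.

```python
import math
import numpy as np

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y (y may be a NumPy array)."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def normal_log_upper_tail(limit, mu, sigma):
    """Log of P(Y > limit) for Y ~ Normal(mu, sigma)."""
    return math.log(0.5 * math.erfc((limit - mu) / (sigma * math.sqrt(2.0))))

def censored_loglik(mu, sigma, observed, n_censored, limit):
    # Fully observed values contribute probability density...
    density_part = float(np.sum(normal_logpdf(observed, mu, sigma)))
    # ...while each censored value contributes the probability mass above the limit.
    mass_part = n_censored * normal_log_upper_tail(limit, mu, sigma)
    return density_part + mass_part

rng = np.random.default_rng(2)
true_mu, sigma, limit = 1.0, 1.0, 1.5  # illustrative values only
raw = rng.normal(true_mu, sigma, size=4000)
observed = raw[raw <= limit]           # values we get to see exactly
n_censored = int(np.sum(raw > limit))  # values known only to exceed the limit

# Crude maximum-likelihood estimate of mu by grid search (sigma assumed known).
mu_grid = np.linspace(0.0, 2.0, 2001)
loglik = np.array([censored_loglik(mu, sigma, observed, n_censored, limit) for mu in mu_grid])
mu_hat = float(mu_grid[np.argmax(loglik)])
naive_mean = float(observed.mean())    # ignores the censored values entirely

print(f"censoring-aware estimate: {mu_hat:.3f}, naive mean of observed values: {naive_mean:.3f}")
```

The naive mean is pulled downwards because it discards every value above the limit, whereas the mass term lets the censored observations still inform the estimate.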
- Slides about pathogen phylogenetic trees, and how to assemble viral genomes from high-throughput ('next-generation') genetic sequence data with shiver here (see also the shiver publication here)
- Slides about estimating who infected whom with phyloscanner here (see also the phyloscanner publication here)
- A webinar in which I talk through a subset of the slides from the above two lectures
- A computational practical showing how to use phyloscanner is here. Being taught this practical was apparently "like receiving piano lessons from Beethoven;" YMMV
- An accessible summary of our discovery of the VB variant of HIV is here, and a webinar on the subject is here. Virtually the same webinar but with an introduction in French is here.
- Slides here and a recorded webinar here about two of our group's papers on digital contact tracing (our initial proposal and our evaluation).
- A recorded webinar here about how transmission risk depends on the interplay between proximity and duration of exposure. I wrote a short summary here and a thread in our group twitter account here.
- A short summary of our paper on high-resolution real-time epidemic monitoring via digital contact tracing systems is here. I wrote a thread in our group twitter account here.
- A blog post explaining our group's paper modelling the effectiveness of daily lateral flow testing as an alternative strategy for reducing transmission from the traced close contacts of index cases.
- A quick introduction to the tidyverse (basically just to the dplyr package and the pipe operator) here
- more to come...
- Epidemics8 in 2021 here
- COVID-19: Advances and Remaining Challenges in 2021 here
- Virus Genomics and Evolution 2021 here
- Human Virus Dynamics and Evolution 2021 here
- Net Zero 2019 here
- Oberwolfach (the maths of infectious diseases) 2018 here
- Epidemics7 in 2017 here
- IAS 2017 here
- Mathematical and Computational Evolutionary Biology 2016 here
Advice on writing a scientific paper in academia here.
A glossary of HIV terms (mainly at the molecular and cellular level) from when I first started working on HIV.
Here are some bash commands (i.e. working with the terminal / command line) that I find helpful.
A simple example of changing an R-code for loop into parallelised command-line executions (e.g. for sending to a computational cluster), with a short discussion of the benefits of parallelising only partially instead of fully, is here.
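The linked example is in R; the same idea can be sketched in Python. The sketch below assumes a loop over independent tasks that is split into a few contiguous chunks, each of which would become one command-line execution. The helper names (`chunk_ranges`, `run_chunk`, `process_one`) and the script name `my_script.py` are hypothetical. Using fewer jobs than tasks is one way of parallelising only partially, for instance to limit per-job start-up and queueing overhead; the linked discussion may emphasise other reasons.

```python
def chunk_ranges(n_tasks, n_jobs):
    """Split task indices 0..n_tasks-1 into n_jobs contiguous half-open ranges."""
    base, extra = divmod(n_tasks, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        size = base + (1 if job < extra else 0)  # spread any remainder over the first jobs
        ranges.append((start, start + size))
        start += size
    return ranges

def process_one(i):
    """Placeholder for the body of the original loop."""
    return i * i

def run_chunk(start, stop):
    """What one command-line execution would do: the loop over its own slice only."""
    return [process_one(i) for i in range(start, stop)]

n_tasks, n_jobs = 10, 3
ranges = chunk_ranges(n_tasks, n_jobs)

# One command per chunk, e.g. to submit to a cluster scheduler (script name is hypothetical).
commands = [f"python my_script.py {start} {stop}" for start, stop in ranges]
for command in commands:
    print(command)

# Sanity check: the chunks together reproduce the original serial loop.
serial = [process_one(i) for i in range(n_tasks)]
chunked = [result for start, stop in ranges for result in run_chunk(start, stop)]
```

Setting `n_jobs` equal to `n_tasks` recovers full parallelisation, and setting it to 1 recovers the original serial loop.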
A simple tip for handling responses to reviewers during peer review: as soon as you receive your reviews, copy them all into a Google doc, put a differently formatted TODO between each separate point, then share this document with your coauthors. Replace each TODO with your actual response once you've addressed that comment. This is a handy way of tracking progress, noting things to remember, and discussing with coauthors how best to respond (use the Google Docs comment feature and "@" people at specific places). Using this document even to read your reviews for the first time (as opposed to reading the email they came in) means you can make a note of whatever first thoughts pop into your head. Example here.
Some explanations of things on twitter:
- On the harmful oversimplification of focussing only on the fraction of SARS-CoV-2 infections that become severe, here
- Lockdown pros and cons here
- Our group's agent-based model of SARS-CoV-2 epidemics and interventions here
- A summary of results obtained from that model here (summarising our report provided to NHSX here)
- Why science is easier than politics here
- A pet hate: academics leveraging/harnessing things left, right and centre instead of using them here
- Really funny jokes about the command line here and here
Acknowledgement: I wrote all the above materials while funded by ERC Advanced Grant PBDR-339251 and a Li Ka Shing Foundation grant, both awarded to Christophe Fraser.
- Read Strunk and White if you write in English (it's online here). Read Politics and the English Language if you write in order to make a point; some highlights are here.
- Read this if you give talks.
- Read this if you're a scientist using a computer.
- Here is a large collection of resources providing advice for graduate students, organised by topic
- Other people recommended these resources for learning to interact with your computer through the command line (a.k.a. the terminal a.k.a. the shell), which is very helpful for being able to use other people's computational scientific methods: here, here and here
- Other people recommended this for learning version control with Git (aimed at users of R but with more general applicability), which is invaluable for writing your own scientific methods.
- This helps one remember obscure latex symbols.
- A talk on climate change and how we're fucking up the planet and nature generally here
- A summary of experts advocating taking to the streets for climate action here. Here's me doing so with students, doctors and other scientists, and happily talking to the police
- Suggestions for good twitter accounts to follow on climate, from when I used to have the bandwidth for that, here
- An explanation of the "5-1" system of service receive for volleyball here
- Really you made it this far? Maybe you'll like my lovingly curated playlists or vegan recipe meta-analyses