---
jupyter: python3
format: html
execute:
  echo: false
---
# Brian S Caffo
<hr>
| Professor |
|:--- |
| [Department of Biostatistics](https://publichealth.jhu.edu/departments/biostatistics), [Johns Hopkins University](https://www.jhu.edu) (primary) |
| [Department of Biomedical Engineering](https://www.bme.jhu.edu/), [Johns Hopkins University](https://www.jhu.edu) (courtesy) |
| [www.bcaffo.com](https://www.bcaffo.com), [CV repo](https://github.com/bcaffo/cv), [CV hosted version](https://bcaffo.github.io/cv/cvJupyter.html) |
# Part I
## Summary
Brian Caffo, PhD, is a professor in the Department of Biostatistics
with a secondary appointment in the Department of Biomedical
Engineering at Johns Hopkins University. He received his PhD from the
University of Florida Department of Statistics in 2001. He has worked
in statistical computing, statistical modeling, computational
statistics, multivariate and decomposition methods, and statistics in
neuroimaging and neuroscience. He led teams that won the ADHD 200
prediction competition. He co-directs the SMART statistical
group. With other faculty at JHU, he created and co-directs the
Coursera Data Science Specialization, a 10-course specialization in
statistical data analysis. He co-directs the JHU Data Science Lab, a
group dedicated to open educational innovation and data science. He is
the former director of the Biostatistics graduate programs and
admissions committees. He is currently co-director of the Johns
Hopkins High Performance Computing Exchange supercomputing service
center and past president of the Bloomberg School of Public Health
faculty senate.
## Education and training
| Year | Description | Institution | Title |
|:--- |:--- | :--- | :--- |
| 2006 | K25 training grant | NIH | *A mentored training program in imaging science* |
| 2001 | PhD in statistics | U of Florida | *Candidate sampling schemes and some important applications* |
| 1998 | MS in statistics| U of Florida | |
| 1995 | Dual BS in mathematics and statistics | U of Florida | |
```{python}
import pandas as pd
import plotly.express as px
import numpy as np
import wordcloud as wc
import stylecloud as sc
import matplotlib.pyplot as plt
import os
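from IPython.display import Image
import plotly.graph_objects as go
import plotly.io as pio
import itertools
## Alias PIL's Image so it does not shadow IPython.display.Image,
## which the commented-out static-image lines below rely on
from PIL import Image as PILImage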
## pio.renderers.default = "plotly_mimetype+notebook"
## pio.renderers.default = "plotly_mimetype+notebook_connected"
## This allows for pdf rendering
## pio.renderers.default = "plotly_mimetype+notebook+pdf"
## pio.kaleido.scope.mathjax = None
#static = True
static = False
#the default height and width
height = 400
width = 600
## Note to render to pdf do
## quarto render cvJupyter.qmd --to pdf
pd.set_option("display.max_rows", 999)
#dat = pd.read_csv("publications_01042022_2.csv")
#dat = pd.read_csv("publications_12072022.csv")
dat = pd.read_csv("publications_01152024.csv")
## Not sure why these column names change between Scopus exports.
## These are the ones needed below; re-check them every year.
dat = dat.rename(columns = {
'Authors' : 'Authors',
'Year' : 'Publication Year',
'Title' : 'Document Title',
'Source title' : 'Journal Title',
'Cited by' : 'Citations'
})
dat['Citations'] = dat['Citations'].fillna(0)
```
## Professional experience
Relevant professional experience.
```{python}
## Bizarre, but plotly's static export works better if you write a throwaway figure out first.
fig=px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])
fig.write_image("temp.pdf", format="pdf")
```
```{python}
profExp = pd.read_csv("profExp.txt", delimiter="|")
profExp['Start'] = pd.to_datetime("01/01/"+profExp['Start'].astype(str))
profExp['End'] = pd.to_datetime("12/31/"+profExp['End'].astype(str))
profExp = profExp.sort_values(by = ['Start', 'End'])
profExp = profExp.assign(Position=profExp['Title']+" "+profExp['Place'])
fig = px.timeline(profExp, x_start="Start", x_end="End",
y='Position',
color="Organization",
height= 400)
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.update_yaxes(autorange="reversed")
fig.show(warn = False)
```
## Professional activities
| Year | Activity |
| :--- | :--- |
| 2005-2006 |Publications Officer for the Biometrics Section of the American Statistical Association |
| 2010 | Founding member Stat in Imaging ASA Section |
| 2010-2011 | Secretary Stat in Imaging ASA Section |
## Editorial activities
| Year | Activity |
| :--- | :--- |
| 2006-2008 | Associate editor for Computational Statistics and Data Analysis |
| 2008-2010 | Associate editor for the Journal of the American Statistical Association |
| 2009-2012 | Associate editor for the Journal of the Royal Statistical Society Series B |
| 2010-2012 | Associate editor for Biometrics |
| 2011 | Senior program committee member for the Fourteenth International Conference on Artificial Intelligence and Statistics |
| 2016 | Guest associate editor for Frontiers in Neuroscience special issues on Brain Imaging Methods |
| 2021 | Guest associate editor for Frontiers special issue on Explainable Artificial Intelligence in Healthcare and Finance |
I serve on NIH, EU, NSF, and other ad hoc review panels whenever asked and available, which usually works out to about three a year. I review manuscripts for journals I like whenever they are relevant to my research expertise, and I have reviewed a few conference abstracts and chaired sessions at conferences I like.
## Honors and awards
| Year | Award |
| :--- | :--- |
| 1998 | William S. Mendenhall Award |
| 1999 | Anderson Scholar/Faculty nominee for the University of Florida CLAS |
| 2001 | University of Florida CLAS Dissertation Fellowship |
| 2001 | University of Florida Statistics Faculty Award |
| 2002 | Johns Hopkins Faculty Innovation Award |
| 2006 | Johns Hopkins Bloomberg School of Public Health AMTRA award |
| 2008 | Johns Hopkins Bloomberg School of Public Health Golden Apple teaching award |
| 2011 | Leader and organizer of the declared winning entry of the 2011 ADHD200 prediction competition |
| 2011 | Presidential Early Career Award for Scientists and Engineers (PECASE, 2010, awarded in 2011); *The highest honor bestowed by the United States government on science and engineering professionals in the early stages of their independent research careers* |
| 2014 | Named a Fellow of the American Statistical Association |
| 2015 | Special Invited Lecturer, European Meeting of Statisticians |
| 2022 | Adrienne Cupples award; *This annual award recognizes a biostatistician whose academic achievements reflect the contributions to teaching, research, and service exemplified by [Professor L. Adrienne Cupples](https://www.bu.edu/sph/news/articles/2022/l-adrienne-cupples-in-memoriam/)*
## Publications
Publications as reported in Scopus as of 1/15/2024. Scopus lists 257 publications in total. Below is a plot of total publications by year, where each small rectangle is one publication.
```{python}
## Work with a copy of the dataset; assign() returns a new dataframe.
## Every row gets Count = 1 so each publication stacks as one block.
temp = dat.assign(Count = 1)
fig = px.bar(temp, x = 'Publication Year',
y = 'Count',
color = 'Document Title',
hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.update_layout(showlegend=False)
fig
```
Here are the journals I publish in most often (those with more than five of my papers).
```{python}
## Count papers per journal and keep journals with more than five papers.
## (Avoids value_counts().reset_index(), whose column names changed in pandas 2.0.)
journalCounts = dat['Journal Title'].value_counts()
frequentJournals = journalCounts[journalCounts > 5].index
## One row per paper in those journals, each with Count = 1 so the bars stack by paper
temp = dat[dat['Journal Title'].isin(frequentJournals)].assign(Count = 1)
fig = px.bar(temp, x = 'Journal Title',
y = 'Count',
color = 'Document Title',
hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.update_layout(showlegend=False)
fig.show()
```
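The journal plot boils down to counting papers per journal and keeping the journals above a cutoff. Here is a minimal sketch on made-up rows, assuming a `Journal Title` column like the Scopus export and the same more-than-five cutoff; the helper name `frequent_journal_rows` is hypothetical and not part of this repo.

```python
import pandas as pd

def frequent_journal_rows(df: pd.DataFrame, min_count: int = 5) -> pd.DataFrame:
    """Return one row per paper whose journal has more than `min_count` papers."""
    counts = df['Journal Title'].value_counts()
    keep = counts[counts > min_count].index
    # Count = 1 on every row so a stacked bar shows one block per paper
    return df[df['Journal Title'].isin(keep)].assign(Count=1)

# Made-up example: six papers in journal "A", two in journal "B"
example = pd.DataFrame({'Journal Title': ['A'] * 6 + ['B'] * 2})
result = frequent_journal_rows(example)
print(result['Journal Title'].unique())
```

Filtering with `isin` on the counted index sidesteps the column names produced by `value_counts().reset_index()`, which differ between pandas versions.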
I have published with 793 coauthors (according to Scopus). Here are the coauthors with whom I have more than seven manuscripts.
```{python}
## Flatten the semicolon-separated author lists into one list of names,
## stripping the space after each ';' so the same name always matches
text = [name.strip() for s in dat['Authors'] for name in s.split(';') if name.strip()]
nameCounts = pd.Series(text).value_counts()
## Keep coauthors with more than seven shared manuscripts, excluding myself
authors = [a for a in nameCounts[nameCounts > 7].index if 'Caffo' not in a]
## Work on a copy so the original dataframe is untouched
authorDF = dat.copy()
## Create a boolean column for every frequent coauthor
## (regex=False because names contain '.', which is a regex wildcard)
for author in authors:
    authorDF[author] = authorDF['Authors'].str.contains(author, regex = False, na = False)
## Get rid of rows where none of these coauthors is included
authorDF = authorDF[authorDF[authors].any(axis=1)]
## Melt to long format: one row per (manuscript, coauthor) pair
authorDF = authorDF.melt(id_vars = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'],
    value_vars = authors,
    var_name = 'Author',
    value_name = 'Included')
## Keep only pairs where the coauthor is actually on the manuscript
authorDF = authorDF[authorDF['Included']].copy()
authorDF['Count'] = 1
authorDF['Last name'] = [name.split()[0] for name in authorDF['Author']]
authorDF.sort_values(by = ['Last name', 'Publication Year'], inplace = True)
fig = px.bar(authorDF, x = 'Last name',
y = 'Count',
color = 'Document Title',
hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.update_layout(showlegend=False)
fig.show()
```
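Coauthor counts hinge on splitting each `Authors` string on semicolons and trimming whitespace: without the trim, a name listed first (no leading space) and the same name later in a list (leading space) count as two different people. A minimal sketch on made-up author strings, assuming the `Last I.; Last I.` format; `coauthor_counts` is a hypothetical helper name.

```python
import itertools
import pandas as pd

def coauthor_counts(author_strings, exclude: str = 'Caffo') -> pd.Series:
    """Count manuscripts per coauthor, skipping names that contain `exclude`."""
    # Split each list on ';' and strip the space that follows each separator
    names = [name.strip()
             for name in itertools.chain.from_iterable(s.split(';') for s in author_strings)
             if name.strip()]
    counts = pd.Series(names).value_counts()
    # Drop my own name (plain substring match, not a regex)
    return counts[~counts.index.str.contains(exclude, regex=False)]

# Made-up author strings
papers = ['Caffo B.; Smith J.', 'Smith J.; Caffo B.; Lee K.', 'Lee K.; Smith J.']
counts = coauthor_counts(papers)
print(counts)
```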
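Here's a plot of my position in the author list against the number of authors on each manuscript; point size and color give the number of manuscripts at each combination.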
```{python}
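## For each manuscript, get the list of author last names
## (the first token of each semicolon-separated entry)
text = [[name.split()[0] for name in s.split(';') if name.strip()] for s in dat['Authors']]
## Keep manuscripts where 'Caffo' appears; list.index() would otherwise raise ValueError
text = [x for x in text if 'Caffo' in x]
authorno = [len(x) for x in text]
## 1-based position of my name in each author list
position = [x.index('Caffo') + 1 for x in text]
positionDF = pd.DataFrame({'# Authors' : authorno, 'Position' : position})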
fig = px.scatter(positionDF.groupby(['# Authors', 'Position']).size().reset_index(name = 'Count'),
x = '# Authors',
y = 'Position',
color = 'Count',
size = 'Count')
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```
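Author position here is just the 1-based index of a last name within each manuscript's author list, tallied by list length. A minimal sketch on made-up strings, assuming the last name is the first token of each semicolon-separated entry; manuscripts without the name are skipped instead of raising an error. `author_positions` is a hypothetical helper name.

```python
import pandas as pd

def author_positions(author_strings, last_name: str = 'Caffo') -> pd.DataFrame:
    """Return (# Authors, Position) for each list that contains `last_name`."""
    rows = []
    for s in author_strings:
        # First token of each entry is taken as the last name
        last_names = [entry.split()[0] for entry in s.split(';') if entry.strip()]
        if last_name in last_names:
            rows.append({'# Authors': len(last_names),
                         'Position': last_names.index(last_name) + 1})
    return pd.DataFrame(rows)

# Made-up author strings; the third has no 'Caffo' and is skipped
papers = ['Caffo B.; Smith J.', 'Smith J.; Lee K.; Caffo B.', 'Lee K.; Smith J.']
positions = author_positions(papers)
# Number of manuscripts sharing each (# Authors, Position) pair
tally = positions.groupby(['# Authors', 'Position']).size().reset_index(name='Count')
print(tally)
```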
Here are the citation counts of my manuscripts by year of publication; each bar is stacked by manuscript, so its height is that year's total citations.
```{python}
## One bar segment per manuscript, stacked by publication year
fig = px.bar(dat,
x = 'Publication Year',
y = 'Citations',
color = 'Document Title',
hover_data = ['Publication Year', 'Citations', 'Document Title', 'Journal Title'])
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.update_layout(showlegend=False)
fig.show()
```
# Part II
## Teaching
### Advisees
Rounded to the nearest year, starting from the matriculation year. Includes advisees and co-advisees.
```{python}
advisees = pd.read_csv("Advisees.txt", sep = "|")
advisees['Start'] = pd.to_datetime("01/01/"+advisees['Start'].astype(str))
advisees['End'] = pd.to_datetime("12/31/"+advisees['End'].astype(str))
advisees = advisees.sort_values(by = ['Start', 'End'])
fig = px.timeline(advisees,
x_start="Start",
x_end="End",
y='Advisee',
color="Degree",
height=700,
hover_data = ['Title', 'Notes'])
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```
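The timelines round each entry to whole years by stretching a start year to January 1 and an end year to December 31. A minimal sketch of that conversion, assuming integer `Start` and `End` year columns like the text files read above; the example rows and the helper name `years_to_spans` are made up.

```python
import pandas as pd

def years_to_spans(df: pd.DataFrame) -> pd.DataFrame:
    """Turn integer Start/End years into full-year datetime intervals, sorted by start."""
    out = df.copy()
    # An explicit ISO format avoids any month/day ambiguity when parsing
    out['Start'] = pd.to_datetime(out['Start'].astype(str) + '-01-01', format='%Y-%m-%d')
    out['End'] = pd.to_datetime(out['End'].astype(str) + '-12-31', format='%Y-%m-%d')
    return out.sort_values(by=['Start', 'End'])

# Made-up rows
spans = years_to_spans(pd.DataFrame({'Advisee': ['A', 'B'],
                                     'Start': [2010, 2008],
                                     'End': [2014, 2012]}))
print(spans)
```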
### Student exam participation
Excludes exams where I served as an alternate.
```{python}
exams = pd.read_csv("exams.csv")
exams['Exam'] = exams['Exam'].str.strip()
exams['Department'] = exams['Department'].str.strip()
## Count exams per year, department, and exam type
## (reset_index(name=...) gives a 'Count' column in both pandas 1.x and 2.x)
exams = exams[ ['Year', 'Department', 'Exam'] ].value_counts().reset_index(name = 'Count')
fig = px.bar(exams[ exams['Exam'] == "prelim"], x = "Year", y = "Count", color = "Department", title = "Prelim Exams / GBOs")
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
## Static image
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()
```
```{python}
fig = px.bar(exams[ exams['Exam'] == "final"], x = "Year", y = "Count", color = "Department", title = "Final Exams")
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
## Static
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()
```
```{python}
fig = px.bar(exams[ exams['Exam'] == "masters"],
x = "Year", y = "Count", color = "Department", title = "Masters reader")
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
## Static
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()
```
### Classroom instruction
Rounded to the nearest year. The Data Science and Executive Data Science (EDS) specializations were joint with Roger Peng and Jeff Leek. The Data Science Hackathon was joint with Leah Jager, Jeff Leek, and Roger Peng. Guest lectures are not included.
```{python}
classes = pd.read_csv("classes.txt", delimiter="|").drop(['Unnamed: 0', ' '], axis = 1)
classes['Start'] = pd.to_datetime("01/01/"+classes['Start'].astype(str))
classes['End'] = pd.to_datetime("12/31/"+classes['End'].astype(str))
classes = classes.sort_values(by = ['Start', 'End'])
fig = px.timeline(classes, x_start="Start", x_end="End", y="Course title", color="Notes",
hover_data = ['Course title', 'Place', 'Notes'],
height=1000)
fig.update_yaxes(autorange="reversed")
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```
#### E-books
All e-books are free and open access except *Methods in Biostatistics with R*. For every book, students receive all subsequent version updates.
+ *Statistical Inference*, Leanpub
+ *Regression Models*, Leanpub
+ *Developing Data Products*, Leanpub
+ *Advanced Linear Models for Data Science*, Leanpub
+ *Methods in Biostatistics with R*, Leanpub, with John Muschelli, Ciprian Crainiceanu
+ *Executive Data Science*, Leanpub, with Roger Peng, Jeff Leek
#### Other
+ PI (in the role of executive producer, not instructor) for the BD2K R25 Genomic Data Science Specialization, fMRI 1 and 2 (Lindquist / Wager), Neurohacking in R (Crainiceanu, Sweeney, Muschelli), Neuroscience for Neuroimaging (Baker)
+ swirl: mentored project by Nick Carchedi, initiated during his internship
+ Course notes for Biostatistics 140.651-2 listed on the Johns Hopkins Open Courseware project
+ YouTube channel (all educational content): 14k subscribers, over 400 videos, 6.4k views and ~300 hours of total watch time in the past 28 days
### Research grants
```{python}
pigrants = pd.read_csv("grants.txt", delimiter="|")
pigrants = pigrants.drop([' '], axis = 1)
pigrants = pigrants.assign(Number = np.arange(pigrants.shape[0]))
pigrants['Start'] = pd.to_datetime(pigrants['Start'])
pigrants['Finish'] = pd.to_datetime(pigrants['Finish'])
fig = px.timeline(pigrants, x_start="Start", x_end="Finish", y="Number", color="Organization",
hover_data = ['Role', 'Start', 'Finish', 'Organization', 'Mechanism', 'Title'])
fig.update_yaxes(autorange="reversed")
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```
### Co-investigator and subcontract awards
Compiling this list is surprisingly hard, so it is likely incomplete. The titles and mechanisms are my best reconstruction.
```{python}
grants = pd.read_csv("grantsFull.csv")
grants = grants.assign(Number = np.arange(grants.shape[0]))
grants['Start'] = pd.to_datetime(grants['Start'])
grants['End'] = pd.to_datetime(grants['End'])
## Fill in missing mechanism, PI, and cost information
grants['Mechanism'] = grants['Mechanism'].fillna('Other')
grants['PI'] = grants['PI'].fillna('Other')
grants['YearlyDC'] = grants['YearlyDC'].fillna("No info")
## Truncate titles to 70 characters so the y-axis labels fit
grants['Title small'] = grants['Title'].str.slice(0, 70)
fig = px.timeline(grants, x_start="Start", x_end="End", y="Title small", color = "Mechanism",
height=1000)
fig.update_yaxes(autorange="reversed")
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```
Here are the grant PIs I have worked with most frequently.
```{python}
grants['Count'] = 1
fig = px.bar(grants,
x = 'PI',
y = 'Count',
color = 'Title',
hover_data = ['Start', 'End', 'Title', 'PI'])
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.update_layout(showlegend=False)
fig.show()
```
Here's a breakdown of grant mechanisms.
```{python}
## Number of grants per mechanism (index = mechanism name, values = counts)
mechanism = grants['Mechanism'].value_counts()
labels = mechanism.index
values = mechanism.values
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.6)])
fig.update_traces(textinfo='value')
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```
Here are grants plotted by start date against the log base 10 of their yearly direct costs. Note that some grants show only the subcontract value, whereas others show the parent grant's value.
```{python}
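## Strip '$' and ',' from the yearly direct costs and take log10;
## "No info" (or anything else unparseable) becomes NaN and is dropped by the > 0 filter below
ydc = pd.to_numeric(grants['YearlyDC'].astype(str).str.replace(r'[$,]', '', regex = True), errors = 'coerce')
grants['Log10 YDC'] = np.log10(ydc)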
fig = px.scatter(grants[(grants['Log10 YDC'] > 0)],
y = 'Log10 YDC',
x = 'Start',
color = 'Mechanism',
size = 'Log10 YDC')
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```
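Yearly direct costs are stored as dollar strings, so putting them on a log scale means stripping the `$` and commas first; for example, $250,000 becomes 250000, whose log base 10 is about 5.4. A minimal sketch with made-up values, assuming missing entries are marked `No info` as in the grants chunk; unparseable entries become `NaN` rather than a sentinel. `log10_costs` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def log10_costs(values: pd.Series) -> pd.Series:
    """Parse '$1,234'-style strings and return their base-10 logarithm."""
    cleaned = values.astype(str).str.replace(r'[$,]', '', regex=True)
    dollars = pd.to_numeric(cleaned, errors='coerce')  # 'No info' -> NaN
    return np.log10(dollars)

# Made-up values
costs = log10_costs(pd.Series(['$1,000', '$250,000', 'No info']))
print(costs)
```

Using `NaN` keeps missing costs out of comparisons such as `> 0` automatically, so no special sentinel value is needed.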
### Academic service
Here are my major service roles, rounded to the nearest year and colored by the major organizational group each represents. I also serve on ad hoc tenure and promotion committees whenever asked (not that often, roughly once every other year).
```{python}
service = pd.read_csv("service.txt", delimiter="|").drop(['Unnamed: 0', 'Unnamed: 5'], axis = 1)
service['Start'] = pd.to_datetime("01/01/"+service['Start'].astype(str))
service['End'] = pd.to_datetime("12/31/"+service['End'].astype(str))
service = service.sort_values(by = ['Start', 'End'])
fig = px.timeline(service, x_start="Start", x_end="End", y="Role", color="Group",
hover_data = ['Role', 'Group'],
height = 600)
fig.update_yaxes(autorange="reversed")
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```
### Seminars
Here's a plot of the invited seminars I've logged. The list with presentation files can be found
[here](https://docs.google.com/spreadsheets/d/1mRC6xxZmNj3DnwwvCh_8GpErwhvJNq9gkRB3mQz1JIg/edit?usp=sharing).
```{python}
seminars = pd.read_csv("https://docs.google.com/spreadsheets/d/1mRC6xxZmNj3DnwwvCh_8GpErwhvJNq9gkRB3mQz1JIg/export?format=csv&gid=0")
seminars = seminars.assign(Count = 1)
#seminarYear = seminars['Year'].value_counts().reset_index()
#seminarYear = seminarYear.rename(columns = {"index" : "Year", "Year" : "Count"}).sort_values("Year", ascending =False)
#fig = px.bar(seminarYear, x = "Year", y = "Count")
fig = px.bar(seminars, x = "Year", y = "Count", color = "Talk",
hover_data = ['Year', 'Talk', 'Where'])
fig = fig.update_layout({
'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.update_layout(showlegend=False)
## Static
#Image(fig.to_image(format="png", width=400, height=200, scale=2))
## Interactive
fig.show()
```