cvJupyter.qmd

---
jupyter: python3
format: html
execute:
  echo: false
---

# Brian S Caffo
<hr>
| Professor |
|:---  |
| [Departments of Biostatistics](https://publichealth.jhu.edu/departments/biostatistics) [Johns Hopkins University](www.jhu.edu) (primary),|
| [Department of Biomedical Engineering](https://www.bme.jhu.edu/), [Johns Hopkins University](www.jhu.edu) (courtesy) |
| [www.bcaffo.com](www.bcaffo.com), [CV repo](https://github.com/bcaffo/cv), [CV hosted version](https://bcaffo.github.io/cv/cvJupyter.html) |

# Part I 
## Summary
Brian Caffo, PhD is a professor in the Department of Biostatistics
with a secondary appointment in the Department of Biomedical
Engineering at Johns Hopkins University.  He graduated from the
University of Florida Department of Statistics in 2001. He has worked
in statistical computing, statistical modeling, computational
statistics, multivariate and decomposition methods and statistics in
neuroimaging and neuroscience. He led teams that won the ADHD 200
prediction competition. He co-directs the SMART statistical
group. With other faculty at JHU, he created and co-directs the
Coursera Data Science Specialization, a 10 course specialization on
statistical data analysis. He co-directs the JHU Data Science Lab, a
group dedicated to open educational innovation and data science. He is
the former director of the Biostatistics graduate programs and
admissions committees. He is currently the co-director of the Johns
Hopkins High Performance Computing Exchange super computing service
center and past-president of the Bloomberg School of Public Health
faculty senate.

## Education and training

| Year | Description | Institution | | 
|:---  |:---         | :--- | :--- | 
| 2006 | K25 training grant | NIH | *A mentored training program in imaging science* | 
| 2001 | PhD in statistics | U of Florida | *Candidate sampling schemes and some important applications* | 
| 1998 | MS in statistics|  U of Florida | |
| 1995 | Dual BS in mathematics and statistics | U of Florida | |

```{python}
import pandas as pd
import plotly.express as px
import numpy as np
import wordcloud as wc
import stylecloud as sc
import matplotlib.pyplot as plt
import os
from IPython.display import Image
import plotly.graph_objects as go
import plotly.io as pio
import itertools
from PIL import Image

## pio.renderers.default = "plotly_mimetype+notebook"
## pio.renderers.default = "plotly_mimetype+notebook_connected" 
## This allows for pdf rendering
## pio.renderers.default = "plotly_mimetype+notebook+pdf"
## pio.kaleido.scope.mathjax = None

#static = True
static = False

#the default height and width
height = 400
width = 600

## Note to render to pdf do
## quarto render cvJupyter.qmd --to pdf
pd.set_option("display.max_rows", 999)
#dat = pd.read_csv("publications_01042022_2.csv")
#dat = pd.read_csv("publications_12072022.csv")
dat = pd.read_csv("publications_01152024.csv")

## Not sure why these columns changes. 
## Here's the ones you need, reset these every year
dat = dat.rename(columns = {
    'Authors' : 'Authors',
    'Year' : 'Publication Year',
    'Title'        : 'Document Title',
    'Source title' : 'Journal Title',
    'Cited by'     : 'Citations'
})
dat['Citations'] = dat['Citations'].fillna(0)

```

## Professional experience
Relevant professional experience.

```{python}
#bizarre, plotly works better if you write some random figure out first.
fig=px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])
fig.write_image("temp.pdf", format="pdf")
```

```{python}
profExp = pd.read_csv("profExp.txt", delimiter="|")
profExp['Start'] = pd.to_datetime("01/01/"+profExp['Start'].astype(str))
profExp['End'] = pd.to_datetime("12/31/"+profExp['End'].astype(str))
profExp = profExp.sort_values(by = ['Start', 'End'])
profExp = profExp.assign(Position=profExp['Title']+" "+profExp['Place'])  
    
    
fig = px.timeline(profExp, x_start="Start", x_end="End", 
                  y='Position', 
                  color="Organization",
                  height= 400)                  
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_yaxes(autorange="reversed")

fig.show(warn = False)
```

## Profesional activities

| Year | Activity | 
| :--- | :---     |
| 2005-2006 |Publications Officer for the Biometrics Section of the American Statistical Association |
| 2010 | Founding member Stat in Imaging ASA Section |
| 2010-2011 | Secretary Stat in Imaging ASA Section |


## Editorial activities

| Year | Activity | 
| :--- | :---     |
| 2006-2008 | Associate editor Computational Statistics and Data Analysis |
| 2008-2010 | Associate editor for the Journal of the American Statistical Association |
| 2009-2012 | Associate editor for the Journal of the Royal Statistical Society Series B |
| 2010-2012 | Associate editor for Biometrics
| 2011-2011 | Senior program committee member for the Fourteenth International Conference on Artificial Intelligence and Statistics| 
| 2016-2016 | Guest associate editor for Frontiers in Neuroscience special issues on Brain Imaging Methods
| 2021-2021 | Guest associate editor for Frontiers special issue in Explainable Artificial Intelligence in Healthcare and Finance |

I do NIH, EU, NSF ... ad hoc review panels whenever they ask me and I'm  able to. This usually translates to say 3 or so a year. I review manuscripts for journals I like whenever they're relevant to my research expertise and I have done a few conference abstract reviews and chaired sessions for conferences I like.

## Honors and awards

| Year | Award | 
| :--- | :--- |
| 1998 | William S. Mendenhall Award |
| 1999 | Anderson Scholar/Faculty nominee for the University of  Florida CLAS |
| 2001 | University of Florida CLAS Dissertation Fellowship |
| 2001 | University of Florida Statistics Faculty Award |
| 2002 | Johns Hopkins Faculty Innovation Award |
| 2006 | Johns Hopkins Bloomberg School of Public Health AMTRA award |
| 2008 | Johns Hopkins Bloomberg School of Public Health Golden Apple teaching award |
| 2011 | Leader and organizer of the declared winning entry of the 2011 ADHD200 prediction competition |
| 2011 | Presidential Early Career Award for Scientists and Engineers (PECASE, 2010, awarded in 2011); *The highest honor bestowed by the United States government on science and engineering professionals in the early stages of their independent research careers* |
| 2014 | Named a Fellow of the American Statistical Association |
| 2015 | Special Invited Lecturer, European Meeting of Statisticians |
| 2022 | Adrienne Cupples award; *This annual award recognizes a biostatistician whose academic achievements reflect the contributions to teaching, research, and service exemplified by [Professor L. Adrienne Cupples](https://www.bu.edu/sph/news/articles/2022/l-adrienne-cupples-in-memoriam/)*  

## Publications

Publications reported in Scopus as of 1/15/2024. My total number of Scopus publications is 257. Below is a plot of total publications by year where each small rectangle is a publication. 

```{python}
## Create a temporary copy of the dataset and work with that
temp = dat
temp = temp.assign(Count = 1)


fig = px.bar(temp, x = 'Publication Year', 
                 y = 'Count',
                 color = 'Document Title',
                 hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])


fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig
```


Here are journals I publish in the most. 

```{python}
temp = dat['Journal Title'].value_counts().reset_index()
temp = temp.rename(columns = {"index" : "Title", "Journal Title" : "Count"}).sort_values("Count", ascending =False)

temp['inplot'] = temp['Count'] > 5

temp = temp.merge(dat, left_on = 'Title', 
    right_on = 'Journal Title', how = 'left')
temp = temp[temp['Count'] > 5]
temp = temp.assign(Count = 1)


fig = px.bar(temp, x = 'Journal Title', 
                 y = 'Count',
                 color = 'Document Title',
                 hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig.show()
```

I have published with 793 coauthors (according to Scopus). Here is authors that I have had 7 or over manuscripts with.

```{python}
## Get just the author last names. Have to strip out the initials
text = [s.split(';') for s in dat['Authors']]
text = list(itertools.chain(*text))
authors = pd.DataFrame({'Author' : text}).value_counts().reset_index()
authors = authors.rename(columns = {0 : 'Count'})
authors = authors[authors['Count'] > 7]
authors = authors[~authors['Author'].str.contains('Caffo')]
authors = authors['Author']

## Create a dataframe with just the info we need
authorDF = dat.copy()

## Create a column for every author included
for author in  authors:
    authorDF[author] = authorDF['Authors'].str.contains(author)

## Get rid of rows where no author is included
authorDF = authorDF[authorDF[authors].any(axis=1)]

## Melt the dataframe so that each row is a manuscript and each column is an author
authorDF = authorDF.melt(id_vars = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'], 
                         value_vars = authors, 
                         var_name = 'Author', 
                         value_name = 'Included')


authorDF['Count'] = 1

authorDF = authorDF[authorDF['Included'] == True]

authorDF['Last name'] = [name.split()[0] for name in authorDF['Author']]

authorDF.sort_values(by = ['Last name', 'Publication Year'], inplace = True)

fig = px.bar(authorDF, x = 'Last name', 
                 y = 'Count',
                 color = 'Document Title',
                 hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig.show()


```


Here's a plot of number of authors for each manuscript by my position. 

```{python}
## Get a list of lists of last names
text = [s.split(';') for s in dat['Authors']]
lname= lambda namelist: [name.split()[0] for name in namelist]
text = [lname(x) for x in text]

authorno = [len(x) for x in text]
position = [x.index('Caffo') + 1 for x in text]

positionDF = pd.DataFrame({'# Authors' : authorno, 'Position' : position})


fig = px.scatter(positionDF.groupby(['# Authors', 'Position']).size().reset_index(name = 'Count'),
                 x = '# Authors',
                 y = 'Position',
                 color = 'Count',
                 size = 'Count')
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()
```


Here's the total citation counts of manuscripts plotted by year of publication.

```{python}
temp = dat
temp = temp.rename(columns = {'total' : 'Citations'})
fig = px.bar(temp, 
             x = 'Publication Year', 
             y = 'Citations', 
             color = 'Document Title',
             hover_data = ['Publication Year', 'Citations', 'Document Title', 'Journal Title'])

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig.show()
```

# Part II 
## Teaching
### Advisees
To the nearest year from matriculation year. Includes advisees and co-advisees.

```{python}
advisees = pd.read_csv("Advisees.txt", sep = "|")

advisees['Start'] = pd.to_datetime("01/01/"+advisees['Start'].astype(str))
advisees['End'] = pd.to_datetime("12/31/"+advisees['End'].astype(str))
advisees = advisees.sort_values(by = ['Start', 'End'])
    
    
fig = px.timeline(advisees, 
                  x_start="Start", 
                  x_end="End", 
                  y='Advisee', 
                  color="Degree",
                  height=700,
                  hover_data = ['Title', 'Notes'])

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()
```

### Student exam participation
Excludes alternate.

```{python}
exams = pd.read_csv("exams.csv")
exams['Exam'] = exams['Exam'].str.strip()
exams['Department'] = exams['Department'].str.strip()
exams = exams[ ['Year', 'Department', 'Exam'] ].value_counts().reset_index()
exams = exams.rename(columns = {0 : 'Count'})

fig = px.bar(exams[ exams['Exam'] == "prelim"], x = "Year", y = "Count", color = "Department", title = "Prelim Exams / GBOs")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})


## Static image
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()
```

```{python}
fig = px.bar(exams[ exams['Exam'] == "final"], x = "Year", y = "Count", color = "Department", title = "Final Exams")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

## Static
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()
```

```{python}
fig = px.bar(exams[ exams['Exam'] == "masters"], 
             x = "Year", y = "Count", color = "Department", title = "Masters reader")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

## Static
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()
```

### Classroom Instruction
To the nearest year. Data Science and EDS specializations were with Roger Peng and Jeff Leek. Data Science Hackathon was with Leah Jager, Jeff Leek, Roger Peng. Guest lectures not included.

```{python}
classes = pd.read_csv("classes.txt", delimiter="|").drop(['Unnamed: 0', ' '], axis = 1)
classes['Start'] = pd.to_datetime("01/01/"+classes['Start'].astype(str))
classes['End'] = pd.to_datetime("12/31/"+classes['End'].astype(str))
classes = classes.sort_values(by = ['Start', 'End'])
   
    
fig = px.timeline(classes, x_start="Start", x_end="End", y="Course title", color="Notes",
                 hover_data = ['Course title', 'Place', 'Notes'],
                  height=1000)
fig.update_yaxes(autorange="reversed")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()
```


##### E-books
E-books are free and open access, excepting *Methods in Biostatistics with R*. For all books, student get all subsequent version updates.

+ *Statistical Inference*, Leanpub 
+ *Regression Models*, Leanpub
+ *Developing Data Products*, Leanpub 
+ *Advanced Linear Models for Data Science*, Leabpub,
+ *Methods in Biostatistics with R*, Leanpub, with John Muschelli, Ciprian Crainiceanu
+ *Executive Data Science*, Leanpub, with Roger Peng, Jeff Leek

##### Other
+ PI (roll of executive producer, non-instructor) for the BD2K R25 Genomic Data Science Specialization, fMRI 1 and 2 (Lindquist / Wager), Neurohacking in R (Craininceanu, Sweeney, Muschelli), Neuroscience for Neuroimaging (Baker)
+ swirl: Mentored project by Nick Carchedi intiated during his internship
+ Course notes for Biostatistics 140.651-2 listed on the Johns Hopkins Open Courseware project 
+ YouTube channel (all educational content) - 14k subscribers, over 400 videos, 6.4k views in past 28 days, ~300 hours of total watch time in the last 28 days

### Research grants

```{python}
pigrants = pd.read_csv("grants.txt", delimiter="|")
pigrants = pigrants.drop([' '], axis = 1)
pigrants = pigrants.assign(Number = np.arange(pigrants.shape[0]))
pigrants['Start'] = pd.to_datetime(pigrants['Start'])
pigrants['Finish'] = pd.to_datetime(pigrants['Finish'])

fig = px.timeline(pigrants, x_start="Start", x_end="Finish", y="Number", color="Organization",
                 hover_data = ['Role', 'Start', 'Finish', 'Organization', 'Mechanism', 'Title'])
fig.update_yaxes(autorange="reversed")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()
```

### Co-investigator and subcontract awards
This is surprisingly hard and likely incomplete. Here's the best I could do for title and mechanism.

```{python}
grants = pd.read_csv("grantsFull.csv")
grants.head()
grants = grants.assign(Number = np.arange(grants.shape[0]))
grants['Start'] = pd.to_datetime(grants['Start'])
grants['End'] = pd.to_datetime(grants['End'])
grants.loc[grants['Mechanism'].isna(), 'Mechanism'] = 'Other'
grants.loc[grants['PI'].isna(), 'PI'] = 'Other'
grants['Title small'] = [i[0 : 70] for i in grants['Title']]
grants.loc[grants['YearlyDC'].isna(), 'YearlyDC'] = "No info"

fig = px.timeline(grants, x_start="Start", x_end="End", y="Title small", color = "Mechanism",
                  height=1000)
fig.update_yaxes(autorange="reversed")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()
```

Here's my most frequent grant PIs.

```{python}

grants['Count'] = 1

fig = px.bar(grants, 
             x = 'PI', 
             y = 'Count', 
             color = 'Title',
             hover_data = ['Start', 'End', 'Title', 'PI'])

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig.show()
```

Here's a breakdown of grant mechanisms.

```{python}
mechanism = grants['Mechanism'].value_counts().reset_index()


labels = mechanism['index']
values = mechanism['Mechanism']

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.6)])

fig.update_traces(textinfo='value')

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()
```

Here's grants by the log base 10 of the yearly direct costs and start time. Note some grants only show subcontract value where as others show the parent grant.

```{python}
YDC = []
for x in grants['YearlyDC']:
    if x != "No info":
        x = np.log10(float(x.replace("$", "").replace(",", "")))
        YDC.append(x)
    else :
        YDC.append(-1)
grants['Log10 YDC'] = YDC

fig = px.scatter(grants[(grants['Log10 YDC'] > 0)],
                 y = 'Log10 YDC',
                 x = 'Start',
                 color = 'Mechanism',
                 size = 'Log10 YDC')
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()
```

### Academic service 
Here's my major service roles by year rounded to the nearest year by the major organizational group that it represents. Also, I serve on ad hoc tenure and promotion committees whenever asked (not that often, maybe once every other year or so).

```{python}
service = pd.read_csv("service.txt", delimiter="|").drop(['Unnamed: 0', 'Unnamed: 5'], axis = 1)
service['Start'] = pd.to_datetime("01/01/"+service['Start'].astype(str))
service['End'] = pd.to_datetime("12/31/"+service['End'].astype(str))
service = service.sort_values(by = ['Start', 'End'])

    
fig = px.timeline(service, x_start="Start", x_end="End", y="Role", color="Group",
                 hover_data = ['Role', 'Group'],
                 height = 600)
fig.update_yaxes(autorange="reversed")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()
```

### Seminars
Here's a plot of the invited seminars I've logged. The list with presentation files can be found 
[here](https://docs.google.com/spreadsheets/d/1mRC6xxZmNj3DnwwvCh_8GpErwhvJNq9gkRB3mQz1JIg/edit?usp=sharing).

```{python}
seminars = pd.read_csv("https://docs.google.com/spreadsheets/d/1mRC6xxZmNj3DnwwvCh_8GpErwhvJNq9gkRB3mQz1JIg/export?format=csv&gid=0")
seminars = seminars.assign(Count = 1)
#seminarYear = seminars['Year'].value_counts().reset_index()
#seminarYear = seminarYear.rename(columns = {"index" : "Year", "Year" : "Count"}).sort_values("Year", ascending =False)
#fig = px.bar(seminarYear, x = "Year", y = "Count")
fig = px.bar(seminars, x = "Year", y = "Count", color = "Talk",
             hover_data = ['Year', 'Talk', 'Where'])
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

## Static
#Image(fig.to_image(format="png", width=400, height=200, scale=2))
## Interactive
fig.show()
```