forked from hemberg-lab/scRNA.seq.course
comparingcombining-scrnaseq-datasets.html
<!DOCTYPE html>
<html >
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Analysis of single cell RNA-seq data</title>
<meta name="description" content="Analysis of single cell RNA-seq data">
<meta name="generator" content="bookdown 0.6 and GitBook 2.6.7">
<meta property="og:title" content="Analysis of single cell RNA-seq data" />
<meta property="og:type" content="book" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="Analysis of single cell RNA-seq data" />
<meta name="author" content="Vladimir Kiselev (wikiselev), Tallulah Andrews, Jennifer Westoby (Jenni_Westoby), Davis McCarthy (davisjmcc), Maren Büttner (marenbuettner) and Martin Hemberg (m_hemberg)">
<meta name="date" content="2018-02-03">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<link rel="prev" href="biological-analysis.html">
<link rel="next" href="seurat-chapter.html">
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<!-- for Facebook -->
<meta property="og:url" content="http://hemberg-lab.github.io/scRNA.seq.course/" />
<meta property="og:description" content="In this course we will survey the existing problems as well as the computational and statistical frameworks available for the analysis of scRNA-seq. The course is taught through the University of Cambridge Bioinformatics training unit, but the material found on these pages is meant to be used by anyone interested in learning about computational analysis of scRNA-seq data." />
<meta property="og:image" content="http://hemberg-lab.github.io/scRNA.seq.course/figures/RNA-Seq_workflow-5.pdf.jpg" />
<!-- for Twitter -->
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="Analysis of single-cell RNA-seq data" />
<meta name="twitter:description" content="In this course we will survey the existing problems as well as the computational and statistical frameworks available for the analysis of scRNA-seq. The course is taught through the University of Cambridge Bioinformatics training unit, but the material found on these pages is meant to be used by anyone interested in learning about computational analysis of scRNA-seq data." />
<meta name="twitter:image" content="http://hemberg-lab.github.io/scRNA.seq.course/figures/RNA-Seq_workflow-5.pdf.jpg" />
<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-71525309-1', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<link rel="stylesheet" href="style.css" type="text/css" />
</head>
<body>
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
<div class="book-summary">
<nav role="navigation">
<ul class="summary">
<li><a href="index.html">Table of Contents</a></li>
<li class="divider"></li>
<li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> About the course</a><ul>
<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#video"><i class="fa fa-check"></i><b>1.1</b> Video</a></li>
<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#registration"><i class="fa fa-check"></i><b>1.2</b> Registration</a></li>
<li class="chapter" data-level="1.3" data-path="index.html"><a href="index.html#github"><i class="fa fa-check"></i><b>1.3</b> GitHub</a></li>
<li class="chapter" data-level="1.4" data-path="index.html"><a href="index.html#docker-image-rstudio"><i class="fa fa-check"></i><b>1.4</b> Docker image (RStudio)</a></li>
<li class="chapter" data-level="1.5" data-path="index.html"><a href="index.html#manual-installation"><i class="fa fa-check"></i><b>1.5</b> Manual installation</a></li>
<li class="chapter" data-level="1.6" data-path="index.html"><a href="index.html#license"><i class="fa fa-check"></i><b>1.6</b> License</a></li>
<li class="chapter" data-level="1.7" data-path="index.html"><a href="index.html#prerequisites"><i class="fa fa-check"></i><b>1.7</b> Prerequisites</a></li>
<li class="chapter" data-level="1.8" data-path="index.html"><a href="index.html#contact"><i class="fa fa-check"></i><b>1.8</b> Contact</a></li>
</ul></li>
<li class="chapter" data-level="2" data-path="introduction-to-single-cell-rna-seq.html"><a href="introduction-to-single-cell-rna-seq.html"><i class="fa fa-check"></i><b>2</b> Introduction to single-cell RNA-seq</a><ul>
<li class="chapter" data-level="2.1" data-path="introduction-to-single-cell-rna-seq.html"><a href="introduction-to-single-cell-rna-seq.html#bulk-rna-seq"><i class="fa fa-check"></i><b>2.1</b> Bulk RNA-seq</a></li>
<li class="chapter" data-level="2.2" data-path="introduction-to-single-cell-rna-seq.html"><a href="introduction-to-single-cell-rna-seq.html#scrna-seq"><i class="fa fa-check"></i><b>2.2</b> scRNA-seq</a></li>
<li class="chapter" data-level="2.3" data-path="introduction-to-single-cell-rna-seq.html"><a href="introduction-to-single-cell-rna-seq.html#workflow"><i class="fa fa-check"></i><b>2.3</b> Workflow</a></li>
<li class="chapter" data-level="2.4" data-path="introduction-to-single-cell-rna-seq.html"><a href="introduction-to-single-cell-rna-seq.html#computational-analysis"><i class="fa fa-check"></i><b>2.4</b> Computational Analysis</a></li>
<li class="chapter" data-level="2.5" data-path="introduction-to-single-cell-rna-seq.html"><a href="introduction-to-single-cell-rna-seq.html#challenges"><i class="fa fa-check"></i><b>2.5</b> Challenges</a></li>
<li class="chapter" data-level="2.6" data-path="introduction-to-single-cell-rna-seq.html"><a href="introduction-to-single-cell-rna-seq.html#experimental-methods"><i class="fa fa-check"></i><b>2.6</b> Experimental methods</a></li>
<li class="chapter" data-level="2.7" data-path="introduction-to-single-cell-rna-seq.html"><a href="introduction-to-single-cell-rna-seq.html#what-platform-to-use-for-my-experiment"><i class="fa fa-check"></i><b>2.7</b> What platform to use for my experiment?</a></li>
</ul></li>
<li class="chapter" data-level="3" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html"><i class="fa fa-check"></i><b>3</b> Processing Raw scRNA-seq Data</a><ul>
<li class="chapter" data-level="3.1" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#fastqc"><i class="fa fa-check"></i><b>3.1</b> FastQC</a><ul>
<li class="chapter" data-level="3.1.1" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#solution-and-downloading-the-report"><i class="fa fa-check"></i><b>3.1.1</b> Solution and Downloading the Report</a></li>
</ul></li>
<li class="chapter" data-level="3.2" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#trimming-reads"><i class="fa fa-check"></i><b>3.2</b> Trimming Reads</a><ul>
<li class="chapter" data-level="3.2.1" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#solution"><i class="fa fa-check"></i><b>3.2.1</b> Solution</a></li>
</ul></li>
<li class="chapter" data-level="3.3" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#file-formats"><i class="fa fa-check"></i><b>3.3</b> File formats</a><ul>
<li class="chapter" data-level="3.3.1" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#fastq"><i class="fa fa-check"></i><b>3.3.1</b> FastQ</a></li>
<li class="chapter" data-level="3.3.2" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#bam"><i class="fa fa-check"></i><b>3.3.2</b> BAM</a></li>
<li class="chapter" data-level="3.3.3" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#cram"><i class="fa fa-check"></i><b>3.3.3</b> CRAM</a></li>
<li class="chapter" data-level="3.3.4" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#mannually-inspecting-files"><i class="fa fa-check"></i><b>3.3.4</b> Manually Inspecting files</a></li>
<li class="chapter" data-level="3.3.5" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#genome-fasta-gtf"><i class="fa fa-check"></i><b>3.3.5</b> Genome (FASTA, GTF)</a></li>
</ul></li>
<li class="chapter" data-level="3.4" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#demultiplexing"><i class="fa fa-check"></i><b>3.4</b> Demultiplexing</a><ul>
<li class="chapter" data-level="3.4.1" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#identifying-cell-containing-dropletsmicrowells"><i class="fa fa-check"></i><b>3.4.1</b> Identifying cell-containing droplets/microwells</a></li>
</ul></li>
<li class="chapter" data-level="3.5" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#using-star-to-align-reads"><i class="fa fa-check"></i><b>3.5</b> Using STAR to Align Reads</a><ul>
<li class="chapter" data-level="3.5.1" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#solution-for-star-alignment"><i class="fa fa-check"></i><b>3.5.1</b> Solution for STAR Alignment</a></li>
</ul></li>
<li class="chapter" data-level="3.6" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#kallisto-and-pseudo-alignment"><i class="fa fa-check"></i><b>3.6</b> Kallisto and Pseudo-Alignment</a><ul>
<li class="chapter" data-level="3.6.1" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#what-is-a-k-mer"><i class="fa fa-check"></i><b>3.6.1</b> What is a k-mer?</a></li>
<li class="chapter" data-level="3.6.2" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#why-map-k-mers-rather-than-reads"><i class="fa fa-check"></i><b>3.6.2</b> Why map k-mers rather than reads?</a></li>
<li class="chapter" data-level="3.6.3" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#kallistos-pseudo-mode"><i class="fa fa-check"></i><b>3.6.3</b> Kallisto’s pseudo mode</a></li>
<li class="chapter" data-level="3.6.4" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#solution-to-kallisto-pseudo-alignment"><i class="fa fa-check"></i><b>3.6.4</b> Solution to Kallisto Pseudo-Alignment</a></li>
<li class="chapter" data-level="3.6.5" data-path="processing-raw-scrna-seq-data.html"><a href="processing-raw-scrna-seq-data.html#understanding-the-output-of-kallisto-pseudo-alignment"><i class="fa fa-check"></i><b>3.6.5</b> Understanding the Output of Kallisto Pseudo-Alignment</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="4" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html"><i class="fa fa-check"></i><b>4</b> Construction of expression matrix</a><ul>
<li class="chapter" data-level="4.1" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#reads-qc"><i class="fa fa-check"></i><b>4.1</b> Reads QC</a></li>
<li class="chapter" data-level="4.2" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#reads-alignment"><i class="fa fa-check"></i><b>4.2</b> Reads alignment</a></li>
<li class="chapter" data-level="4.3" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#alignment-example"><i class="fa fa-check"></i><b>4.3</b> Alignment example</a></li>
<li class="chapter" data-level="4.4" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#mapping-qc"><i class="fa fa-check"></i><b>4.4</b> Mapping QC</a></li>
<li class="chapter" data-level="4.5" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#reads-quantification"><i class="fa fa-check"></i><b>4.5</b> Reads quantification</a></li>
<li class="chapter" data-level="4.6" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#umichapter"><i class="fa fa-check"></i><b>4.6</b> Unique Molecular Identifiers (UMIs)</a><ul>
<li class="chapter" data-level="4.6.1" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#introduction"><i class="fa fa-check"></i><b>4.6.1</b> Introduction</a></li>
<li class="chapter" data-level="4.6.2" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#mapping-barcodes"><i class="fa fa-check"></i><b>4.6.2</b> Mapping Barcodes</a></li>
<li class="chapter" data-level="4.6.3" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#counting-barcodes"><i class="fa fa-check"></i><b>4.6.3</b> Counting Barcodes</a></li>
<li class="chapter" data-level="4.6.4" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#correcting-for-errors"><i class="fa fa-check"></i><b>4.6.4</b> Correcting for Errors</a></li>
<li class="chapter" data-level="4.6.5" data-path="construction-of-expression-matrix.html"><a href="construction-of-expression-matrix.html#downstream-analysis"><i class="fa fa-check"></i><b>4.6.5</b> Downstream Analysis</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="5" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html"><i class="fa fa-check"></i><b>5</b> Introduction to R/Bioconductor</a><ul>
<li class="chapter" data-level="5.1" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#installing-packages"><i class="fa fa-check"></i><b>5.1</b> Installing packages</a><ul>
<li class="chapter" data-level="5.1.1" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#cran"><i class="fa fa-check"></i><b>5.1.1</b> CRAN</a></li>
<li class="chapter" data-level="5.1.2" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#github-1"><i class="fa fa-check"></i><b>5.1.2</b> Github</a></li>
<li class="chapter" data-level="5.1.3" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#bioconductor"><i class="fa fa-check"></i><b>5.1.3</b> Bioconductor</a></li>
<li class="chapter" data-level="5.1.4" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#source"><i class="fa fa-check"></i><b>5.1.4</b> Source</a></li>
</ul></li>
<li class="chapter" data-level="5.2" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#installation-instructions"><i class="fa fa-check"></i><b>5.2</b> Installation instructions:</a></li>
<li class="chapter" data-level="5.3" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#data-typesclasses"><i class="fa fa-check"></i><b>5.3</b> Data-types/classes</a><ul>
<li class="chapter" data-level="5.3.1" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#numeric"><i class="fa fa-check"></i><b>5.3.1</b> Numeric</a></li>
<li class="chapter" data-level="5.3.2" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#characterstring"><i class="fa fa-check"></i><b>5.3.2</b> Character/String</a></li>
<li class="chapter" data-level="5.3.3" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#logical"><i class="fa fa-check"></i><b>5.3.3</b> Logical</a></li>
<li class="chapter" data-level="5.3.4" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#factors"><i class="fa fa-check"></i><b>5.3.4</b> Factors</a></li>
<li class="chapter" data-level="5.3.5" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#checking-classtype"><i class="fa fa-check"></i><b>5.3.5</b> Checking class/type</a></li>
</ul></li>
<li class="chapter" data-level="5.4" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#basic-data-structures"><i class="fa fa-check"></i><b>5.4</b> Basic data structures</a></li>
<li class="chapter" data-level="5.5" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#more-information"><i class="fa fa-check"></i><b>5.5</b> More information</a></li>
<li class="chapter" data-level="5.6" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#data-types"><i class="fa fa-check"></i><b>5.6</b> Data Types</a><ul>
<li class="chapter" data-level="5.6.1" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#what-is-tidy-data"><i class="fa fa-check"></i><b>5.6.1</b> What is Tidy Data?</a></li>
<li class="chapter" data-level="5.6.2" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#what-is-rich-data"><i class="fa fa-check"></i><b>5.6.2</b> What is Rich Data?</a></li>
<li class="chapter" data-level="5.6.3" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#what-is-bioconductor"><i class="fa fa-check"></i><b>5.6.3</b> What is Bioconductor?</a></li>
<li class="chapter" data-level="5.6.4" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#singlecellexperiment-class"><i class="fa fa-check"></i><b>5.6.4</b> <code>SingleCellExperiment</code> class</a></li>
<li class="chapter" data-level="5.6.5" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#scater-package"><i class="fa fa-check"></i><b>5.6.5</b> <code>scater</code> package</a></li>
</ul></li>
<li class="chapter" data-level="5.7" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#bioconductor-singlecellexperiment-and-scater"><i class="fa fa-check"></i><b>5.7</b> Bioconductor, <code>SingleCellExperiment</code> and <code>scater</code></a><ul>
<li class="chapter" data-level="5.7.1" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#bioconductor-1"><i class="fa fa-check"></i><b>5.7.1</b> Bioconductor</a></li>
<li class="chapter" data-level="5.7.2" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#singlecellexperiment-class-1"><i class="fa fa-check"></i><b>5.7.2</b> <code>SingleCellExperiment</code> class</a></li>
<li class="chapter" data-level="5.7.3" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#scater-package-1"><i class="fa fa-check"></i><b>5.7.3</b> <code>scater</code> package</a></li>
</ul></li>
<li class="chapter" data-level="5.8" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#an-introduction-to-ggplot2"><i class="fa fa-check"></i><b>5.8</b> An Introduction to ggplot2</a><ul>
<li class="chapter" data-level="5.8.1" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#what-is-ggplot2"><i class="fa fa-check"></i><b>5.8.1</b> What is ggplot2?</a></li>
<li class="chapter" data-level="5.8.2" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#principles-of-ggplot2"><i class="fa fa-check"></i><b>5.8.2</b> Principles of ggplot2</a></li>
<li class="chapter" data-level="5.8.3" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#using-the-aes-mapping-function"><i class="fa fa-check"></i><b>5.8.3</b> Using the <code>aes</code> mapping function</a></li>
<li class="chapter" data-level="5.8.4" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#geoms"><i class="fa fa-check"></i><b>5.8.4</b> Geoms</a></li>
<li class="chapter" data-level="5.8.5" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#plotting-data-from-more-than-2-cells"><i class="fa fa-check"></i><b>5.8.5</b> Plotting data from more than 2 cells</a></li>
<li class="chapter" data-level="5.8.6" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#plotting-heatmaps"><i class="fa fa-check"></i><b>5.8.6</b> Plotting heatmaps</a></li>
<li class="chapter" data-level="5.8.7" data-path="introduction-to-rbioconductor.html"><a href="introduction-to-rbioconductor.html#principle-component-analysis"><i class="fa fa-check"></i><b>5.8.7</b> Principal Component Analysis</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="6" data-path="tabula-muris.html"><a href="tabula-muris.html"><i class="fa fa-check"></i><b>6</b> Tabula Muris</a><ul>
<li class="chapter" data-level="6.1" data-path="tabula-muris.html"><a href="tabula-muris.html#introduction-1"><i class="fa fa-check"></i><b>6.1</b> Introduction</a></li>
<li class="chapter" data-level="6.2" data-path="tabula-muris.html"><a href="tabula-muris.html#downloading-the-data"><i class="fa fa-check"></i><b>6.2</b> Downloading the data</a></li>
<li class="chapter" data-level="6.3" data-path="tabula-muris.html"><a href="tabula-muris.html#reading-the-data-smartseq2"><i class="fa fa-check"></i><b>6.3</b> Reading the data (Smartseq2)</a></li>
<li class="chapter" data-level="6.4" data-path="tabula-muris.html"><a href="tabula-muris.html#building-a-scater-object"><i class="fa fa-check"></i><b>6.4</b> Building a scater object</a></li>
<li class="chapter" data-level="6.5" data-path="tabula-muris.html"><a href="tabula-muris.html#reading-the-data-10x"><i class="fa fa-check"></i><b>6.5</b> Reading the data (10X)</a></li>
<li class="chapter" data-level="6.6" data-path="tabula-muris.html"><a href="tabula-muris.html#building-a-scater-object-1"><i class="fa fa-check"></i><b>6.6</b> Building a scater object</a></li>
<li class="chapter" data-level="6.7" data-path="tabula-muris.html"><a href="tabula-muris.html#advanced-exercise"><i class="fa fa-check"></i><b>6.7</b> Advanced Exercise</a></li>
</ul></li>
<li class="chapter" data-level="7" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html"><i class="fa fa-check"></i><b>7</b> Cleaning the Expression Matrix</a><ul>
<li class="chapter" data-level="7.1" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#exprs-qc"><i class="fa fa-check"></i><b>7.1</b> Expression QC (UMI)</a><ul>
<li class="chapter" data-level="7.1.1" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#introduction-2"><i class="fa fa-check"></i><b>7.1.1</b> Introduction</a></li>
<li class="chapter" data-level="7.1.2" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#tung-dataset"><i class="fa fa-check"></i><b>7.1.2</b> Tung dataset</a></li>
<li class="chapter" data-level="7.1.3" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#cell-qc"><i class="fa fa-check"></i><b>7.1.3</b> Cell QC</a></li>
<li class="chapter" data-level="7.1.4" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#cell-filtering"><i class="fa fa-check"></i><b>7.1.4</b> Cell filtering</a></li>
<li class="chapter" data-level="7.1.5" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#compare-filterings"><i class="fa fa-check"></i><b>7.1.5</b> Compare filterings</a></li>
<li class="chapter" data-level="7.1.6" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#gene-analysis"><i class="fa fa-check"></i><b>7.1.6</b> Gene analysis</a></li>
<li class="chapter" data-level="7.1.7" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#save-the-data"><i class="fa fa-check"></i><b>7.1.7</b> Save the data</a></li>
<li class="chapter" data-level="7.1.8" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#big-exercise"><i class="fa fa-check"></i><b>7.1.8</b> Big Exercise</a></li>
<li class="chapter" data-level="7.1.9" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#sessioninfo"><i class="fa fa-check"></i><b>7.1.9</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="7.2" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#expression-qc-reads"><i class="fa fa-check"></i><b>7.2</b> Expression QC (Reads)</a></li>
<li class="chapter" data-level="7.3" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#data-visualization"><i class="fa fa-check"></i><b>7.3</b> Data visualization</a><ul>
<li class="chapter" data-level="7.3.1" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#introduction-3"><i class="fa fa-check"></i><b>7.3.1</b> Introduction</a></li>
<li class="chapter" data-level="7.3.2" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#visual-pca"><i class="fa fa-check"></i><b>7.3.2</b> PCA plot</a></li>
<li class="chapter" data-level="7.3.3" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#visual-tsne"><i class="fa fa-check"></i><b>7.3.3</b> tSNE map</a></li>
<li class="chapter" data-level="7.3.4" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#big-exercise-1"><i class="fa fa-check"></i><b>7.3.4</b> Big Exercise</a></li>
<li class="chapter" data-level="7.3.5" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#sessioninfo-1"><i class="fa fa-check"></i><b>7.3.5</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="7.4" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#data-visualization-reads"><i class="fa fa-check"></i><b>7.4</b> Data visualization (Reads)</a></li>
<li class="chapter" data-level="7.5" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#identifying-confounding-factors"><i class="fa fa-check"></i><b>7.5</b> Identifying confounding factors</a><ul>
<li class="chapter" data-level="7.5.1" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#introduction-4"><i class="fa fa-check"></i><b>7.5.1</b> Introduction</a></li>
<li class="chapter" data-level="7.5.2" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#correlations-with-pcs"><i class="fa fa-check"></i><b>7.5.2</b> Correlations with PCs</a></li>
<li class="chapter" data-level="7.5.3" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#explanatory-variables"><i class="fa fa-check"></i><b>7.5.3</b> Explanatory variables</a></li>
<li class="chapter" data-level="7.5.4" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#other-confounders"><i class="fa fa-check"></i><b>7.5.4</b> Other confounders</a></li>
<li class="chapter" data-level="7.5.5" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#exercise"><i class="fa fa-check"></i><b>7.5.5</b> Exercise</a></li>
<li class="chapter" data-level="7.5.6" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#sessioninfo-2"><i class="fa fa-check"></i><b>7.5.6</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="7.6" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#identifying-confounding-factors-reads"><i class="fa fa-check"></i><b>7.6</b> Identifying confounding factors (Reads)</a></li>
<li class="chapter" data-level="7.7" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#normalization-theory"><i class="fa fa-check"></i><b>7.7</b> Normalization theory</a><ul>
<li class="chapter" data-level="7.7.1" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#introduction-5"><i class="fa fa-check"></i><b>7.7.1</b> Introduction</a></li>
<li class="chapter" data-level="7.7.2" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#library-size-1"><i class="fa fa-check"></i><b>7.7.2</b> Library size</a></li>
<li class="chapter" data-level="7.7.3" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#normalisations"><i class="fa fa-check"></i><b>7.7.3</b> Normalisations</a></li>
<li class="chapter" data-level="7.7.4" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#effectiveness"><i class="fa fa-check"></i><b>7.7.4</b> Effectiveness</a></li>
</ul></li>
<li class="chapter" data-level="7.8" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#normalization-practice-umi"><i class="fa fa-check"></i><b>7.8</b> Normalization practice (UMI)</a><ul>
<li class="chapter" data-level="7.8.1" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#raw"><i class="fa fa-check"></i><b>7.8.1</b> Raw</a></li>
<li class="chapter" data-level="7.8.2" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#cpm-1"><i class="fa fa-check"></i><b>7.8.2</b> CPM</a></li>
<li class="chapter" data-level="7.8.3" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#size-factor-rle"><i class="fa fa-check"></i><b>7.8.3</b> Size-factor (RLE)</a></li>
<li class="chapter" data-level="7.8.4" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#upperquantile"><i class="fa fa-check"></i><b>7.8.4</b> Upperquantile</a></li>
<li class="chapter" data-level="7.8.5" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#tmm-1"><i class="fa fa-check"></i><b>7.8.5</b> TMM</a></li>
<li class="chapter" data-level="7.8.6" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#scran-1"><i class="fa fa-check"></i><b>7.8.6</b> scran</a></li>
<li class="chapter" data-level="7.8.7" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#downsampling-1"><i class="fa fa-check"></i><b>7.8.7</b> Downsampling</a></li>
<li class="chapter" data-level="7.8.8" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#normalisation-for-genetranscript-length"><i class="fa fa-check"></i><b>7.8.8</b> Normalisation for gene/transcript length</a></li>
<li class="chapter" data-level="7.8.9" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#exercise-1"><i class="fa fa-check"></i><b>7.8.9</b> Exercise</a></li>
<li class="chapter" data-level="7.8.10" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#sessioninfo-3"><i class="fa fa-check"></i><b>7.8.10</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="7.9" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#normalization-practice-reads"><i class="fa fa-check"></i><b>7.9</b> Normalization practice (Reads)</a></li>
<li class="chapter" data-level="7.10" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#dealing-with-confounders"><i class="fa fa-check"></i><b>7.10</b> Dealing with confounders</a><ul>
<li class="chapter" data-level="7.10.1" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#introduction-6"><i class="fa fa-check"></i><b>7.10.1</b> Introduction</a></li>
<li class="chapter" data-level="7.10.2" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#remove-unwanted-variation"><i class="fa fa-check"></i><b>7.10.2</b> Remove Unwanted Variation</a></li>
<li class="chapter" data-level="7.10.3" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#combat"><i class="fa fa-check"></i><b>7.10.3</b> Combat</a></li>
<li class="chapter" data-level="7.10.4" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#mnncorrect"><i class="fa fa-check"></i><b>7.10.4</b> mnnCorrect</a></li>
<li class="chapter" data-level="7.10.5" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#glm"><i class="fa fa-check"></i><b>7.10.5</b> GLM</a></li>
<li class="chapter" data-level="7.10.6" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#how-to-evaluate-and-compare-confounder-removal-strategies"><i class="fa fa-check"></i><b>7.10.6</b> How to evaluate and compare confounder removal strategies</a></li>
<li class="chapter" data-level="7.10.7" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#big-exercise-2"><i class="fa fa-check"></i><b>7.10.7</b> Big Exercise</a></li>
<li class="chapter" data-level="7.10.8" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#sessioninfo-4"><i class="fa fa-check"></i><b>7.10.8</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="7.11" data-path="cleaning-the-expression-matrix.html"><a href="cleaning-the-expression-matrix.html#dealing-with-confounders-reads"><i class="fa fa-check"></i><b>7.11</b> Dealing with confounders (Reads)</a></li>
</ul></li>
<li class="chapter" data-level="8" data-path="biological-analysis.html"><a href="biological-analysis.html"><i class="fa fa-check"></i><b>8</b> Biological Analysis</a><ul>
<li class="chapter" data-level="8.1" data-path="biological-analysis.html"><a href="biological-analysis.html#clustering-introduction"><i class="fa fa-check"></i><b>8.1</b> Clustering Introduction</a><ul>
<li class="chapter" data-level="8.1.1" data-path="biological-analysis.html"><a href="biological-analysis.html#introduction-7"><i class="fa fa-check"></i><b>8.1.1</b> Introduction</a></li>
<li class="chapter" data-level="8.1.2" data-path="biological-analysis.html"><a href="biological-analysis.html#dimensionality-reductions"><i class="fa fa-check"></i><b>8.1.2</b> Dimensionality reductions</a></li>
<li class="chapter" data-level="8.1.3" data-path="biological-analysis.html"><a href="biological-analysis.html#clustering-methods"><i class="fa fa-check"></i><b>8.1.3</b> Clustering methods</a></li>
<li class="chapter" data-level="8.1.4" data-path="biological-analysis.html"><a href="biological-analysis.html#challenges-in-clustering"><i class="fa fa-check"></i><b>8.1.4</b> Challenges in clustering</a></li>
<li class="chapter" data-level="8.1.5" data-path="biological-analysis.html"><a href="biological-analysis.html#tools-for-scrna-seq-data"><i class="fa fa-check"></i><b>8.1.5</b> Tools for scRNA-seq data</a></li>
<li class="chapter" data-level="8.1.6" data-path="biological-analysis.html"><a href="biological-analysis.html#comparing-clustering"><i class="fa fa-check"></i><b>8.1.6</b> Comparing clustering</a></li>
</ul></li>
<li class="chapter" data-level="8.2" data-path="biological-analysis.html"><a href="biological-analysis.html#clust-methods"><i class="fa fa-check"></i><b>8.2</b> Clustering example</a><ul>
<li class="chapter" data-level="8.2.1" data-path="biological-analysis.html"><a href="biological-analysis.html#deng-dataset"><i class="fa fa-check"></i><b>8.2.1</b> Deng dataset</a></li>
<li class="chapter" data-level="8.2.2" data-path="biological-analysis.html"><a href="biological-analysis.html#sc3-1"><i class="fa fa-check"></i><b>8.2.2</b> SC3</a></li>
<li class="chapter" data-level="8.2.3" data-path="biological-analysis.html"><a href="biological-analysis.html#pcareduce-1"><i class="fa fa-check"></i><b>8.2.3</b> pcaReduce</a></li>
<li class="chapter" data-level="8.2.4" data-path="biological-analysis.html"><a href="biological-analysis.html#tsne-kmeans"><i class="fa fa-check"></i><b>8.2.4</b> tSNE + kmeans</a></li>
<li class="chapter" data-level="8.2.5" data-path="biological-analysis.html"><a href="biological-analysis.html#snn-cliq-1"><i class="fa fa-check"></i><b>8.2.5</b> SNN-Cliq</a></li>
<li class="chapter" data-level="8.2.6" data-path="biological-analysis.html"><a href="biological-analysis.html#sincera-1"><i class="fa fa-check"></i><b>8.2.6</b> SINCERA</a></li>
<li class="chapter" data-level="8.2.7" data-path="biological-analysis.html"><a href="biological-analysis.html#sessioninfo-5"><i class="fa fa-check"></i><b>8.2.7</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="8.3" data-path="biological-analysis.html"><a href="biological-analysis.html#feature-selection"><i class="fa fa-check"></i><b>8.3</b> Feature Selection</a><ul>
<li class="chapter" data-level="8.3.1" data-path="biological-analysis.html"><a href="biological-analysis.html#identifying-genes-vs-a-null-model"><i class="fa fa-check"></i><b>8.3.1</b> Identifying Genes vs a Null Model</a></li>
<li class="chapter" data-level="8.3.2" data-path="biological-analysis.html"><a href="biological-analysis.html#correlated-expression"><i class="fa fa-check"></i><b>8.3.2</b> Correlated Expression</a></li>
<li class="chapter" data-level="8.3.3" data-path="biological-analysis.html"><a href="biological-analysis.html#comparing-methods"><i class="fa fa-check"></i><b>8.3.3</b> Comparing Methods</a></li>
<li class="chapter" data-level="8.3.4" data-path="biological-analysis.html"><a href="biological-analysis.html#sessioninfo-6"><i class="fa fa-check"></i><b>8.3.4</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="8.4" data-path="biological-analysis.html"><a href="biological-analysis.html#pseudotime-analysis"><i class="fa fa-check"></i><b>8.4</b> Pseudotime analysis</a><ul>
<li class="chapter" data-level="8.4.1" data-path="biological-analysis.html"><a href="biological-analysis.html#tscan"><i class="fa fa-check"></i><b>8.4.1</b> TSCAN</a></li>
<li class="chapter" data-level="8.4.2" data-path="biological-analysis.html"><a href="biological-analysis.html#monocle"><i class="fa fa-check"></i><b>8.4.2</b> monocle</a></li>
<li class="chapter" data-level="8.4.3" data-path="biological-analysis.html"><a href="biological-analysis.html#diffusion-maps"><i class="fa fa-check"></i><b>8.4.3</b> Diffusion maps</a></li>
<li class="chapter" data-level="8.4.4" data-path="biological-analysis.html"><a href="biological-analysis.html#slicer"><i class="fa fa-check"></i><b>8.4.4</b> SLICER</a></li>
<li class="chapter" data-level="8.4.5" data-path="biological-analysis.html"><a href="biological-analysis.html#comparison-of-the-methods"><i class="fa fa-check"></i><b>8.4.5</b> Comparison of the methods</a></li>
<li class="chapter" data-level="8.4.6" data-path="biological-analysis.html"><a href="biological-analysis.html#expression-of-genes-through-time"><i class="fa fa-check"></i><b>8.4.6</b> Expression of genes through time</a></li>
<li class="chapter" data-level="8.4.7" data-path="biological-analysis.html"><a href="biological-analysis.html#sessioninfo-7"><i class="fa fa-check"></i><b>8.4.7</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="8.5" data-path="biological-analysis.html"><a href="biological-analysis.html#imputation"><i class="fa fa-check"></i><b>8.5</b> Imputation</a><ul>
<li class="chapter" data-level="8.5.1" data-path="biological-analysis.html"><a href="biological-analysis.html#scimpute"><i class="fa fa-check"></i><b>8.5.1</b> scImpute</a></li>
<li class="chapter" data-level="8.5.2" data-path="biological-analysis.html"><a href="biological-analysis.html#magic"><i class="fa fa-check"></i><b>8.5.2</b> MAGIC</a></li>
<li class="chapter" data-level="8.5.3" data-path="biological-analysis.html"><a href="biological-analysis.html#sessioninfo-8"><i class="fa fa-check"></i><b>8.5.3</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="8.6" data-path="biological-analysis.html"><a href="biological-analysis.html#dechapter"><i class="fa fa-check"></i><b>8.6</b> Differential Expression (DE) analysis</a><ul>
<li class="chapter" data-level="8.6.1" data-path="biological-analysis.html"><a href="biological-analysis.html#bulk-rna-seq-1"><i class="fa fa-check"></i><b>8.6.1</b> Bulk RNA-seq</a></li>
<li class="chapter" data-level="8.6.2" data-path="biological-analysis.html"><a href="biological-analysis.html#single-cell-rna-seq"><i class="fa fa-check"></i><b>8.6.2</b> Single cell RNA-seq</a></li>
<li class="chapter" data-level="8.6.3" data-path="biological-analysis.html"><a href="biological-analysis.html#differences-in-distribution"><i class="fa fa-check"></i><b>8.6.3</b> Differences in Distribution</a></li>
<li class="chapter" data-level="8.6.4" data-path="biological-analysis.html"><a href="biological-analysis.html#models-of-single-cell-rnaseq-data"><i class="fa fa-check"></i><b>8.6.4</b> Models of single-cell RNASeq data</a></li>
</ul></li>
<li class="chapter" data-level="8.7" data-path="biological-analysis.html"><a href="biological-analysis.html#de-in-a-real-dataset"><i class="fa fa-check"></i><b>8.7</b> DE in a real dataset</a><ul>
<li class="chapter" data-level="8.7.1" data-path="biological-analysis.html"><a href="biological-analysis.html#introduction-8"><i class="fa fa-check"></i><b>8.7.1</b> Introduction</a></li>
<li class="chapter" data-level="8.7.2" data-path="biological-analysis.html"><a href="biological-analysis.html#kolmogorov-smirnov-test"><i class="fa fa-check"></i><b>8.7.2</b> Kolmogorov-Smirnov test</a></li>
<li class="chapter" data-level="8.7.3" data-path="biological-analysis.html"><a href="biological-analysis.html#wilcoxmann-whitney-u-test"><i class="fa fa-check"></i><b>8.7.3</b> Wilcox/Mann-Whitney-U Test</a></li>
<li class="chapter" data-level="8.7.4" data-path="biological-analysis.html"><a href="biological-analysis.html#edger"><i class="fa fa-check"></i><b>8.7.4</b> edgeR</a></li>
<li class="chapter" data-level="8.7.5" data-path="biological-analysis.html"><a href="biological-analysis.html#monocle-1"><i class="fa fa-check"></i><b>8.7.5</b> Monocle</a></li>
<li class="chapter" data-level="8.7.6" data-path="biological-analysis.html"><a href="biological-analysis.html#mast"><i class="fa fa-check"></i><b>8.7.6</b> MAST</a></li>
<li class="chapter" data-level="8.7.7" data-path="biological-analysis.html"><a href="biological-analysis.html#slow-methods-1h-to-run"><i class="fa fa-check"></i><b>8.7.7</b> Slow Methods (>1h to run)</a></li>
<li class="chapter" data-level="8.7.8" data-path="biological-analysis.html"><a href="biological-analysis.html#bpsc"><i class="fa fa-check"></i><b>8.7.8</b> BPSC</a></li>
<li class="chapter" data-level="8.7.9" data-path="biological-analysis.html"><a href="biological-analysis.html#scde"><i class="fa fa-check"></i><b>8.7.9</b> SCDE</a></li>
<li class="chapter" data-level="8.7.10" data-path="biological-analysis.html"><a href="biological-analysis.html#sessioninfo-9"><i class="fa fa-check"></i><b>8.7.10</b> sessionInfo()</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="9" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html"><i class="fa fa-check"></i><b>9</b> Comparing/Combining scRNASeq datasets</a><ul>
<li class="chapter" data-level="9.1" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#introduction-9"><i class="fa fa-check"></i><b>9.1</b> Introduction</a></li>
<li class="chapter" data-level="9.2" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#datasets"><i class="fa fa-check"></i><b>9.2</b> Datasets</a></li>
<li class="chapter" data-level="9.3" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#projecting-cells-onto-annotated-cell-types-scmap"><i class="fa fa-check"></i><b>9.3</b> Projecting cells onto annotated cell-types (scmap)</a><ul>
<li class="chapter" data-level="9.3.1" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#cell-to-cell-mapping"><i class="fa fa-check"></i><b>9.3.1</b> Cell-to-Cell mapping</a></li>
</ul></li>
<li class="chapter" data-level="9.4" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#metaneighbour"><i class="fa fa-check"></i><b>9.4</b> Metaneighbour</a><ul>
<li class="chapter" data-level="9.4.1" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#prepare-data"><i class="fa fa-check"></i><b>9.4.1</b> Prepare Data</a></li>
</ul></li>
<li class="chapter" data-level="9.5" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#mnncorrect-1"><i class="fa fa-check"></i><b>9.5</b> mnnCorrect</a></li>
<li class="chapter" data-level="9.6" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#cannonical-correlation-analysis-seurat"><i class="fa fa-check"></i><b>9.6</b> Canonical Correlation Analysis (Seurat)</a><ul>
<li class="chapter" data-level="9.6.1" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#sessioninfo-10"><i class="fa fa-check"></i><b>9.6.1</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="9.7" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#search-scrna-seq-data"><i class="fa fa-check"></i><b>9.7</b> Search scRNA-Seq data</a><ul>
<li class="chapter" data-level="9.7.1" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#about"><i class="fa fa-check"></i><b>9.7.1</b> About</a></li>
<li class="chapter" data-level="9.7.2" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#dataset"><i class="fa fa-check"></i><b>9.7.2</b> Dataset</a></li>
<li class="chapter" data-level="9.7.3" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#gene-index"><i class="fa fa-check"></i><b>9.7.3</b> Gene Index</a></li>
<li class="chapter" data-level="9.7.4" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#marker-genes"><i class="fa fa-check"></i><b>9.7.4</b> Marker genes</a></li>
<li class="chapter" data-level="9.7.5" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#search-cells-by-a-gene-list"><i class="fa fa-check"></i><b>9.7.5</b> Search cells by a gene list</a></li>
<li class="chapter" data-level="9.7.6" data-path="comparingcombining-scrnaseq-datasets.html"><a href="comparingcombining-scrnaseq-datasets.html#sessioninfo-11"><i class="fa fa-check"></i><b>9.7.6</b> sessionInfo()</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="10" data-path="seurat-chapter.html"><a href="seurat-chapter.html"><i class="fa fa-check"></i><b>10</b> Seurat</a><ul>
<li class="chapter" data-level="10.1" data-path="seurat-chapter.html"><a href="seurat-chapter.html#seurat-object-class"><i class="fa fa-check"></i><b>10.1</b> <code>Seurat</code> object class</a></li>
<li class="chapter" data-level="10.2" data-path="seurat-chapter.html"><a href="seurat-chapter.html#expression-qc"><i class="fa fa-check"></i><b>10.2</b> Expression QC</a></li>
<li class="chapter" data-level="10.3" data-path="seurat-chapter.html"><a href="seurat-chapter.html#normalization"><i class="fa fa-check"></i><b>10.3</b> Normalization</a></li>
<li class="chapter" data-level="10.4" data-path="seurat-chapter.html"><a href="seurat-chapter.html#highly-variable-genes-1"><i class="fa fa-check"></i><b>10.4</b> Highly variable genes</a></li>
<li class="chapter" data-level="10.5" data-path="seurat-chapter.html"><a href="seurat-chapter.html#dealing-with-confounders-1"><i class="fa fa-check"></i><b>10.5</b> Dealing with confounders</a></li>
<li class="chapter" data-level="10.6" data-path="seurat-chapter.html"><a href="seurat-chapter.html#linear-dimensionality-reduction"><i class="fa fa-check"></i><b>10.6</b> Linear dimensionality reduction</a></li>
<li class="chapter" data-level="10.7" data-path="seurat-chapter.html"><a href="seurat-chapter.html#significant-pcs"><i class="fa fa-check"></i><b>10.7</b> Significant PCs</a></li>
<li class="chapter" data-level="10.8" data-path="seurat-chapter.html"><a href="seurat-chapter.html#clustering-cells"><i class="fa fa-check"></i><b>10.8</b> Clustering cells</a></li>
<li class="chapter" data-level="10.9" data-path="seurat-chapter.html"><a href="seurat-chapter.html#marker-genes-1"><i class="fa fa-check"></i><b>10.9</b> Marker genes</a></li>
<li class="chapter" data-level="10.10" data-path="seurat-chapter.html"><a href="seurat-chapter.html#sessioninfo-12"><i class="fa fa-check"></i><b>10.10</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="11" data-path="ideal-scrnaseq-pipeline-as-of-oct-2017.html"><a href="ideal-scrnaseq-pipeline-as-of-oct-2017.html"><i class="fa fa-check"></i><b>11</b> “Ideal” scRNAseq pipeline (as of Oct 2017)</a><ul>
<li class="chapter" data-level="11.1" data-path="ideal-scrnaseq-pipeline-as-of-oct-2017.html"><a href="ideal-scrnaseq-pipeline-as-of-oct-2017.html#experimental-design"><i class="fa fa-check"></i><b>11.1</b> Experimental Design</a></li>
<li class="chapter" data-level="11.2" data-path="ideal-scrnaseq-pipeline-as-of-oct-2017.html"><a href="ideal-scrnaseq-pipeline-as-of-oct-2017.html#processing-reads"><i class="fa fa-check"></i><b>11.2</b> Processing Reads</a></li>
<li class="chapter" data-level="11.3" data-path="ideal-scrnaseq-pipeline-as-of-oct-2017.html"><a href="ideal-scrnaseq-pipeline-as-of-oct-2017.html#preparing-expression-matrix"><i class="fa fa-check"></i><b>11.3</b> Preparing Expression Matrix</a></li>
<li class="chapter" data-level="11.4" data-path="ideal-scrnaseq-pipeline-as-of-oct-2017.html"><a href="ideal-scrnaseq-pipeline-as-of-oct-2017.html#biological-interpretation"><i class="fa fa-check"></i><b>11.4</b> Biological Interpretation</a></li>
</ul></li>
<li class="chapter" data-level="12" data-path="advanced-exercises.html"><a href="advanced-exercises.html"><i class="fa fa-check"></i><b>12</b> Advanced exercises</a></li>
<li class="chapter" data-level="13" data-path="resources.html"><a href="resources.html"><i class="fa fa-check"></i><b>13</b> Resources</a><ul>
<li class="chapter" data-level="13.1" data-path="resources.html"><a href="resources.html#scrna-seq-protocols"><i class="fa fa-check"></i><b>13.1</b> scRNA-seq protocols</a></li>
<li class="chapter" data-level="13.2" data-path="resources.html"><a href="resources.html#external-rna-control-consortium-ercc"><i class="fa fa-check"></i><b>13.2</b> External RNA Control Consortium (ERCC)</a></li>
<li class="chapter" data-level="13.3" data-path="resources.html"><a href="resources.html#scrna-seq-analysis-tools"><i class="fa fa-check"></i><b>13.3</b> scRNA-seq analysis tools</a></li>
<li class="chapter" data-level="13.4" data-path="resources.html"><a href="resources.html#scrna-seq-public-datasets"><i class="fa fa-check"></i><b>13.4</b> scRNA-seq public datasets</a></li>
</ul></li>
<li class="chapter" data-level="14" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>14</b> References</a></li>
<li class="divider"></li>
<li><a href="http://www.sanger.ac.uk/science/groups/hemberg-group" target="blank">Hemberg Lab</a></li>
</ul>
</nav>
</div>
<div class="book-body">
<div class="body-inner">
<div class="book-header" role="navigation">
<h1>
<i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Analysis of single cell RNA-seq data</a>
</h1>
</div>
<div class="page-wrapper" tabindex="-1" role="main">
<div class="page-inner">
<section class="normal" id="section-">
<div id="comparingcombining-scrnaseq-datasets" class="section level1">
<h1><span class="header-section-number">9</span> Comparing/Combining scRNASeq datasets</h1>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(scater)
<span class="kw">library</span>(SingleCellExperiment)</code></pre></div>
<div id="introduction-9" class="section level2">
<h2><span class="header-section-number">9.1</span> Introduction</h2>
<p>As more and more scRNA-seq datasets become available, carrying out comparisons between them is key. There are two main approaches to comparing scRNASeq datasets. The first approach is “label-centric”, which focuses on identifying equivalent cell-types/states across datasets by comparing individual cells or groups of cells. The other approach is “cross-dataset normalization”, which attempts to computationally remove experiment-specific technical/biological effects so that data from multiple experiments can be combined and jointly analyzed.</p>
<p>The label-centric approach can be used with datasets that have high-confidence cell annotations, e.g. the Human Cell Atlas (HCA) <span class="citation">(Regev et al. <a href="#ref-Regev2017-mw">2017</a>)</span> or the Tabula Muris <span class="citation">(<span class="citeproc-not-found" data-reference-id="Quake2017"><strong>???</strong></span>)</span> once they are completed. Cells or clusters from a new sample are projected onto such a reference to determine tissue composition and/or identify cells with novel/unknown identity. Conceptually, such projections are similar to the popular BLAST method <span class="citation">(Altschul et al. <a href="#ref-Altschul1990-ts">1990</a>)</span>, which makes it possible to quickly find the closest match in a database for a newly identified nucleotide or amino acid sequence. The label-centric approach can also be used to compare datasets of similar biological origin collected by different labs to ensure that the annotation and the analysis are consistent.</p>
<div class="figure" style="text-align: center"><span id="fig:unnamed-chunk-3"></span>
<img src="figures/CourseCompareTypes.png" alt="Label-centric dataset comparison can be used to compare the annotations of two different samples." />
<p class="caption">
Figure 2.4: Label-centric dataset comparison can be used to compare the annotations of two different samples.
</p>
</div>
<div class="figure" style="text-align: center"><span id="fig:unnamed-chunk-4"></span>
<img src="figures/CourseAtlasAssignment.png" alt="Label-centric dataset comparison can project cells from a new experiment onto an annotated reference." />
<p class="caption">
Figure 2.5: Label-centric dataset comparison can project cells from a new experiment onto an annotated reference.
</p>
</div>
<p>The cross-dataset normalization approach can also be used to compare datasets of similar biological origin. Unlike the label-centric approach, it enables the joint analysis of multiple datasets, which facilitates the identification of rare cell-types that may be too sparsely sampled in each individual dataset to be reliably detected. However, cross-dataset normalization is not applicable to very large and diverse references, since it assumes that a significant portion of the biological variability in each dataset overlaps with the others.</p>
<div class="figure" style="text-align: center"><span id="fig:unnamed-chunk-5"></span>
<img src="figures/CourseCrossNorm.png" alt="Cross-dataset normalization enables joint-analysis of 2+ scRNASeq datasets." />
<p class="caption">
Figure 2.6: Cross-dataset normalization enables joint-analysis of 2+ scRNASeq datasets.
</p>
</div>
</div>
<div id="datasets" class="section level2">
<h2><span class="header-section-number">9.2</span> Datasets</h2>
<p>We will be running these methods on two human pancreas datasets: <span class="citation">(Muraro et al. <a href="#ref-Muraro2016-yk">2016</a>)</span> and <span class="citation">(Segerstolpe et al. <a href="#ref-Segerstolpe2016-wc">2016</a>)</span>. Since the pancreas has been widely studied, these datasets are well annotated.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">muraro <-<span class="st"> </span><span class="kw">readRDS</span>(<span class="st">"pancreas/muraro.rds"</span>)
segerstolpe <-<span class="st"> </span><span class="kw">readRDS</span>(<span class="st">"pancreas/segerstolpe.rds"</span>)</code></pre></div>
<p>This data has already been formatted for scmap. Cell type labels must be stored in the <code>cell_type1</code> column of the <code>colData</code> slots, and gene ids that are consistent across both datasets must be stored in the <code>feature_symbol</code> column of the <code>rowData</code> slots.</p>
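<p>For a dataset that has not been pre-formatted, the two required columns could be filled in along the following lines. This is only a minimal sketch on a made-up toy object: the count values, gene names, cell names and cell-type labels are all hypothetical, and only the column names <code>cell_type1</code> and <code>feature_symbol</code> come from the requirements described above.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(SingleCellExperiment)

# Toy count matrix: 3 genes x 4 cells (values invented for illustration)
toy_counts <- matrix(
    c(5, 0, 2, 1,
      0, 3, 0, 4,
      7, 1, 0, 0),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("INS", "GCG", "SST"), paste0("cell", 1:4))
)
toy_sce <- SingleCellExperiment(assays = list(counts = toy_counts))

# Cell-type labels go in the cell_type1 column of colData
colData(toy_sce)$cell_type1 <- c("beta", "alpha", "beta", "alpha")

# Gene ids shared across datasets go in the feature_symbol column of rowData
rowData(toy_sce)$feature_symbol <- rownames(toy_sce)

# Quick look at what was stored
table(colData(toy_sce)$cell_type1)
head(rowData(toy_sce)$feature_symbol)</code></pre></div>
<p>Here the row names are assumed to already be gene symbols; if a dataset uses another identifier (e.g. Ensembl ids), it would need to be converted first so that both datasets use the same naming scheme.</p>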
<p>First, let's check that our gene ids match across both datasets:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">sum</span>(<span class="kw">rowData</span>(muraro)<span class="op">$</span>feature_symbol <span class="op">%in%</span><span class="st"> </span><span class="kw">rowData</span>(segerstolpe)<span class="op">$</span>feature_symbol)<span class="op">/</span><span class="kw">nrow</span>(muraro)</code></pre></div>
<pre><code>## [1] 0.9599519</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">sum</span>(<span class="kw">rowData</span>(segerstolpe)<span class="op">$</span>feature_symbol <span class="op">%in%</span><span class="st"> </span><span class="kw">rowData</span>(muraro)<span class="op">$</span>feature_symbol)<span class="op">/</span><span class="kw">nrow</span>(segerstolpe)</code></pre></div>
<pre><code>## [1] 0.719334</code></pre>
<p>Here we can see that 96% of the genes present in muraro match genes in segerstolpe, and 72% of the genes in segerstolpe match genes in muraro. This is as expected because the segerstolpe dataset was more deeply sequenced than the muraro dataset. However, it highlights some of the difficulties in comparing scRNASeq datasets.</p>
<p>We can confirm this by checking the overall size of these two datasets.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">dim</span>(muraro)</code></pre></div>
<pre><code>## [1] 19127 2126</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">dim</span>(segerstolpe)</code></pre></div>
<pre><code>## [1] 25525 3514</code></pre>
<p>In addition, we can check the cell-type annotations for each of these datasets using the commands below:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">summary</span>(<span class="kw">factor</span>(<span class="kw">colData</span>(muraro)<span class="op">$</span>cell_type1))</code></pre></div>
<pre><code>## acinar alpha beta delta ductal endothelial
## 219 812 448 193 245 21
## epsilon gamma mesenchymal unclear
## 3 101 80 4</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">summary</span>(<span class="kw">factor</span>(<span class="kw">colData</span>(segerstolpe)<span class="op">$</span>cell_type1))</code></pre></div>
<pre><code>## acinar alpha beta
## 185 886 270
## co-expression delta ductal
## 39 114 386
## endothelial epsilon gamma
## 16 7 197
## mast MHC class II not applicable
## 7 5 1305
## PSC unclassified unclassified endocrine
## 54 2 41</code></pre>
<p>Here we can see that even though both datasets considered the same biological tissue, they have been annotated with slightly different sets of cell-types. If you are familiar with pancreas biology you might recognize that the pancreatic stellate cells (PSCs) in segerstolpe are a type of mesenchymal stem cell which would fall under the “mesenchymal” type in muraro. However, it isn’t clear whether these two annotations should be considered synonymous or not. We can use label-centric comparison methods to determine if these two cell-type annotations are indeed equivalent.</p>
<p>Alternatively, we might be interested in understanding the function of those cells that were “unclassified endocrine” or were deemed too poor quality (“not applicable”) for the original clustering in each dataset by leveraging information across datasets. We could either attempt to infer which of the existing annotations they most likely belong to using label-centric approaches, or try to uncover a novel cell-type among them (or a sub-type within the existing annotations) using cross-dataset normalization.</p>
<p>To simplify our demonstration analyses we will remove the small classes of unassigned cells, and the poor quality cells. We will retain the “unclassified endocrine” to see if any of these methods can elucidate what cell-type they belong to.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">segerstolpe <-<span class="st"> </span>segerstolpe[,<span class="kw">colData</span>(segerstolpe)<span class="op">$</span>cell_type1 <span class="op">!=</span><span class="st"> "unclassified"</span>]
segerstolpe <-<span class="st"> </span>segerstolpe[,<span class="kw">colData</span>(segerstolpe)<span class="op">$</span>cell_type1 <span class="op">!=</span><span class="st"> "not applicable"</span>]
muraro <-<span class="st"> </span>muraro[,<span class="kw">colData</span>(muraro)<span class="op">$</span>cell_type1 <span class="op">!=</span><span class="st"> "unclear"</span>]</code></pre></div>
</div>
<div id="projecting-cells-onto-annotated-cell-types-scmap" class="section level2">
<h2><span class="header-section-number">9.3</span> Projecting cells onto annotated cell-types (scmap)</h2>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(scmap)
<span class="kw">set.seed</span>(<span class="dv">1234567</span>)</code></pre></div>
<p>We recently developed <code>scmap</code> <span class="citation">(Kiselev and Hemberg <a href="#ref-Kiselev2017-nb">2017</a>)</span>, a method for projecting cells from a scRNA-seq experiment onto the cell-types identified in other experiments. Additionally, a cloud version of <code>scmap</code> can be run for free, without restrictions, from <a href="http://www.hemberg-lab.cloud/scmap" class="uri">http://www.hemberg-lab.cloud/scmap</a>.</p>
<div id="feature-selection-1" class="section level4">
<h4><span class="header-section-number">9.3.0.1</span> Feature Selection</h4>
<p>Once we have a <code>SingleCellExperiment</code> object we can run <code>scmap</code>. First we have to build the “index” of our reference clusters. Since we want to know whether PSCs and mesenchymal cells are synonymous, we will project each dataset onto the other, so we need to build an index for each dataset. This requires first selecting the most informative features for each reference dataset.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">muraro <-<span class="st"> </span><span class="kw">selectFeatures</span>(muraro, <span class="dt">suppress_plot =</span> <span class="ot">FALSE</span>)</code></pre></div>
<pre><code>## Warning in linearModel(object, n_features): Your object does not contain
## counts() slot. Dropouts were calculated using logcounts() slot...</code></pre>
<p><img src="31-projection_files/figure-html/unnamed-chunk-12-1.png" width="672" style="display: block; margin: auto;" /></p>
<p>Genes highlighted in red will be used in the further analysis (projection).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">segerstolpe <-<span class="st"> </span><span class="kw">selectFeatures</span>(segerstolpe, <span class="dt">suppress_plot =</span> <span class="ot">FALSE</span>)</code></pre></div>
<p><img src="31-projection_files/figure-html/unnamed-chunk-13-1.png" width="672" style="display: block; margin: auto;" /> From the y-axis of these plots we can see that scmap uses a dropout-based feature selection method.</p>
<p>Now calculate the cell-type index:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">muraro <-<span class="st"> </span><span class="kw">indexCluster</span>(muraro)
segerstolpe <-<span class="st"> </span><span class="kw">indexCluster</span>(segerstolpe)</code></pre></div>
<p>We can also visualize the index:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">heatmap</span>(<span class="kw">as.matrix</span>(<span class="kw">metadata</span>(muraro)<span class="op">$</span>scmap_cluster_index))</code></pre></div>
<p><img src="31-projection_files/figure-html/unnamed-chunk-15-1.png" width="672" style="display: block; margin: auto;" /></p>
<p>You may want to adjust your features using the <code>setFeatures</code> function if features are too heavily concentrated in only a few cell-types. In this case the dropout-based features look good so we will just use them.</p>
<p><strong>Exercise</strong> Using the rowData of each dataset, how many genes were selected as features in both datasets? What does this tell you about these datasets?</p>
<p><strong>Answer</strong></p>
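<p>One possible way to approach this exercise is sketched below on toy data rather than on the real objects. It assumes that <code>selectFeatures</code> records its selection in a logical <code>scmap_features</code> column of the <code>rowData</code>; the data frames <code>rd_a</code> and <code>rd_b</code> and their contents are hypothetical stand-ins for <code>rowData(muraro)</code> and <code>rowData(segerstolpe)</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy stand-ins for the rowData of each dataset (hypothetical values)
rd_a <- data.frame(feature_symbol = c("INS", "GCG", "SST", "PPY", "KRT19"),
                   scmap_features = c(TRUE, TRUE, FALSE, TRUE, FALSE),
                   stringsAsFactors = FALSE)
rd_b <- data.frame(feature_symbol = c("GCG", "INS", "KRT19", "SST", "COL1A1"),
                   scmap_features = c(TRUE, TRUE, TRUE, FALSE, TRUE),
                   stringsAsFactors = FALSE)

# Genes selected as features in each dataset
feats_a <- rd_a$feature_symbol[rd_a$scmap_features]
feats_b <- rd_b$feature_symbol[rd_b$scmap_features]

# Genes selected as features in both datasets
shared <- intersect(feats_a, feats_b)
length(shared)</code></pre></div>
<p>On the real data the same pattern would be applied to the <code>rowData</code> of <code>muraro</code> and <code>segerstolpe</code>; a small overlap would suggest that the two datasets disagree on which genes are most informative.</p>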
</div>
<div id="projecting" class="section level4">
<h4><span class="header-section-number">9.3.0.2</span> Projecting</h4>
<p>scmap computes the distance from each cell to each cell-type in the reference index, then applies an empirically derived threshold to determine which cells are assigned to the closest reference cell-type and which are left unassigned. To account for differences in sequencing depth, distance is calculated using both the Spearman correlation and the cosine distance, and only cells with a consistent assignment under both measures are returned as assigned.</p>
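<p>The assignment rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not scmap’s actual implementation: the toy <code>index</code> matrix, the query <code>cell</code>, the helper <code>cosine_sim</code> and the <code>threshold</code> value are all hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy reference index: rows are genes, columns are cell-type profiles (hypothetical values)
index <- cbind(alpha = c(9, 1, 0, 2, 5),
               beta  = c(1, 8, 3, 0, 4),
               delta = c(0, 2, 9, 7, 1))
# A query cell resembling the "beta" profile at roughly half the sequencing depth
cell <- c(0.4, 4.1, 1.4, 0.1, 2.2)

# Cosine similarity ignores the overall magnitude (depth) of each profile
cosine_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

cos_scores <- apply(index, 2, cosine_sim, y = cell)
spearman_scores <- apply(index, 2, cor, y = cell, method = "spearman")

best_cos <- names(which.max(cos_scores))
best_spearman <- names(which.max(spearman_scores))

# Illustrative cutoff only; not necessarily the value scmap uses
threshold <- 0.7
# Assign a label only if both measures agree and the similarity is high enough
label <- if (best_cos == best_spearman && max(cos_scores) >= threshold) best_cos else "unassigned"
label</code></pre></div>
<p>Both measures are insensitive to a uniform rescaling of the query cell, which is why they are a reasonable choice when sequencing depth differs between datasets.</p>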
<p>We will project the <code>segerstolpe</code> dataset onto the <code>muraro</code> dataset:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">seger_to_muraro <-<span class="st"> </span><span class="kw">scmapCluster</span>(
<span class="dt">projection =</span> segerstolpe,
<span class="dt">index_list =</span> <span class="kw">list</span>(
<span class="dt">muraro =</span> <span class="kw">metadata</span>(muraro)<span class="op">$</span>scmap_cluster_index
)
)</code></pre></div>
<p>and <code>muraro</code> onto <code>segerstolpe</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">muraro_to_seger <-<span class="st"> </span><span class="kw">scmapCluster</span>(
<span class="dt">projection =</span> muraro,
<span class="dt">index_list =</span> <span class="kw">list</span>(
<span class="dt">seger =</span> <span class="kw">metadata</span>(segerstolpe)<span class="op">$</span>scmap_cluster_index
)
)</code></pre></div>
<p>Note that in each case we are projecting to a single dataset but that this could be extended to any number of datasets for which we have computed indices.</p>
<p>Now let’s compare the original cell-type labels with the projected labels:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">table</span>(<span class="kw">colData</span>(muraro)<span class="op">$</span>cell_type1, muraro_to_seger<span class="op">$</span>scmap_cluster_labs)</code></pre></div>
<pre><code>##
## acinar alpha beta co-expression delta ductal endothelial
## acinar 211 0 0 0 0 0 0
## alpha 1 763 0 18 0 2 0
## beta 2 1 397 7 2 2 0
## delta 0 0 2 1 173 0 0
## ductal 7 0 0 0 0 208 0
## endothelial 0 0 0 0 0 0 15
## epsilon 0 0 0 0 0 0 0
## gamma 2 0 0 0 0 0 0
## mesenchymal 0 0 0 0 0 1 0
##
## epsilon gamma MHC class II PSC unassigned
## acinar 0 0 0 0 8
## alpha 0 2 0 0 26
## beta 0 5 1 2 29
## delta 0 0 0 0 17
## ductal 0 0 5 3 22
## endothelial 0 0 0 1 5
## epsilon 3 0 0 0 0
## gamma 0 95 0 0 4
## mesenchymal 0 0 0 77 2</code></pre>
<p>Here we can see that cell-types do map to their equivalents in segerstolpe, and importantly we see that 77 of the 80 “mesenchymal” cells were assigned to the “PSC” class; of the remaining three, one was assigned to “ductal” and two were left unassigned.</p>
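<p>Raw counts can be hard to compare between cell-types of different sizes, so it can help to convert each row of the table to proportions. The sketch below does this for two rows, with the counts copied by hand from the table above (columns containing only zeros for these rows are omitted); the object names <code>tab</code> and <code>props</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Counts copied from the gamma and mesenchymal rows of the table above
tab <- rbind(
  gamma       = c(acinar = 2, ductal = 0, gamma = 95, PSC = 0,  unassigned = 4),
  mesenchymal = c(acinar = 0, ductal = 1, gamma = 0,  PSC = 77, unassigned = 2)
)

# Fraction of each original cell-type assigned to each projected label
props <- prop.table(tab, margin = 1)
round(props, 3)</code></pre></div>
<p>On the full result the same idea would apply <code>prop.table</code> with <code>margin = 1</code> to the output of <code>table()</code>.</p>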
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">table</span>(<span class="kw">colData</span>(segerstolpe)<span class="op">$</span>cell_type1, seger_to_muraro<span class="op">$</span>scmap_cluster_labs)</code></pre></div>
<pre><code>##
## acinar alpha beta delta ductal endothelial
## acinar 181 0 0 0 4 0
## alpha 0 869 1 0 0 0
## beta 0 0 260 0 0 0
## co-expression 0 7 31 0 0 0
## delta 0 0 1 111 0 0
## ductal 0 0 0 0 383 0
## endothelial 0 0 0 0 0 14
## epsilon 0 0 0 0 0 0
## gamma 0 2 0 0 0 0
## mast 0 0 0 0 0 0
## MHC class II 0 0 0 0 0 0
## PSC 0 0 1 0 0 0
## unclassified endocrine 0 0 0 0 0 0
##
## epsilon gamma mesenchymal unassigned
## acinar 0 0 0 0
## alpha 0 0 0 16
## beta 0 0 0 10
## co-expression 0 0 0 1
## delta 0 0 0 2
## ductal 0 0 0 3
## endothelial 0 0 0 2
## epsilon 6 0 0 1
## gamma 0 192 0 3
## mast 0 0 0 7
## MHC class II 0 0 0 5
## PSC 0 0 53 0
## unclassified endocrine 0 0 0 41</code></pre>
<p>Again we see that cell-types match each other and that all but one of the “PSCs” match the “mesenchymal” cells, providing strong evidence that these two annotations should be considered synonymous.</p>
<p>We can also visualize these tables using a <a href="https://developers.google.com/chart/interactive/docs/gallery/sankey">Sankey diagram</a>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">plot</span>(<span class="kw">getSankey</span>(<span class="kw">colData</span>(muraro)<span class="op">$</span>cell_type1, muraro_to_seger<span class="op">$</span>scmap_cluster_labs[,<span class="dv">1</span>], <span class="dt">plot_height=</span><span class="dv">400</span>))</code></pre></div>
<!-- Sankey generated in R 3.4.3 by googleVis 0.6.2 package -->
<!-- Sat Feb 3 15:39:10 2018 -->
<!-- jsHeader -->
<script type="text/javascript">
// jsData
function gvisDataSankeyID7ae6f87c0b2 () {
var data = new google.visualization.DataTable();
var datajson =
[
[
"alpha ",
" alpha",
763
],
[
"beta ",
" beta",
397
],
[
"acinar ",
" acinar",
211
],
[
"ductal ",
" ductal",
208
],
[
"delta ",
" delta",
173
],
[
"gamma ",
" gamma",
95
],
[
"mesenchymal ",
" PSC",
77
],
[
"endothelial ",
" endothelial",
15
],
[
"epsilon ",
" epsilon",
3
],
[
"acinar ",
" unassigned",
8
],
[
"alpha ",
" ductal",
2
],
[
"alpha ",
" unassigned",
26
],
[
"alpha ",
" acinar",
1
],
[
"alpha ",
" co-expression",
18
],
[
"alpha ",
" gamma",
2
],
[
"beta ",
" unassigned",
29
],
[
"beta ",
" gamma",
5
],
[
"beta ",
" MHC class II",
1
],
[
"beta ",
" alpha",
1
],
[
"beta ",
" co-expression",
7
],
[
"beta ",
" acinar",
2
],
[
"beta ",
" PSC",
2
],
[
"beta ",
" ductal",
2
],
[
"beta ",
" delta",
2
],
[
"delta ",
" beta",
2
],
[
"delta ",
" unassigned",
17
],
[
"delta ",
" co-expression",
1
],
[
"ductal ",
" acinar",
7
],
[
"ductal ",
" PSC",
3
],
[
"ductal ",
" MHC class II",
5
],
[
"ductal ",
" unassigned",
22
],
[
"endothelial ",
" PSC",
1
],
[
"endothelial ",
" unassigned",
5
],
[
"gamma ",
" acinar",
2
],
[
"gamma ",
" unassigned",
4
],
[
"mesenchymal ",
" ductal",
1
],
[
"mesenchymal ",
" unassigned",
2
]
];
data.addColumn('string','From');
data.addColumn('string','To');
data.addColumn('number','# of cells');
data.addRows(datajson);
return(data);
}
// jsDrawChart
function drawChartSankeyID7ae6f87c0b2() {
var data = gvisDataSankeyID7ae6f87c0b2();
var options = {};
options["width"] = 400;
options["height"] = 400;
options["sankey"] = {
node:{
label:{
fontName:'Arial',
fontSize:11,color:
'#000000',
bold:true,
italic:false
},
colors:'#FFFFFF',
nodePadding:12
},iterations:0
};
var chart = new google.visualization.Sankey(
document.getElementById('SankeyID7ae6f87c0b2')
);
chart.draw(data,options);
}
// jsDisplayChart
(function() {
var pkgs = window.__gvisPackages = window.__gvisPackages || [];
var callbacks = window.__gvisCallbacks = window.__gvisCallbacks || [];
var chartid = "sankey";
// Manually see if chartid is in pkgs (not all browsers support Array.indexOf)
var i, newPackage = true;
for (i = 0; newPackage && i < pkgs.length; i++) {
if (pkgs[i] === chartid)
newPackage = false;
}
if (newPackage)
pkgs.push(chartid);
// Add the drawChart function to the global list of callbacks
callbacks.push(drawChartSankeyID7ae6f87c0b2);
})();
function displayChartSankeyID7ae6f87c0b2() {
var pkgs = window.__gvisPackages = window.__gvisPackages || [];
var callbacks = window.__gvisCallbacks = window.__gvisCallbacks || [];
window.clearTimeout(window.__gvisLoad);
// The timeout is set to 100 because otherwise the container div we are
// targeting might not be part of the document yet
window.__gvisLoad = setTimeout(function() {
var pkgCount = pkgs.length;
google.load("visualization", "1", { packages:pkgs, callback: function() {
if (pkgCount != pkgs.length) {
// Race condition where another setTimeout call snuck in after us; if
// that call added a package, we must not shift its callback
return;
}
while (callbacks.length > 0)
callbacks.shift()();
} });
}, 100);
}
// jsFooter
</script>
<!-- jsChart -->
<script type="text/javascript" src="https://www.google.com/jsapi?callback=displayChartSankeyID7ae6f87c0b2"></script>
<!-- divChart -->
<div id="SankeyID7ae6f87c0b2" style="width: 400; height: 400;">
</div>
<p><strong>Exercise</strong> How many of the previously unclassified cells would we be able to assign to cell-types using scmap?</p>
<p><strong>Answer</strong></p>
</div>
<div id="cell-to-cell-mapping" class="section level3">
<h3><span class="header-section-number">9.3.1</span> Cell-to-Cell mapping</h3>
<p>scmap can also project each cell in one dataset to its approximate closest neighbouring cell in the reference dataset. This uses a highly optimized search algorithm, allowing it to scale to very large references (in theory, hundreds of thousands to millions of cells). However, this process is stochastic, so we must fix the random seed to ensure we can reproduce our results.</p>
<p>We have already performed feature selection for this dataset so we can go straight to building the index.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(<span class="dv">193047</span>)
segerstolpe <-<span class="st"> </span><span class="kw">indexCell</span>(segerstolpe)</code></pre></div>
<pre><code>## Parameter M was not provided, will use M = n_features / 10 (if n_features <= 1000), where n_features is the number of selected features, and M = 100 otherwise.</code></pre>
<pre><code>## Parameter k was not provided, will use k = sqrt(number_of_cells)</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">muraro <-<span class="st"> </span><span class="kw">indexCell</span>(muraro)</code></pre></div>
<pre><code>## Parameter M was not provided, will use M = n_features / 10 (if n_features <= 1000), where n_features is the number of selected features, and M = 100 otherwise.
## Parameter k was not provided, will use k = sqrt(number_of_cells)</code></pre>
<p>In this case the index is a series of clusterings of the cells, each using a different subset of the selected features. The parameter <code>M</code> is the number of feature subsets (and therefore of subclusterings), and <code>k</code> is the number of clusters in each subclustering. New cells are assigned to the nearest cluster in each subclustering to generate a unique pattern of cluster assignments. We then find the cell in the reference dataset with the same or most similar pattern of cluster assignments.</p>
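<p>The indexing idea described above can be sketched in base R on simulated data. This is a rough illustration only: it swaps in <code>kmeans</code> from the <code>stats</code> package for whatever clustering scmap uses internally, and the matrix <code>ref</code>, the helper <code>encode</code> and the small values of <code>M</code> and <code>k</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(42)
# Toy reference: 8 features (rows) x 30 cells (columns) of simulated values
ref <- matrix(rnorm(8 * 30), nrow = 8,
              dimnames = list(paste0("g", 1:8), paste0("cell", 1:30)))
M <- 2  # number of feature subsets (illustrative)
k <- 3  # number of clusters per subset (illustrative)

# Split the features into M equally sized subsets
chunks <- split(seq_len(nrow(ref)), rep(seq_len(M), each = nrow(ref) / M))

# Cluster the cells separately within each subset of features
fits <- lapply(chunks, function(rows) kmeans(t(ref[rows, , drop = FALSE]), centers = k))

# One row per subset, one column per cell: the pattern of cluster assignments
subclusters <- do.call(rbind, lapply(fits, function(f) f$cluster))

# Encode a new cell by finding its nearest cluster centre within each subset
encode <- function(cell) {
  mapply(function(rows, f) {
    d <- colSums((t(f$centers) - cell[rows])^2)
    unname(which.min(d))
  }, chunks, fits)
}

# Use a noisy copy of one reference cell as the query
query <- ref[, 5] + rnorm(nrow(ref), sd = 0.1)
code <- encode(query)

# Number of subsets in which each reference cell shares the query's cluster
matches <- colSums(subclusters == code)
head(sort(matches, decreasing = TRUE))</code></pre></div>
<p>Reference cells that share the most cluster assignments with the query are its approximate nearest neighbours; comparing short integer patterns is much cheaper than computing distances on the full expression profiles.</p>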
<p>We can examine the cluster assignment patterns for the reference datasets using:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">metadata</span>(muraro)<span class="op">$</span>scmap_cell_index<span class="op">$</span>subclusters[<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>,<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>]</code></pre></div>
<pre><code>## D28.1_1 D28.1_13 D28.1_15 D28.1_17 D28.1_2
## [1,] 4 42 27 43 10
## [2,] 5 8 2 33 37
## [3,] 11 32 35 17 26
## [4,] 2 4 32 2 18
## [5,] 31 18 21 40 1</code></pre>
<p>To project and find the <code>w</code> nearest neighbours we use a similar command as before:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">muraro_to_seger <-<span class="st"> </span><span class="kw">scmapCell</span>(
<span class="dt">projection =</span> muraro,
<span class="dt">index_list =</span> <span class="kw">list</span>(
<span class="dt">seger =</span> <span class="kw">metadata</span>(segerstolpe)<span class="op">$</span>scmap_cell_index
),
<span class="dt">w =</span> <span class="dv">5</span>
)</code></pre></div>
<p>We can again look at the results:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">muraro_to_seger<span class="op">$</span>seger[[<span class="dv">1</span>]][,<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>]</code></pre></div>
<pre><code>## D28.1_1 D28.1_13 D28.1_15 D28.1_17 D28.1_2
## [1,] 2201 1288 1117 1623 1078
## [2,] 1229 1724 2104 1448 1593
## [3,] 1793 1854 2201 2039 1553
## [4,] 1882 1737 1081 1202 1890
## [5,] 1731 976 1903 1834 1437</code></pre>
<p>This shows the column indices of the 5 nearest neighbours in segerstolpe for each of the cells in muraro. We could then calculate a pseudotime estimate, branch assignment, or other cell-level data by selecting the appropriate data from the colData of the segerstolpe dataset. As a demonstration we will find the cell-type of the nearest neighbour of each cell.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">cell_type_NN <-<span class="st"> </span><span class="kw">colData</span>(segerstolpe)<span class="op">$</span>cell_type1[muraro_to_seger<span class="op">$</span>seger[[<span class="dv">1</span>]][<span class="dv">1</span>,]]
<span class="kw">head</span>(cell_type_NN)</code></pre></div>
<pre><code>## [1] "alpha" "ductal" "alpha" "alpha" "endothelial"
## [6] "endothelial"</code></pre>
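<p>Using only the single nearest neighbour can be noisy. One possible alternative, sketched below on toy data, is to take a majority vote among all <code>w</code> neighbours and leave a cell unassigned when no label is common enough. The neighbour matrix <code>nn</code>, the labels <code>ref_labels</code>, the helper <code>vote</code> and the <code>min_frac</code> cutoff are hypothetical; on the real data <code>nn</code> would be the neighbour matrix returned by <code>scmapCell</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy neighbour matrix: rows are the w = 3 nearest neighbours, columns are query cells
nn <- cbind(q1 = c(1, 2, 5), q2 = c(3, 4, 6), q3 = c(1, 4, 6))
# Cell-type labels of the six toy reference cells
ref_labels <- c("alpha", "alpha", "beta", "beta", "alpha", "ductal")

# Return the most common neighbour label, or "unassigned" if it is not frequent enough
vote <- function(idx, min_frac = 0.6) {
  counts <- table(ref_labels[idx])
  top <- names(counts)[which.max(counts)]
  if (max(counts) / length(idx) >= min_frac) top else "unassigned"
}

consensus <- apply(nn, 2, vote)
consensus</code></pre></div>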
</div>
</div>
<div id="metaneighbour" class="section level2">
<h2><span class="header-section-number">9.4</span> MetaNeighbor</h2>
<p><a href="https://www.biorxiv.org/content/early/2017/06/16/150524">MetaNeighbor</a> is specifically designed to ask whether cell-type labels are consistent across datasets. It comes in two versions. The first is a fully supervised method, which assumes cell-types are known in all datasets and calculates how “good” those cell-type labels are (the precise meaning of “good” is described below). The second is an unsupervised method, which estimates how similar all cell-types are to each other both within and across datasets. We will only use the unsupervised version, as it is more generally applicable and its results are easier to interpret.</p>
<p>MetaNeighbor compares cell-types across datasets by building a cell-cell Spearman correlation network. The method then tries to predict the label of each cell through weighted “votes” of its nearest neighbours. It then scores the overall similarity between two clusters as the AUROC for assigning cells of typeA to typeB based on these weighted votes. An AUROC of 1 would indicate that all the cells of typeA were assigned to typeB before any other cells were, and an AUROC of 0.5 is what you would get if cells were being randomly assigned.</p>
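<p>To make the AUROC score concrete, the sketch below computes it from a vector of vote scores using the rank-sum formulation. It is a minimal illustration, not MetaNeighbor’s own code; the <code>votes</code> values, the <code>is_typeA</code> labels and the helper <code>auroc</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical weighted votes for "typeB" received by six cells
votes <- c(0.9, 0.8, 0.35, 0.6, 0.3, 0.1)
# Whether each cell is truly of typeA
is_typeA <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# AUROC via the rank-sum (Mann-Whitney) formulation:
# the probability that a random typeA cell outscores a random other cell
auroc <- function(score, positive) {
  r <- rank(score)
  n_pos <- sum(positive)
  n_neg <- sum(!positive)
  (sum(r[positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auroc(votes, is_typeA)</code></pre></div>
<p>In this toy example eight of the nine possible (typeA, other) pairs are ranked correctly, so the score is 8/9; if every typeA cell outscored every other cell the score would be 1.</p>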
<p>MetaNeighbor is just a couple of R functions, not a complete package, so we have to load them using <code>source</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">source</span>(<span class="st">"2017-08-28-runMN-US.R"</span>)</code></pre></div>
<div id="prepare-data" class="section level3">
<h3><span class="header-section-number">9.4.1</span> Prepare Data</h3>
<p>Metaneighbour requires all datasets to be combined into a single expression matrix prior to running:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">is.common <-<span class="st"> </span><span class="kw">rowData</span>(muraro)<span class="op">$</span>feature_symbol <span class="op">%in%</span><span class="st"> </span><span class="kw">rowData</span>(segerstolpe)<span class="op">$</span>feature_symbol
muraro <-<span class="st"> </span>muraro[is.common,]
segerstolpe <-<span class="st"> </span>segerstolpe[<span class="kw">match</span>(<span class="kw">rowData</span>(muraro)<span class="op">$</span>feature_symbol, <span class="kw">rowData</span>(segerstolpe)<span class="op">$</span>feature_symbol),]
<span class="kw">rownames</span>(segerstolpe) <-<span class="st"> </span><span class="kw">rowData</span>(segerstolpe)<span class="op">$</span>feature_symbol
<span class="kw">rownames</span>(muraro) <-<span class="st"> </span><span class="kw">rowData</span>(muraro)<span class="op">$</span>feature_symbol
<span class="kw">identical</span>(<span class="kw">rownames</span>(segerstolpe), <span class="kw">rownames</span>(muraro))</code></pre></div>
<pre><code>## [1] TRUE</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">combined_logcounts <-<span class="st"> </span><span class="kw">cbind</span>(<span class="kw">logcounts</span>(muraro), <span class="kw">logcounts</span>(segerstolpe))
dataset_labels <-<span class="st"> </span><span class="kw">rep</span>(<span class="kw">c</span>(<span class="st">"m"</span>, <span class="st">"s"</span>), <span class="dt">times=</span><span class="kw">c</span>(<span class="kw">ncol</span>(muraro), <span class="kw">ncol</span>(segerstolpe)))
cell_type_labels <-<span class="st"> </span><span class="kw">c</span>(<span class="kw">colData</span>(muraro)<span class="op">$</span>cell_type1, <span class="kw">colData</span>(segerstolpe)<span class="op">$</span>cell_type1)
pheno <-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">Sample_ID =</span> <span class="kw">colnames</span>(combined_logcounts),
<span class="dt">Study_ID=</span>dataset_labels,
<span class="dt">Celltype=</span><span class="kw">paste</span>(cell_type_labels, dataset_labels, <span class="dt">sep=</span><span class="st">"-"</span>))
<span class="kw">rownames</span>(pheno) <-<span class="st"> </span><span class="kw">colnames</span>(combined_logcounts)</code></pre></div>
<p>Metaneighbor includes a feature selection method to identify highly variable genes.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">var.genes =<span class="st"> </span><span class="kw">get_variable_genes</span>(combined_logcounts, pheno)</code></pre></div>
<p>Since MetaNeighbor is much slower than <code>scmap</code>, we will downsample these datasets.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">subset <-<span class="st"> </span><span class="kw">sample</span>(<span class="dv">1</span><span class="op">:</span><span class="kw">nrow</span>(pheno), <span class="dv">2000</span>)
combined_logcounts <-<span class="st"> </span>combined_logcounts[,subset]
pheno <-<span class="st"> </span>pheno[subset,]
cell_type_labels <-<span class="st"> </span>cell_type_labels[subset]
dataset_labels <-<span class="st"> </span>dataset_labels[subset]</code></pre></div>
<p>Now we are ready to run Metaneighbor. First we will run the unsupervised version that will let us see which cell-types are most similar across the two datasets.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">unsup <-<span class="st"> </span><span class="kw">run_MetaNeighbor_US</span>(var.genes, combined_logcounts, <span class="kw">unique</span>(pheno<span class="op">$</span>Celltype), pheno)
<span class="kw">heatmap</span>(unsup)</code></pre></div>
<p><img src="31-projection_files/figure-html/unnamed-chunk-32-1.png" width="672" style="display: block; margin: auto;" /></p>
</div>
</div>
<div id="mnncorrect-1" class="section level2">
<h2><span class="header-section-number">9.5</span> mnnCorrect</h2>
<p><a href="https://www.biorxiv.org/content/early/2017/07/18/165118">mnnCorrect</a> corrects datasets to facilitate joint analysis. In order to account for differences in composition between two replicates or two different experiments, it first matches individual cells across experiments to find the overlapping biological structure. Using that overlap, it learns which dimensions of expression correspond to the biological state and which dimensions correspond to batch/experiment effect; mnnCorrect assumes these dimensions are orthogonal to each other in high dimensional expression space. Finally, it removes the batch/experiment effects from the entire expression matrix to return the corrected matrix.</p>
<p>To match individual cells to each other across datasets, mnnCorrect uses the cosine distance to avoid library-size effects, then identifies mutual nearest neighbours (<code>k</code> determines the neighbourhood size) across datasets. Only overlapping biological groups should have mutual nearest neighbours (see panel b below). However, this assumes that <code>k</code> is set to approximately the size of the smallest biological group in the datasets; a <code>k</code> that is too low will identify too few mutual nearest-neighbour pairs to give a good estimate of the batch effect we want to remove.</p>
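<p>The mutual nearest neighbour search described above can be sketched in base R on simulated data. This is a simplified illustration, not the mnnCorrect implementation: the toy matrices <code>batch1</code> and <code>batch2</code>, the helper <code>cosine_norm</code> and the choice <code>k = 2</code> are hypothetical, and distances are computed by brute force.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(7)
# Toy batches: rows are genes, columns are cells.
# batch2 contains shifted copies of the first four batch1 cells.
batch1 <- matrix(rexp(5 * 6), nrow = 5)
batch2 <- batch1[, 1:4] + 0.5

# Scale every cell to unit length so that library size does not drive the distances
cosine_norm <- function(m) sweep(m, 2, sqrt(colSums(m^2)), "/")
b1 <- cosine_norm(batch1)
b2 <- cosine_norm(batch2)
n1 <- ncol(b1)
n2 <- ncol(b2)

# Distances between every batch1 cell (rows) and every batch2 cell (columns)
d <- as.matrix(dist(t(cbind(b1, b2))))[seq_len(n1), n1 + seq_len(n2)]

k <- 2
# The k nearest batch2 cells of each batch1 cell, and vice versa
nn12 <- apply(d, 1, function(x) order(x)[seq_len(k)])  # k x n1
nn21 <- apply(d, 2, function(x) order(x)[seq_len(k)])  # k x n2

# A pair (i, j) is mutual if j is among i's neighbours and i is among j's
is_mutual <- Vectorize(function(i, j) j %in% nn12[, i] && i %in% nn21[, j])
pairs <- which(outer(seq_len(n1), seq_len(n2), is_mutual), arr.ind = TRUE)
colnames(pairs) <- c("batch1_cell", "batch2_cell")
pairs</code></pre></div>
<p>Increasing <code>k</code> makes the mutual condition easier to satisfy and so yields more pairs, at the risk of pairing cells from groups that do not truly overlap.</p>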
<p>Learning the biological/technical effects is done with either singular value decomposition, similar to RUV which we encountered in the batch-correction section, or with principal component analysis using the optimized irlba package, which should be faster than SVD. The parameter <code>svd.dim</code> specifies how many dimensions should be kept to summarize the biological structure of the data; we will set it to three as we found three major groups using MetaNeighbor above. These estimates may be further adjusted by smoothing (<code>sigma</code>) and/or variance adjustment (<code>var.adj</code>).</p>
<p>mnnCorrect also assumes you’ve already subset your expression matrices so that they contain identical genes in the same order. Fortunately, we already did this for our datasets when we set up our data for MetaNeighbor.</p>
<div class="figure" style="text-align: center"><span id="fig:unnamed-chunk-33"></span>
<img src="figures/mnnCorrectDiagramCropped.png" alt="mnnCorrect batch/dataset effect correction. From Haghverdi et al. 2017" />
<p class="caption">
Figure 9.1: mnnCorrect batch/dataset effect correction. From Haghverdi et al. 2017
</p>
</div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">require</span>(<span class="st">"scran"</span>)</code></pre></div>
<pre><code>## Loading required package: scran</code></pre>
<pre><code>## Loading required package: BiocParallel</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># mnnCorrect will take several minutes to run</span>
corrected <-<span class="st"> </span><span class="kw">mnnCorrect</span>(<span class="kw">logcounts</span>(muraro), <span class="kw">logcounts</span>(segerstolpe), <span class="dt">k=</span><span class="dv">20</span>, <span class="dt">sigma=</span><span class="dv">1</span>, <span class="dt">pc.approx=</span><span class="ot">TRUE</span>, <span class="dt">subset.row=</span>var.genes, <span class="dt">svd.dim=</span><span class="dv">3</span>)</code></pre></div>
<p>First let’s check that we found a sufficient number of mnn pairs. mnnCorrect returns a list of data frames containing the mnn pairs for each dataset.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">dim</span>(corrected<span class="op">$</span>pairs[[<span class="dv">1</span>]]) <span class="co"># muraro -> others</span></code></pre></div>
<pre><code>## [1] 0 3</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">dim</span>(corrected<span class="op">$</span>pairs[[<span class="dv">2</span>]]) <span class="co"># seger -> others</span></code></pre></div>
<pre><code>## [1] 2533 3</code></pre>
<p>The first and second columns contain the cell column IDs and the third column contains a number indicating which dataset/batch the column 2 cell belongs to. In our case, we are only comparing two datasets, so all the mnn pairs have been assigned to the second table and the third column contains only ones.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">head</span>(corrected<span class="op">$</span>pairs[[<span class="dv">2</span>]])</code></pre></div>
<pre><code>## DataFrame with 6 rows and 3 columns
## current.cell other.cell other.batch
## <integer> <Rle> <Rle>
## 1 1553 5 1
## 2 1078 5 1
## 3 1437 5 1
## 4 1890 5 1
## 5 1569 5 1
## 6 373 5 1</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">total_pairs <-<span class="st"> </span><span class="kw">nrow</span>(corrected<span class="op">$</span>pairs[[<span class="dv">2</span>]])
n_unique_seger <-<span class="st"> </span><span class="kw">length</span>(<span class="kw">unique</span>((corrected<span class="op">$</span>pairs[[<span class="dv">2</span>]][,<span class="dv">1</span>])))
n_unique_muraro <-<span class="st"> </span><span class="kw">length</span>(<span class="kw">unique</span>((corrected<span class="op">$</span>pairs[[<span class="dv">2</span>]][,<span class="dv">2</span>])))</code></pre></div>
<p>mnnCorrect found 2533 sets of mutual nearest-neighbours between <code>n_unique_seger</code> segerstolpe cells and <code>n_unique_muraro</code> muraro cells. This should be a sufficient number of pairs but the low number of unique cells in each dataset suggests we might not have captured the full biological signal in each dataset.</p>
<p><strong>Exercise</strong> Which cell-types had mnns across these datasets? Should we increase/decrease k?</p>
<p><strong>Answer</strong></p>
<p>Now we could create a combined dataset to jointly analyse these data. However, the corrected data is no longer counts and will usually contain negative expression values, so some analysis tools may no longer be appropriate. For simplicity let’s just plot a joint t-SNE.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">require</span>(<span class="st">"Rtsne"</span>)</code></pre></div>
<pre><code>## Loading required package: Rtsne</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">joint_expression_matrix <-<span class="st"> </span><span class="kw">cbind</span>(corrected<span class="op">$</span>corrected[[<span class="dv">1</span>]], corrected<span class="op">$</span>corrected[[<span class="dv">2</span>]])
<span class="co"># Tsne will take some time to run on the full dataset</span>