Commit d8873dc

making separate repo for mouse lens with MQ
1 parent c2460c5 commit d8873dc

7 files changed, +65549 -0 lines changed

MQ_analysis.Rproj

+13
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: knitr
LaTeX: pdfLaTeX

MQ_data_prep_steps.txt

+90
@@ -0,0 +1,90 @@
proteinGroups.txt prep for mouse lens data

x Open Excel with a blank sheet
x Use the menus (or another mechanism in Excel) to open the proteinGroups.txt file
x We want the Text Import Wizard to activate
x Click "Next" on the first two steps (delimited, with tab as the separator)
x In the third step, make sure any columns with gene names are in text format (7th column)
x After import, save the file in XLSX format (and give it a more distinct name)
x Add a spacer column at the far right of the sheet (Col. FN) and fill it with "blank"
x Add a "flag" column (FO) to denote rejected proteins (decoys, contams, missing data)
x Populate the "flag" column with "z" (later we will sort descending)
x Add 4 extra rows at the top of the sheet (so the headers are in Row 5)
x Make the header row bold
x Allow the header cells to wrap text (cell formatting)
x Save the workbook (save often)

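The import steps above can also be done in a script. This is a minimal sketch, not part of the original workflow. It assumes a tab-delimited file with a single header row, and the helper name `load_protein_groups` is hypothetical. Reading every field as a string mirrors the "text format" step, which keeps Excel from turning gene names such as Sept2 or March1 into dates.

```python
import csv
import io


def load_protein_groups(handle):
    """Read a tab-delimited proteinGroups-style table (hypothetical helper).

    csv.DictReader never converts types, so gene names stay as text,
    like importing the gene-name column as "Text" in Excel.
    Each row also gets a "flag" entry defaulting to "z".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        row["flag"] = "z"  # default: not (yet) rejected
        rows.append(row)
    return rows


# Made-up two-row table standing in for proteinGroups.txt.
# For a real file: open("proteinGroups.txt", newline="") as the handle.
sample = io.StringIO(
    "Protein IDs\tGene names\n"
    "P1\tSept2\n"
    "P2\tMarch1\n"
)
rows = load_protein_groups(sample)
print(rows[0]["Gene names"], rows[0]["flag"])
```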
x Select Row 5 and turn on column filters
x Filter Col. FC ("Reverse") to show only rows with "+" (decoys)
x Replace "z" in the Flag column (FO) with "decoy" and fill down
x Reset the column filter on FC
x Filter Col. FD ("Potential contaminant") to show only rows with "+" (contaminants)
x Replace "z" in the Flag column (FO) with "contam" and fill down
x Reset the column filter on FD
x Show any rows with "keratin" in "Protein names" (Col. F)
x Flag any additional keratins as "contam" in the Flag column
x Look for "albumin" and also flag those rows as "contam"
x Clear the filter on Col. F

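A scripted version of the flagging steps above might look like the following. It is a sketch, not the author's method. The helper `assign_flag` is hypothetical, and the column names come from the steps above. The precedence (contaminant before decoy) is an assumption: filling "contam" down over the filtered FD rows may overwrite an earlier "decoy" label. The keratin/albumin check is a case-insensitive substring test, similar to Excel's "contains" text filter.

```python
def assign_flag(row):
    """Return the flag for one proteinGroups row (hypothetical helper).

    "+" in "Potential contaminant" -> "contam"
    "+" in "Reverse"               -> "decoy"
    "keratin"/"albumin" in "Protein names" (any case) -> "contam"
    otherwise the default "z" is kept.
    """
    if row.get("Potential contaminant", "") == "+":
        return "contam"
    if row.get("Reverse", "") == "+":
        return "decoy"
    names = row.get("Protein names", "").lower()
    if "keratin" in names or "albumin" in names:
        return "contam"
    return "z"


# Made-up rows covering each case.
example_rows = [
    {"Reverse": "+", "Potential contaminant": "", "Protein names": ""},
    {"Reverse": "", "Potential contaminant": "+", "Protein names": "Trypsin"},
    {"Reverse": "", "Potential contaminant": "", "Protein names": "Keratin, type II"},
    {"Reverse": "", "Potential contaminant": "", "Protein names": "Serum albumin"},
    {"Reverse": "", "Potential contaminant": "", "Protein names": "Alpha-crystallin A chain"},
]
print([assign_flag(r) for r in example_rows])
```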
x Add a "Counter" column (FP)
x Fill the column with the number 1
x Make sure the values stop at the end of the table (row 6364)
x Enter the formula "=SUBTOTAL(109,FP6:FP6501)" in cell FP4 (109 sums only the visible rows, so FP4 counts the rows that pass the current filters)

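Function number 109 tells SUBTOTAL to add only rows that are not hidden. That is why the counter in FP4 changes as filters are applied, and why a range that runs past the last row of data does no harm. The behaviour can be mimicked in Python. `subtotal_109` is a hypothetical name, and the `visible` list stands in for Excel's filter state.

```python
def subtotal_109(values, visible):
    """Sum values in visible rows only, like Excel's SUBTOTAL(109, ...).

    Empty cells (None) are skipped, as SUM skips blank cells, so a range
    extending past the end of the table does not change the result.
    """
    return sum(v for v, shown in zip(values, visible) if shown and v is not None)


flags = ["z", "z", "decoy", "contam", "decoy"]
counter = [1] * len(flags) + [None, None]  # two blank cells below the table

no_filter = [True] * len(counter)
decoy_filter = [f == "decoy" for f in flags] + [True, True]

print(subtotal_109(counter, no_filter))     # all proteins
print(subtotal_109(counter, decoy_filter))  # only the visible decoys
```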
x Sort descending on the Flag column (FO)
x Insert a blank row near the bottom of the table, between the other proteins and the decoys and contams

["mouse_lens_3TMTs_proteinGroups_int1.xlsx" is saved at this point]

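The "z" placeholder is what makes the descending sort useful. In a text sort, lowercase "z" comes after "decoy" and "contam", so unflagged proteins rise to the top and the rejected rows collect at the bottom. Below is a small Python illustration with made-up rows. Python's string sort is case-sensitive while Excel's default is not, which does not matter for these all-lowercase labels.

```python
rows = [
    {"id": "A", "flag": "contam"},
    {"id": "B", "flag": "z"},
    {"id": "C", "flag": "decoy"},
    {"id": "D", "flag": "z"},
]

# Descending text sort on the flag (Excel's "Sort Z to A").
# Python's sort is stable, so rows with equal flags keep their order.
sorted_rows = sorted(rows, key=lambda r: r["flag"], reverse=True)
print([r["flag"] for r in sorted_rows])  # ['z', 'z', 'decoy', 'contam']

# Position of the first rejected row: where the separating blank row goes.
first_rejected = next(i for i, r in enumerate(sorted_rows) if r["flag"] != "z")
print(first_rejected)  # 2
```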
x Select Row 5 and turn off column filters
x Locate the reporter ion columns of interest (CL:DO) and add a sample key in Row 4 above them
x Do not use any "corrected" reporter ion columns
x There are experiment-wide columns (Cols. AD-AW)
x There are columns for each experimental sample (Cols. BH-DO)
x We want columns CL through DO
x Add the sample key above the reporter channels (they are in increasing mass order)
x The first 6 channels of each experiment were used for the time points (in increasing order)
x Note that N forms are lighter than C forms
x An alphabetical ordering is not the same as a mass-based ordering

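The difference between alphabetical and mass-based channel ordering can be shown in a few lines of Python. This sketch assumes a TMT 10-plex label set (126 through 131). That fits the 10 reporter columns per experiment in CL:DO, but the steps do not state it. `mass_order_key` is a hypothetical helper that sorts by nominal mass, then puts N before C.

```python
# Assumed TMT 10-plex labels, listed in increasing reporter-ion mass.
TMT10_CHANNELS = ["126", "127N", "127C", "128N", "128C",
                  "129N", "129C", "130N", "130C", "131"]


def mass_order_key(label):
    """Sort key: nominal mass first, then N (lighter) before C (heavier)."""
    nominal = int(label[:3])
    suffix = label[3:]  # "", "N", or "C"
    return (nominal, {"": 0, "N": 0, "C": 1}[suffix])


alphabetical = sorted(TMT10_CHANNELS)  # C sorts before N as text
by_mass = sorted(alphabetical, key=mass_order_key)

print(alphabetical)
print(by_mass)
print(by_mass[:6])  # the first six channels, used for the time points
```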
x We need to get the columns for the same condition together for counting missing data
x We will need 4 columns for counters (one for each TMT experiment and a combined column)
x Copy the Exp. 1 data (CL:CQ) to columns FU:FZ
x Copy the Exp. 2 data (CV:DA) to columns GA:GF
x Copy the Exp. 3 data (DF:DK) to columns GG:GL
x Replace the column names (Row 5) with sample names
x Label column FQ "keeper"
x Label column FR "missing_1"
x Label column FS "missing_2"
x Label column FT "missing_3"
x Fill FQ with this formula: "=COUNTIF(FR6:FT6,"=ok")"
x Fill FR with this formula: "=IF(COUNTIF(FU6:FZ6, ">1")>=5, "ok", "missing")"
x Fill FS with this formula: "=IF(COUNTIF(GA6:GF6, ">1")>=5, "ok", "missing")"
x Fill FT with this formula: "=IF(COUNTIF(GG6:GL6, ">1")>=5, "ok", "missing")"
x In these discovery experiments, we want to cast a wide net
x We want to allow a maximum of one missing reporter ion in each experiment (at least 5 of the 6 channels must be greater than 1)

["mouse_lens_3TMTs_proteinGroups_int2.xlsx" is saved at this point]

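The FR:FT and FQ formulas encode a simple rule for each protein. An experiment is "ok" when at least 5 of its 6 reporter intensities are greater than 1, and "keeper" counts the passing experiments. Here is a Python sketch of the same rule, with hypothetical function names and made-up intensities.

```python
def experiment_status(intensities, min_present=5, threshold=1):
    """Mirror =IF(COUNTIF(range, ">1")>=5, "ok", "missing") for one experiment."""
    present = sum(1 for v in intensities if v > threshold)
    return "ok" if present >= min_present else "missing"


def keeper_count(experiments):
    """Mirror =COUNTIF(FR:FT, "=ok"): number of experiments that pass."""
    return sum(1 for exp in experiments if experiment_status(exp) == "ok")


# Made-up reporter intensities for one protein in three TMT experiments.
exp1 = [5e5, 4e5, 0, 3e5, 6e5, 2e5]    # one zero  -> ok
exp2 = [1e5, 2e5, 3e5, 4e5, 5e5, 6e5]  # complete  -> ok
exp3 = [0, 0, 1e5, 2e5, 3e5, 4e5]      # two zeros -> missing

print([experiment_status(e) for e in (exp1, exp2, exp3)])
print(keeper_count([exp1, exp2, exp3]))  # 2, so this protein is not kept
```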
x Select Row 5 and turn the column filters back on
x Sort column FQ descending (we will keep rows with a value of 3)
x Insert a blank row where the count goes from 3 to 2 (after row 3873, so the blank row is row 3874)
x Replace "z" in the Flag column (FO) with "missing data" for rows 3875:6222

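The sort-and-flag steps above amount to a filter on the keeper count. This sketch assumes rows are dicts with "flag" and "keeper" entries; these are hypothetical field names mirroring columns FO and FQ. It only considers rows still flagged "z", leaving decoys and contaminants untouched. `apply_missing_filter` is a hypothetical helper.

```python
def apply_missing_filter(rows, required=3):
    """Keep "z" rows whose keeper count equals `required`.

    Other "z" rows are re-flagged "missing data" in place;
    rows already flagged (decoy, contam) are skipped.
    """
    kept = []
    for row in rows:
        if row["flag"] != "z":
            continue
        if row["keeper"] == required:
            kept.append(row)
        else:
            row["flag"] = "missing data"
    return kept


# Made-up rows.
rows = [
    {"id": "A", "flag": "z", "keeper": 3},
    {"id": "B", "flag": "z", "keeper": 2},
    {"id": "C", "flag": "decoy", "keeper": 3},
    {"id": "D", "flag": "z", "keeper": 0},
]
kept = apply_missing_filter(rows)
print([r["id"] for r in kept])
print([r["flag"] for r in rows])
```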
x Copy cells B5:B3873 (Majority protein IDs) to cells GM5:GM3873
x Rename the column header to "Accession" (cell GM5)
x [Optional:
x Run Text to Columns on the column as delimited, with semicolon checked
x Select the extra columns created by the split (e.g., GN to HT) and clear their contents]
x Add a new tab
x Rename the tab to "MQ_prepped_data"
x Copy cells FU5:GM3873 to the new tab (paste in at cell A1)
x Save the workbook [a "mouse_lens_3TMTs_protenGroups_int3.xlsx" file was created here]
x Go to the "MQ_prepped_data" tab
x Save As "MQ_prepped_data.csv" in CSV format (only the selected tab is saved)
x Excel may give you several warnings
x Quit Excel

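The accession and export steps can also be scripted. This sketch applies the optional semicolon split, keeping only the first ID from "Majority protein IDs". It then writes the sample columns plus "Accession" with Python's csv module. The helper names, sample column names, and IDs are hypothetical. Python's csv writer ends each row with "\r\n" by default.

```python
import csv
import io


def first_accession(majority_ids):
    """Keep the first entry of a semicolon-separated "Majority protein IDs" cell."""
    return majority_ids.split(";")[0].strip()


def write_prepped_csv(rows, sample_columns, handle):
    """Write the sample intensity columns plus an "Accession" column as CSV."""
    writer = csv.writer(handle)
    writer.writerow(sample_columns + ["Accession"])
    for row in rows:
        values = [row[c] for c in sample_columns]
        writer.writerow(values + [first_accession(row["Majority protein IDs"])])


# Made-up data; a real export would open "MQ_prepped_data.csv" with newline="".
rows = [{"S1": "100", "S2": "200", "Majority protein IDs": "ACC1;ACC1-2"}]
out = io.StringIO()
write_prepped_csv(rows, ["S1", "S2"], out)
print(out.getvalue())
```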
Zero replacements and data normalizations will be done in R
