| 1 | +proteinGroups.txt prep for mouse lens data |
| 2 | + |
| 3 | +x Open Excel with blank sheet |
| 4 | +x Use menus or another mechanism in Excel to open the proteinGroups.txt file |
| 5 | +x We want the Text Import Wizard to activate |
| 6 | +x Click "next" on first two steps (delimited and tab as separator) |
| 7 | +x In third step, make sure any columns with gene names are text format (7th column) |
| 8 | +x After import, save the file in XLSX format (and give it a more distinctive name) |
| 9 | +x Add a spacer column at the far right of the sheet (Col. FN) and fill with "blank" |
| 10 | +x Add a "flag" column (FO) to denote rejected proteins (decoys, contams, missing data) |
| 11 | +x Populate the "flag" column with "z" (later we will sort descending) |
| 12 | +x Add 4 extra rows at top of sheet (so headers are in Row 5) |
| 13 | +x Make header row "bold" |
| 14 | +x Allow header cells to wrap text (cell formatting) |
| 15 | +x Save sheet (save often) |
| 16 | + |
| 17 | +x Select Row 5 and turn on column filters |
| 18 | +x Show "+" for Col FC ("Reverse") to show decoys |
| 19 | +x Replace "z" in Flag column (FO) with "decoy" and fill down |
| 20 | +x Reset column filter on FC |
| 21 | +x Show "+" for Col FD ("Potential contaminant") to show contaminants |
| 22 | +x Replace "z" in Flag column (FO) with "contam" and fill down |
| 23 | +x Reset column filter on FD |
| 24 | +x Show any rows with "keratin" in "Protein names" (Col F) |
| 25 | +x Flag any additional keratins as "contam" in Flag column |
| 26 | +x Look for "albumin" and also flag as contam |
| 27 | +x Clear filter on Col. F |
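The flagging steps above can be sketched in Python as a cross-check on the manual filtering. This is only an illustration, not part of the Excel workflow: the function name `flag_row` and the example rows are hypothetical, the column names are the ones quoted in the steps, and it assumes that later steps overwrite earlier flags, as a fill-down on filtered rows would.

```python
def flag_row(row):
    """Return the Flag value for one proteinGroups row (dict of column name -> text)."""
    flag = "z"  # default: row is kept for now
    if row.get("Reverse", "") == "+":
        flag = "decoy"
    if row.get("Potential contaminant", "") == "+":
        flag = "contam"
    # Extra contaminants are found by a text search of the protein names.
    name = row.get("Protein names", "").lower()
    if "keratin" in name or "albumin" in name:
        flag = "contam"
    return flag


# Made-up example rows, one per case in the checklist.
rows = [
    {"Protein names": "Alpha-crystallin A chain", "Reverse": "", "Potential contaminant": ""},
    {"Protein names": "", "Reverse": "+", "Potential contaminant": ""},
    {"Protein names": "Trypsin", "Reverse": "", "Potential contaminant": "+"},
    {"Protein names": "Keratin, type II cytoskeletal 1", "Reverse": "", "Potential contaminant": ""},
    {"Protein names": "Serum albumin", "Reverse": "", "Potential contaminant": ""},
]
flags = [flag_row(r) for r in rows]
print(flags)
```

Sorting descending on the resulting flags puts the "z" rows first and the decoys and contaminants at the bottom, which is what the later sort step relies on.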
| 28 | + |
| 29 | +x Add a "Counter" column (FP) |
| 30 | +x fill column with the number "1" |
| 31 | +x make sure the values stop at the end of the table (row 6364) |
| 32 | +x enter formula: "=SUBTOTAL(109,FP6:FP6501)" in cell FP4 |
| 33 | + |
| 34 | +x Sort descending on Flag column (FO) |
| 35 | +x Insert a blank row near the bottom of the table to separate the contams and decoys from the other proteins |
| 36 | + |
| 37 | +["mouse_lens_3TMTs_proteinGroups_int1.xlsx" was saved at this point] |
| 38 | + |
| 39 | +x Select Row 5 and turn off column filters |
| 40 | +x Locate the reporter ion columns of interest (CL:DO) and add sample key in Row 4 above |
| 41 | +x Do not use any "corrected" reporter ion columns |
| 42 | +x There are experiment-wide columns (Cols AD-AW) |
| 43 | +x There are columns for each experimental sample (cols BH-DO) |
| 44 | +x We want columns CL through DO |
| 45 | +x Add sample key above reporter channels (they are in increasing mass order) |
| 46 | +x The first 6 channels were used for the time points (in increasing order) |
| 47 | +x Note that N forms are lighter than C forms |
| 48 | +x An alphabetical ordering is not the same as a mass-based ordering |
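The difference between alphabetical and mass-based ordering can be seen in a short Python sketch. The channel labels below are an assumption used for illustration (a typical 10-plex naming); the sheet itself may label its reporter ion columns differently. The key function encodes only what the notes state: within a nominal mass, the N form is lighter than the C form.

```python
def mass_order_key(channel):
    """Sort key: nominal mass first, then N before C (N forms are lighter)."""
    nominal = int(channel[:3])
    isotope = channel[3:]  # "N", "C", or "" for unsuffixed channels
    return (nominal, 0 if isotope == "N" else 1)


# Hypothetical channel labels, listed here in increasing mass order.
channels = ["126", "127N", "127C", "128N", "128C",
            "129N", "129C", "130N", "130C", "131"]

alphabetical = sorted(channels)
by_mass = sorted(channels, key=mass_order_key)

print(alphabetical[:5])
print(by_mass[:5])
```

An alphabetical sort puts each C form ahead of its N partner, so a sample key written in alphabetical order would be attached to the wrong columns.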
| 49 | + |
| 50 | +x We need to get columns for the same condition together for counting of missing data |
| 51 | +x We will need 4 columns for counters (one for each TMT experiment and a combined col.) |
| 52 | +x Copy the Exp. 1 data (CL:CQ) to columns FU:FZ |
| 53 | +x Copy the Exp. 2 data (CV:DA) to columns GA:GF |
| 54 | +x Copy the Exp. 3 data (DF:DK) to columns GG:GL |
| 55 | +x Replace the column names (row 5) with sample names |
| 56 | +x Label column FQ as "keeper" |
| 57 | +x Label column FR "missing_1" |
| 58 | +x Label column FS "missing_2" |
| 59 | +x Label column FT "missing_3" |
| 60 | +x Fill FQ with this formula: "=COUNTIF(FR6:FT6,"=ok")" |
| 61 | +x Fill FR with this formula: "=IF(COUNTIF(FU6:FZ6, ">1")>=5, "ok", "missing")" |
| 62 | +x Fill FS with this formula: "=IF(COUNTIF(GA6:GF6, ">1")>=5, "ok", "missing")" |
| 63 | +x Fill FT with this formula: "=IF(COUNTIF(GG6:GL6, ">1")>=5, "ok", "missing")" |
| 64 | +x In these discovery experiments, we want to cast a wide net |
| 65 | +x We want to allow a maximum of one missed reporter ion in each experiment |
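The missing-data formulas above can be mirrored in Python to check a few rows by hand. This is a sketch only: the function names and intensity values are made up, while the thresholds (intensity greater than 1, at least 5 of 6 channels) are taken from the COUNTIF formulas.

```python
def experiment_status(intensities, min_ok=5, threshold=1):
    """Mirror of =IF(COUNTIF(range, ">1")>=5, "ok", "missing") for one experiment."""
    n_ok = sum(1 for value in intensities if value > threshold)
    return "ok" if n_ok >= min_ok else "missing"


def keeper_count(exp1, exp2, exp3):
    """Mirror of =COUNTIF(FR6:FT6, "=ok"): how many experiments passed."""
    return sum(1 for exp in (exp1, exp2, exp3) if experiment_status(exp) == "ok")


# Made-up reporter ion intensities for one protein (6 channels per experiment).
exp1 = [1200.0, 980.5, 0.0, 1500.2, 1100.0, 870.3]   # one missing value: still ok
exp2 = [0.0, 0.0, 300.0, 410.7, 390.1, 505.9]        # two missing values: missing
exp3 = [650.0, 700.4, 810.2, 640.8, 720.0, 690.5]    # complete: ok

print(experiment_status(exp1), experiment_status(exp2), experiment_status(exp3))
print(keeper_count(exp1, exp2, exp3))
```

In this example the keeper value is 2, so the protein would not be kept, because only rows with a value of 3 are retained in the next block of steps.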
| 66 | + |
| 67 | +["mouse_lens_3TMTs_proteinGroups_int2.xlsx" was saved at this point] |
| 68 | + |
| 69 | +x Select Row 5 and turn column filters back on |
| 70 | +x Sort column FQ descending (we will keep rows with a value of "3") |
| 71 | +x Insert a blank row where the count drops from "3" to "2" (between rows 3873 and 3874) |
| 72 | +x Replace "z" in Flag column (FO) with "missing data" for rows 3875:6222 |
| 73 | + |
| 74 | +x Copy cells B5:B3873 (Majority protein IDs) to cells GM5:GM3873 |
| 75 | +x Rename column header to "Accession" (cell GM5) |
| 76 | +x [Optional: |
| 77 | +x Convert text-to-columns as delimited with semicolon checked |
| 78 | +x Select a bunch of columns (e.g., GN to HT) and clear contents] |
| 79 | +x Add a new tab |
| 80 | +x Rename tab to "MQ_prepped_data" |
| 81 | +x Copy cells FU5:GM3873 to new tab (paste in at cell A1) |
| 82 | +x Save workbook [a "mouse_lens_3TMTs_proteinGroups_int3.xlsx" file was created here] |
| 83 | +x Go to "MQ_prepped_data" tab |
| 84 | +x Save As "MQ_prepped_data.csv" in CSV format (save just the selected tab) |
| 85 | +x Excel may give you several warnings |
| 86 | +x Quit Excel |
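The optional text-to-columns step above (split on semicolons, then clear the extra columns) amounts to keeping only the first accession from each "Majority protein IDs" entry. A minimal Python equivalent, with a hypothetical function name and placeholder accessions:

```python
def first_accession(majority_ids):
    """Keep the first entry of a semicolon-delimited accession list."""
    return majority_ids.split(";")[0].strip()


# Placeholder accessions, not real identifiers.
examples = ["ACC_A;ACC_B;ACC_C", "ACC_D"]
accessions = [first_accession(item) for item in examples]
print(accessions)
```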
| 87 | + |
| 88 | +Zero replacements and data normalizations will be done in R |
| 89 | + |
| 90 | + |