Interpretation of results #3

Open
pacificma opened this issue Apr 19, 2022 · 6 comments

@pacificma

Dear Authors,

I am running COSMO on data sets with protein and RNA. In the final results table, one of the Clinical entries is '-1'. How should this be interpreted? Does it mean that no clinical profile matched this sample? I would guess this happens more often as the number of categories included in the analysis increases. Also, what criterion does the method use to summarize this in the Clinical column?

Best, Weiping

@pacificma
Author

For example, suppose we consider only gender as the clinical predictor (binary).

1. If only one sample has Clinical = -1, it means the estimated gender for that sample differs from its label, while all the other samples look fine.

2. If multiple samples have Clinical = -1, it means the gender estimates for those samples differ from their labels, but the algorithm cannot infer from the combination of clinical and omics data whether they were swapped?

3. If Clinical is labeled with a different sample number in one sample, does it mean the omics data and the clinical data were swapped at the same time?

Am I correct on these scenarios?

Another question: if we want to check more than two omics data sets, I guess we can only use two at a time and run COSMO multiple times. Is there a systematic way to aggregate the results from multiple omics data sets?

@pacificma
Author

I have another question. Suppose we see results like those illustrated in the demo:

sample     Clinical  Data1  Data2
Testing_8  8         8      8
Testing_9  9         8      9

For the mismatched sample in the second row, does that mean Data1 of the 8th sample matches Data2 and Clinical of the 9th sample, or that Data1 of the 9th sample matches the other data of the 8th sample?
My take is the latter. Is that correct?

@soonjye
Collaborator

soonjye commented Apr 26, 2022

> For example, suppose we consider only gender as the clinical predictor (binary).
>
> 1. If only one sample has Clinical = -1, it means the estimated gender for that sample differs from its label, while all the other samples look fine.
>
> 2. If multiple samples have Clinical = -1, it means the gender estimates for those samples differ from their labels, but the algorithm cannot infer from the combination of clinical and omics data whether they were swapped?
>
> 3. If Clinical is labeled with a different sample number in one sample, does it mean the omics data and the clinical data were swapped at the same time?
>
> Am I correct on these scenarios?
>
> Another question: if we want to check more than two omics data sets, I guess we can only use two at a time and run COSMO multiple times. Is there a systematic way to aggregate the results from multiple omics data sets?

Yes, you are correct for all three conditions.
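
To make the three confirmed scenarios concrete, here is a minimal sketch of how the Clinical column could be read. It is a hypothetical helper, not part of COSMO, and it assumes the final table has already been loaded as (sample number, Clinical value) pairs.

```python
from typing import Dict, List, Tuple


def interpret_clinical(rows: List[Tuple[int, int]]) -> Dict[int, str]:
    """Map each sample number to a reading of its Clinical entry.

    rows holds (sample_number, clinical_value) pairs taken from the
    final results table. The readings follow the three scenarios
    discussed in this thread; the wording is illustrative only.
    """
    n_unmatched = sum(1 for _, clinical in rows if clinical == -1)
    readings = {}
    for sample, clinical in rows:
        if clinical == sample:
            readings[sample] = "consistent"
        elif clinical == -1 and n_unmatched == 1:
            # Scenario 1: a single -1 among otherwise clean samples.
            readings[sample] = "predicted attribute differs from label"
        elif clinical == -1:
            # Scenario 2: several -1 entries, no swap can be inferred.
            readings[sample] = "differs from label, swap not inferable"
        else:
            # Scenario 3: Clinical carries another sample's number.
            readings[sample] = f"clinical assigned to sample {clinical}"
    return readings


if __name__ == "__main__":
    for sample, reading in interpret_clinical([(1, 1), (2, -1), (3, 4), (4, 3)]).items():
        print(sample, reading)
```

The only thing separating scenario 1 from scenario 2 here is how many -1 entries the table contains, which is why the count is taken before the loop.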
Right now, COSMO can run on only two omics data sets at a time. It does not aggregate results from different pairs of omics data sets.
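
Since aggregation is not built in, one possible workaround is to run every pair separately and tally how often each sample is flagged. The sketch below is not a COSMO feature: the function names, the omics names (including `cnv` as a made-up third data type), and the idea of counting flags are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def omics_pairs(omics: List[str]) -> List[Tuple[str, str]]:
    """List every unordered pair of omics data sets, one COSMO run per pair."""
    return list(combinations(omics, 2))


def tally_flags(flagged_by_pair: Dict[Tuple[str, str], Iterable[str]]) -> Counter:
    """Count, per sample, how many pairwise runs flagged it as mismatched.

    flagged_by_pair maps an omics pair to the samples that run flagged.
    How the flagged samples are extracted from each run is left to the user.
    """
    counts: Counter = Counter()
    for samples in flagged_by_pair.values():
        counts.update(set(samples))  # count each sample once per run
    return counts


if __name__ == "__main__":
    print(omics_pairs(["protein", "rna", "cnv"]))
    flags = {("protein", "rna"): ["s9"], ("protein", "cnv"): ["s9", "s3"]}
    print(tally_flags(flags).most_common())
```

A sample flagged in several independent pairs is a stronger candidate for a real mislabel than one flagged once, but any threshold would need to be chosen by the user.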

@soonjye
Collaborator

soonjye commented Apr 26, 2022

> I have another question. Suppose we see results like those illustrated in the demo:
>
> sample     Clinical  Data1  Data2
> Testing_8  8         8      8
> Testing_9  9         8      9
>
> For the mismatched sample in the second row, does that mean Data1 of the 8th sample matches Data2 and Clinical of the 9th sample, or that Data1 of the 9th sample matches the other data of the 8th sample? My take is the latter. Is that correct?

The latter is correct.
Looking at the table, this could be a duplication: Data1 of sample 8 was duplicated and replaced Data1 of sample 9.
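
The reading above can be sketched in code: a cell that differs from its row's own sample number means that data file best matches another sample, and the same number appearing under two rows of one column hints at a duplication. The helper names and the row layout below are assumptions for illustration, not COSMO output parsing.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, Dict[str, int]]  # (sample number, {column: matched sample})


def find_mismatches(rows: List[Row]) -> List[Tuple[int, str, int]]:
    """Return (sample, column, matched_sample) for every cell that
    does not equal its own row's sample number."""
    return [
        (sample, column, matched)
        for sample, cells in rows
        for column, matched in cells.items()
        if matched != sample
    ]


def possible_duplications(rows: List[Row], column: str) -> Dict[int, List[int]]:
    """Return matched numbers that appear under more than one sample in
    `column`, which is the duplication pattern described in this thread."""
    owners = defaultdict(list)
    for sample, cells in rows:
        if cells[column] != -1:  # -1 is a "no match" marker, not a sample
            owners[cells[column]].append(sample)
    return {m: s for m, s in owners.items() if len(s) > 1}


if __name__ == "__main__":
    demo = [
        (8, {"Clinical": 8, "Data1": 8, "Data2": 8}),
        (9, {"Clinical": 9, "Data1": 8, "Data2": 9}),
    ]
    print(find_mismatches(demo))
    print(possible_duplications(demo, "Data1"))
```

For the demo rows, the only mismatch is Data1 of Testing_9 pointing at sample 8, and sample 8 shows up twice in the Data1 column.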

@pacificma
Author

Thank you!

I believe the mislabeling results for the omics data were provided by method 1 only. But when I look at the final results table, the best match is not the same as in the table produced by method 1. Did you apply any additional adjustment to the result of method 1?

Final results table

sample  Clinical  Data1  Data2
#22     22        22     46
#71     71        25     71
#72     72        72     46

Method 1 table

d1  d1_label  d2  d2_label  d1rank  d2rank  distance  correlation
22  #22       72  #72       69      65      134       -0.209155948
25  #25       71  #71       16      11      27        0.162347354
71  #71       25  #25       2       1       3         0.780665347
72  #72       22  #22       1       3       4         0.218780229

@soonjye
Collaborator

soonjye commented May 8, 2022

> Thank you!
>
> I believe the mislabeling results for the omics data were provided by method 1 only. But when I look at the final results table, the best match is not the same as in the table produced by method 1. Did you apply any additional adjustment to the result of method 1?
>
> Final results table
>
> sample  Clinical  Data1  Data2
> #22     22        22     46
> #71     71        25     71
> #72     72        72     46
>
> Method 1 table
>
> d1  d1_label  d2  d2_label  d1rank  d2rank  distance  correlation
> 22  #22       72  #72       69      65      134       -0.209155948
> 25  #25       71  #71       16      11      27        0.162347354
> 71  #71       25  #25       2       1       3         0.780665347
> 72  #72       22  #22       1       3       4         0.218780229

Right. There are two methods in the algorithm, each from a different winning team.
The final results table uses the predictions from both teams' methods, so it can differ from the method 1 table alone.
