[QUESTION] #140

saramoein372 · 2022-04-01T16:47:12Z

saramoein372
Apr 1, 2022

Hi Kelvin,

I have a question about the column "germline_dandelion" in the file "filtered_contig_igblast_db-pass_genotyped.tsv".

How is this column generated?

Thank you,
Sara

zktuong · 2022-04-01T17:41:32Z

zktuong
Apr 1, 2022
Maintainer

It’s from immcantation’s CreateGermlines.py

0 replies

saramoein372 · 2022-04-01T18:48:37Z

saramoein372
Apr 1, 2022
Author

Thank you, Kelvin. But which database is used for this purpose?
I am going to browse that database to answer some questions related to my project.

Sorry if that is a basic question.

1 reply

zktuong Apr 1, 2022
Maintainer

The sequences are reconstructed from the IMGT database, as described at https://changeo.readthedocs.io/en/stable/examples/germlines.html

On the immcantation webpage, they also describe how the database is downloaded.

saramoein372 · 2022-04-05T18:20:45Z

saramoein372
Apr 5, 2022
Author

Hi Kelvin,

I still have some difficulty running TCR data with dandelion and generating the network.

I could generate the file "filtered_contig_dandelion.tsv" and could run:
vdj1 = ddl.read_10x_airr('/filtered_contig_dandelion.tsv')
adata = sc.read_10x_h5('sample_feature_bc_matrix.h5', gex_only=True)
ddl.tl.find_clones(vdj1, identity = 1, locus = 'tr')

But I got an error when generating the network:
ddl.tl.generate_network(vdj1)

Is there any command that I am missing when running this code? I need the TCR network.

Thank you,
Sara

0 replies

saramoein372 · 2022-04-05T18:23:43Z

saramoein372
Apr 5, 2022
Author

Related to my previous question, the error I get is this:

KeyError Traceback (most recent call last)
/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'clone_id'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
/var/folders/4m/kpwkqwb53wb9l5f6cv1ddfzc0000gn/T/ipykernel_6118/1841829037.py in <module>
----> 1 ddl.tl.generate_network(vdj)
2
3

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/dandelion/tools/_network.py in generate_network(self, key, clone_key, min_size, downsample, verbose, **kwargs)
149 overlap = []
150 for i in out.metadata.index:
--> 151 if len(out.metadata.loc[i, str(clonekey)].split('|')) > 1:
152 overlap.append(
153 [c for c in out.metadata.loc[i, str(clonekey)].split('|')])

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
923 with suppress(KeyError, IndexError):
924 return self.obj._get_value(*key, takeable=self._takeable)
--> 925 return self._getitem_tuple(key)
926 else:
927 # we by definition only have the 0th axis

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
1098 def _getitem_tuple(self, tup: tuple):
1099 with suppress(IndexingError):
-> 1100 return self._getitem_lowerdim(tup)
1101
1102 # no multi-index, so validate all of the indexers

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
860 return section
861 # This is an elided recursive call to iloc/loc
--> 862 return getattr(section, self.name)[new_key]
863
864 raise IndexingError("not applicable")

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
929
930 maybe_callable = com.apply_if_callable(key, self.obj)
--> 931 return self._getitem_axis(maybe_callable, axis=axis)
932
933 def _is_scalar_access(self, key: tuple):

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1162 # fall thru to straight lookup
1163 self._validate_key(key, axis)
-> 1164 return self._get_label(key, axis=axis)
1165
1166 def _get_slice_axis(self, slice_obj: slice, axis: int):

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
1111 def _get_label(self, label, axis: int):
1112 # GH#5667 this will fail if the label is not present in the axis.
-> 1113 return self.obj.xs(label, axis=axis)
1114
1115 def _handle_lowerdim_multi_index_axis0(self, tup: tuple):

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
3774 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
3775 else:
-> 3776 loc = index.get_loc(key)
3777
3778 if isinstance(loc, np.ndarray):

/opt/anaconda3/envs/pathml/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'clone_id'

0 replies

saramoein372 · 2022-04-05T20:50:03Z

saramoein372
Apr 5, 2022
Author

Could the reason be that the "clone_id" values are unassigned?

1 reply

zktuong Apr 6, 2022
Maintainer

Do you have the same issue after running ddl.pp.filter_contigs?

zktuong · 2022-04-06T13:00:17Z

zktuong
Apr 6, 2022
Maintainer

I just ran a complete workflow with the reannotated TCR data from the tutorial and there were no issues. I suspect you have empty values in the 'clone_id' column of your vdj data table, which shouldn't be the case: there shouldn't be any unassigned clones if you ran ddl.tl.find_clones. It can happen if you have TCR and BCR data in the same object, in which case the recommendation is to not combine them.
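To check the suspicion above, a quick look at the 'clone_id' column can help. This is a minimal sketch using pandas only; it assumes the contig table and the per-cell table are ordinary DataFrames (for example what `vdj1.data` and `vdj1.metadata` hold, which is an assumption about the object layout). The helper name `summarize_clone_ids` and the toy table are made up for illustration.

```python
import pandas as pd


def summarize_clone_ids(table: pd.DataFrame, clone_key: str = "clone_id") -> dict:
    """Report whether the clone column exists and how many rows lack a value."""
    if clone_key not in table.columns:
        # This is the situation that raises KeyError: 'clone_id' on lookup.
        return {"has_column": False, "n_rows": len(table), "n_missing": len(table)}
    values = table[clone_key]
    # Treat NaN/None and empty or whitespace-only strings as unassigned.
    missing = values.isna() | values.astype(str).str.strip().eq("")
    return {"has_column": True, "n_rows": len(table), "n_missing": int(missing.sum())}


# Toy table standing in for a real contig table (hypothetical values).
toy = pd.DataFrame(
    {
        "sequence_id": ["c1_contig_1", "c1_contig_2", "c2_contig_1"],
        "locus": ["TRB", "TRA", "TRB"],
        "clone_id": ["1_1_1", "", None],
    }
)

print(summarize_clone_ids(toy))
print(summarize_clone_ids(toy.drop(columns="clone_id")))
```

If `has_column` is False for the per-cell table, or `n_missing` is greater than zero for the contig table, that would be consistent with the explanation given above.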

3 replies

saramoein372 Apr 6, 2022
Author

Hi Kelvin,

Thank you. You wrote "It can happen if you have TCR and BCR data in the same object, in which case the recommendation is to not combine them."
Do you mean the Cell Ranger results? My current Cell Ranger results contain both BCR and TCR. Should I run Cell Ranger for TCR separately? What is your suggestion? Thanks.

zktuong Apr 6, 2022
Maintainer

Yes, run them separately; process the BCR and TCR data separately.

zktuong Apr 6, 2022
Maintainer

And I also mean within the dandelion architecture: don't combine TCR and BCR data into the same object.
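If a combined AIRR table already exists, one way to follow this advice is to split it by locus before building separate objects. The sketch below assumes the table has an AIRR-style `locus` column with values such as IGH, IGK, IGL, TRA, TRB, TRD and TRG; the function name `split_by_receptor` and the toy rows are illustrative only.

```python
import pandas as pd


def split_by_receptor(contigs: pd.DataFrame, locus_col: str = "locus"):
    """Return (bcr_rows, tcr_rows) based on the locus prefix (IG* vs TR*)."""
    locus = contigs[locus_col].astype(str).str.upper()
    bcr = contigs[locus.str.startswith("IG")].copy()
    tcr = contigs[locus.str.startswith("TR")].copy()
    return bcr, tcr


# Toy mixed table (hypothetical contigs).
mixed = pd.DataFrame(
    {
        "sequence_id": ["a_contig_1", "a_contig_2", "b_contig_1", "b_contig_2"],
        "locus": ["IGH", "IGK", "TRB", "TRA"],
    }
)

bcr, tcr = split_by_receptor(mixed)
print(bcr["locus"].tolist())
print(tcr["locus"].tolist())
```

Each half can then be written to its own file and loaded as its own object, so that clone calling is run on one receptor type at a time.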

saramoein372 · 2022-04-06T20:39:21Z

saramoein372
Apr 6, 2022
Author

Sure. Thanks, Kelvin.
One more question: I used my TCR inputs for running dandelion, and it runs. But the output file name is: sample/dandelion/filtered_contig_igblast_db-pass.tsv
I used this command:
singularity run -B /athena/namlab/scratch/sam4032/HL9_s1_TCR /athena/namlab/scratch/sam4032/HL9_s1_TCR/sc-dandelion_latest.sif dandelion-preprocess --chain TR

I thought the output name would be "dandelion...tsv". Does that mean something is wrong?

1 reply

zktuong Apr 6, 2022
Maintainer

Yup, it's probably an older image. Just delete all the .sif files you currently have and pull again. You should also see that the actual dandelion version is 0.2.0.

saramoein372 · 2022-04-06T20:41:51Z

saramoein372
Apr 6, 2022
Author

You said "Yup". Does that mean "filtered_contig_igblast_db-pass.tsv" is the correct name?

1 reply

zktuong Apr 6, 2022
Maintainer

No, I meant that there was something wrong: with the latest version the output should be *_dandelion.tsv.

saramoein372 · 2022-04-06T20:45:02Z

saramoein372
Apr 6, 2022
Author

So let me ask you about my Cell Ranger output then: it contains three folders, one for BCR, one for TCR and one for GEX.
Could this be the reason that I am not getting the correct results? Sometimes I get *_dandelion.tsv and other times filtered_contig_igblast_db-pass.tsv.
How is this possible?

1 reply

zktuong Apr 6, 2022
Maintainer

If the version of dandelion in your container/image isn’t 0.2.0, then it is possible that the outputs aren’t all called _dandelion.tsv. Can you check whether that’s the case?

saramoein372 · 2022-04-06T20:50:21Z

saramoein372
Apr 6, 2022
Author

Can I ask what command I should use?

4 replies

zktuong Apr 6, 2022
Maintainer

If you just run the command you've been using and let it go for about 10-30 seconds, one of the first few lines of output should look like:

Software versions:

Beginning preprocessing

command line parameters:
: 
--------------------------------------------------------------
    --meta = BCR_metadata1.csv
    --chain = ig
    --file_prefix = all
    --sep = -
    --flavour = strict
    --skip_format_header = False
    --filter_to_high_confidence = False
    --keep_trailing_hyphen_number = False
    --skip_reassign_dj = False
    --clean_output = False
--------------------------------------------------------------

dandelion==0.2.0 pandas==1.4.1 numpy==1.21.5 matplotlib==3.5.1 networkx==2.7.1 scipy==1.8.0 skbio==0.5.6

zktuong Apr 6, 2022
Maintainer

You can see at the bottom that it prints dandelion==0.2.0 for me.

saramoein372 Apr 6, 2022
Author

Sure. It says:
Beginning preprocessing

command line parameters:
:

--meta = None
--chain = tr
--file_prefix = filtered
--sep = _
--skip_format_header = False
--keep_trailing_hyphen_number = False
--clean_output = False

dandelion==0.1.12 pandas==1.3.4 numpy==1.20.3 matplotlib==3.4.3 networkx==2.6.3 scipy==1.7.1 skbio==0.5.6

zktuong Apr 6, 2022
Maintainer

So that tells you that your image is an older one (0.1.12). Delete this .sif file and pull it again.
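For anyone who wants to script this check, the version line from the log can be parsed directly. This is a rough sketch that only assumes the `name==version` format shown in the two outputs above; the helper names `parse_versions` and `version_tuple` are made up for illustration.

```python
def parse_versions(line: str) -> dict:
    """Turn 'name==version name==version ...' into a dict."""
    versions = {}
    for token in line.split():
        name, sep, version = token.partition("==")
        if sep:  # skip tokens that are not name==version pairs
            versions[name] = version
    return versions


def version_tuple(version: str) -> tuple:
    """Convert '0.1.12' to (0, 1, 12) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


# Version line copied from the log output posted in this thread.
log_line = (
    "dandelion==0.1.12 pandas==1.3.4 numpy==1.20.3 matplotlib==3.4.3 "
    "networkx==2.6.3 scipy==1.7.1 skbio==0.5.6"
)

found = parse_versions(log_line)["dandelion"]
is_outdated = version_tuple(found) < version_tuple("0.2.0")
print(found, "is older than 0.2.0:", is_outdated)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.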

saramoein372 · 2022-04-07T12:58:39Z

saramoein372
Apr 7, 2022
Author

Thank you

0 replies

saramoein372 · 2022-10-11T07:26:20Z

saramoein372
Oct 11, 2022
Author

Thank you so much, Kelvin.

I have a network of clones in which one clone is very large. That means all of its cells have the same v_call and j_call, and the aligned sequence is identical for all of them. My question is: are the cells in this large clone germline or mutated?

To answer this I am using the "aligned_sequence" and "aligned_germline" columns from the dandelion file. Are these the correct columns for this question? And is it valid to calculate the Levenshtein distance between "aligned_sequence" and "aligned_germline" for the cells in the large clone, and call them germline if the distance is zero and mutated otherwise?

Would you please help me find the answer?

Thanks,
Sara

1 reply

zktuong Oct 11, 2022
Maintainer

The above sounds correct. You are choosing the right columns.
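As a rough illustration of the comparison described above, the sketch below computes a plain Levenshtein distance between the two columns named in the question ("aligned_sequence" and "aligned_germline") and flags rows with distance zero. The toy sequences are invented, and the function is a generic textbook implementation, not the one used by any particular package. One caveat worth checking on real data: masked or gapped positions in the germline (for example N or '.' characters) would be counted as differences by this naive version.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


# Toy rows standing in for cells of one clone (hypothetical sequences).
cells = pd.DataFrame(
    {
        "cell_id": ["cell_a", "cell_b"],
        "aligned_sequence": ["CAGGTGCAGCTG", "CAGGTGCAACTG"],
        "aligned_germline": ["CAGGTGCAGCTG", "CAGGTGCAGCTG"],
    }
)

cells["distance_to_germline"] = [
    levenshtein(seq, germ)
    for seq, germ in zip(cells["aligned_sequence"], cells["aligned_germline"])
]
cells["is_germline"] = cells["distance_to_germline"] == 0
print(cells[["cell_id", "distance_to_germline", "is_germline"]])
```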

saramoein372 · 2022-10-28T02:20:49Z

saramoein372
Oct 28, 2022
Author

Thank you so much, Kelvin! I also have another question: do you have any freely accessible BCR fastq files (R1 and R2) from a healthy control sample?

Thank you,
Sara

0 replies

saramoein372 · 2022-10-31T16:20:17Z

saramoein372
Oct 31, 2022
Author

I have another question: would you please provide some details about the main strategy for defining BCR clones? Is it based on Levenshtein distance on the CDR3? I appreciate it.

Thanks,
Sara

1 reply

zktuong Oct 31, 2022
Maintainer

I have another question: would you please provide some details about the
main strategy for defining the BCR clones? Is it based on
Levenshtein distance on CDR3?

Hi Sara, as stated in the documentation (https://sc-dandelion.readthedocs.io/en/latest/notebooks/3_dandelion_findingclones-10x_data.html):

Clone definition is based on the following criteria:

I. Identical V- and J-gene usage in the VDJ chain (IGH/TRB/TRD).

II. Identical junctional/CDR3 sequence length in the VDJ chain.

III. The VDJ chain junctional/CDR3 sequences attain a minimum percentage of sequence similarity, based on Hamming distance. The similarity cut-off is tunable (default is 85%; change to 100% if analyzing TCR data).

IV. VJ chain (IGK/IGL/TRA/TRG) usage. If cells within a clone use different VJ chains, the clone will be split following the same conditions as for the VDJ chains in (I-III) above.

So it's the Hamming distance of the CDR3 amino acid sequence.
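Criteria I to III above can be illustrated with a small pairwise check. This is only a sketch of the idea, not Dandelion's implementation: it ignores criterion IV, does not group more than two contigs, and assumes AIRR-style field names (`v_call`, `j_call`, `junction_aa`). The function names and the toy contigs are hypothetical.

```python
def hamming_similarity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    if not a:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def could_share_clone(contig_1: dict, contig_2: dict, identity: float = 0.85) -> bool:
    """Apply criteria I-III to two VDJ-chain contigs."""
    same_genes = (
        contig_1["v_call"] == contig_2["v_call"]
        and contig_1["j_call"] == contig_2["j_call"]
    )
    same_length = len(contig_1["junction_aa"]) == len(contig_2["junction_aa"])
    if not (same_genes and same_length):
        return False
    similarity = hamming_similarity(contig_1["junction_aa"], contig_2["junction_aa"])
    return similarity >= identity


# Toy heavy-chain contigs (invented sequences).
first = {"v_call": "IGHV1-69", "j_call": "IGHJ4", "junction_aa": "CARDYYGMDVW"}
second = {"v_call": "IGHV1-69", "j_call": "IGHJ4", "junction_aa": "CARDYFGMDVW"}
third = {"v_call": "IGHV3-23", "j_call": "IGHJ4", "junction_aa": "CARDYYGMDVW"}

print(could_share_clone(first, second))                # one mismatch in 11 residues
print(could_share_clone(first, second, identity=1.0))  # stricter, TCR-style cut-off
print(could_share_clone(first, third))                 # different V gene
```

With the default 85% cut-off a single mismatch in an 11-residue junction still passes, while the 100% setting suggested for TCR data requires identical junctions.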

saramoein372 · 2022-10-31T20:49:51Z

saramoein372
Oct 31, 2022
Author

Thanks, Kelvin! I also wanted to ask whether you have any healthy control BCR fastq files available, so that I can download them and use them as a negative control. I know that you have processed files for healthy donors in your Covid paper, but I need the fastq files. I appreciate it.

Thanks,
Sara

0 replies

zktuong · 2022-10-31T20:55:35Z

zktuong
Oct 31, 2022
Maintainer

Also, wanted to ask if you have any available healthy control BCR fastq
file, so that I download it and use it as negative control?
I know that you have processed files for healthy donors in your Covid
paper. But I need fastq files.

This was answered in #213. The gist is that 1) I can't share it because of data access requirements, and 2) I'm not the custodian of the raw fastqs, so I don't have access to them anyway. The best approach is to contact the respective data access committees and ask for their permission to obtain the files.

0 replies

saramoein372 · 2022-11-01T21:01:04Z

saramoein372
Nov 1, 2022
Author

Thanks

0 replies
