Example datasets for bep036 #465

Open
wants to merge 9 commits into
base: master

@Arshitha Arshitha commented Aug 29, 2024

Added the pheno001 and pheno002 example datasets, inspired by ds004215 on OpenNeuro but significantly modified to keep them simple and to convey clearly the use cases proposed in BEP036.

Use cases covered (and to be added to this PR):

  • pheno001 - Single session with both phenotype and imaging data
  • pheno002 - Two sessions, one of which has imaging data only
  • pheno003 - Two sessions, one of which has phenotype data only
  • pheno004 - Two sets of sessions: one set (e.g. screening, baseline, followup, etc.) for phenotype data and another set (e.g. 01, 02, etc.) for imaging data.

Still in draft state but would appreciate any and all feedback.

Pinging co-contributors: @ericearl @SamGuay @surchs

@Arshitha Arshitha marked this pull request as draft August 29, 2024 21:20
Contributor

@ericearl ericearl left a comment

This is great, thanks! I'm guessing you left it in draft state because of pheno001 and pheno002, right?

I think we should remove the age_at_visit column/field from all phenotype/ measurement tools and instead provide a root-level sessions file with that field. Should we maybe take that a step further and say it's RECOMMENDED or OPTIONAL to add age to the sessions file?

@Arshitha
Author

I like that idea. It's redundant information that can be aggregated at the sessions level, and that can be a recommendation in the BEP.
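
A minimal sketch of that aggregation, for illustration only: it collects one age per participant-session pair from several phenotype tables and drops the per-tool `age_at_visit` column. The column names `participant_id`, `session_id`, and `age_at_visit` follow this thread; the sample rows, the helper names, the output column name `age`, and the column layout of the resulting table are assumptions, not something BEP036 prescribes.

```python
import csv
import io


def build_sessions_table(phenotype_tables):
    """Collect one age per (participant_id, session_id) across phenotype tables.

    Raises ValueError if two tables report different ages for the same visit.
    """
    ages = {}
    for rows in phenotype_tables:
        for row in rows:
            key = (row["participant_id"], row["session_id"])
            age = row["age_at_visit"]
            if key in ages and ages[key] != age:
                raise ValueError(f"conflicting ages for {key}: {ages[key]} vs {age}")
            ages[key] = age
    return [
        {"participant_id": participant, "session_id": session, "age": age}
        for (participant, session), age in sorted(ages.items())
    ]


def strip_age(rows):
    """Return copies of the rows without the per-tool age_at_visit column."""
    return [{k: v for k, v in row.items() if k != "age_at_visit"} for row in rows]


def to_tsv(rows):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical measurement tools; values are invented for the example.
tool_a = [
    {"participant_id": "sub-01", "session_id": "ses-01", "age_at_visit": "25", "score": "3"},
    {"participant_id": "sub-02", "session_id": "ses-01", "age_at_visit": "31", "score": "5"},
]
tool_b = [
    {"participant_id": "sub-01", "session_id": "ses-01", "age_at_visit": "25", "total": "12"},
]

sessions = build_sessions_table([tool_a, tool_b])
sessions_tsv = to_tsv(sessions)
tool_a_clean = strip_age(tool_a)
print(sessions_tsv)
```

Raising on conflicting ages is a deliberate choice in this sketch: if two tools disagree about a visit's age, the information was not truly redundant and should be reviewed by hand before it is moved.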

@Arshitha
Author

It's in Draft state because I haven't prepared pheno003 and pheno004 but yes, all four example datasets will violate the contribution guidelines.

@christinerogers

christinerogers commented Oct 24, 2024

Got a question from @dominikwelke --

Could this PR include an example showing how to represent multiple runs from one participant-session?

@ericearl mentioned today that this is easily done by adding a run column in the .tsv; it would be nice to see that illustrated and mentioned here.
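
One way to picture the run-column approach, as a sketch rather than a settled convention: the table below has two rows for the same participant-session, distinguished only by a `run` column. The column name `run`, the `score` column, all values, and the helper `duplicate_keys` are assumptions made for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical phenotype table: sub-01 completed the same tool twice in ses-01.
TSV_TEXT = (
    "participant_id\tsession_id\trun\tscore\n"
    "sub-01\tses-01\t1\t12\n"
    "sub-01\tses-01\t2\t14\n"
    "sub-02\tses-01\t1\t9\n"
)


def duplicate_keys(tsv_text, key_columns):
    """Return the key tuples that occur on more than one row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(tuple(row[column] for column in key_columns) for row in reader)
    return sorted(key for key, count in counts.items() if count > 1)


# Without the run column, the two sub-01 rows collide.
without_run = duplicate_keys(TSV_TEXT, ["participant_id", "session_id"])
# Including run in the key makes every row uniquely addressable.
with_run = duplicate_keys(TSV_TEXT, ["participant_id", "session_id", "run"])

print(without_run)
print(with_run)
```

The check shows why the extra column matters: participant and session alone no longer identify a row once a tool is administered more than once in a session.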

- All participants.tsv files have been simplified.
- pheno004 has instead become an example of some imaging-only, some phenotype-only, and some with both kinds of data
@ericearl
Contributor

ericearl commented Feb 6, 2025

I hijacked the not yet created pheno004 for use with some current bids-validator testing needs at bids-standard/bids-specification#2044. Can we make this a non-draft PR and get it merged hopefully? @effigies @Arshitha @SamGuay @surchs

@effigies
Contributor

effigies commented Feb 6, 2025

Please set the BIDS_SCHEMA environment variable to https://bids-specification--2044.org.readthedocs.build/en/2044/schema.json here:

run: echo BIDS_SCHEMA=https://bids-specification.readthedocs.io/en/latest/schema.json >> $GITHUB_ENV

Please also add pheno004 to be skipped on legacy and stable:

- name: Skip legacy validation for post-legacy datasets
  run: for DS in mrs_* dwi_deriv; do touch $DS/.SKIP_VALIDATION; done
  if: matrix.bids-validator == 'legacy'
  shell: bash

- name: Skip stable validation for datasets with unreleased features
  run: for DS in dwi_deriv; do touch $DS/.SKIP_VALIDATION; done
  if: matrix.bids-validator != 'dev'
  shell: bash

@ericearl
Contributor

ericearl commented Feb 6, 2025

@effigies Is that comment just above here a note for me? I'm confused by most of it and don't feel safe editing those files as-is. If you need me to take care of that, can I sit with you, Ross, or Nell to figure it out or have it explained to me enough to be able to do the work?

@effigies
Contributor

effigies commented Feb 6, 2025

Okay, I did what I asked. It looks like there are issues in the schema that need to be addressed, but also there are unrelated issues in pheno001-003: https://github.com/bids-standard/bids-examples/actions/runs/13188395001/job/36815880378?pr=465

@ericearl
Contributor

ericearl commented Feb 7, 2025

This is super-helpful @effigies, thank you! I'm bringing the errors out of the logs here for us (@Arshitha @SamGuay @surchs):

# pheno001

	[ERROR] MISSING_DATASET_DESCRIPTION A dataset_description.json file is required in the root of the dataset
		

	Please visit https://neurostars.org/search?q=MISSING_DATASET_DESCRIPTION for existing conversations about this issue.

	[ERROR] JSON_INVALID Not a valid JSON file.
		/sub-01/anat/sub-01_T1w.json
		/sub-01/anat/sub-01_T1w.json

		2 more files with the same issue

	Please visit https://neurostars.org/search?q=JSON_INVALID for existing conversations about this issue.

# pheno002

	[ERROR] MISSING_DATASET_DESCRIPTION A dataset_description.json file is required in the root of the dataset
		

	Please visit https://neurostars.org/search?q=MISSING_DATASET_DESCRIPTION for existing conversations about this issue.

	[ERROR] TSV_COLUMN_ORDER_INCORRECT Some TSV columns are in the incorrect order
		session_id
		/sessions.tsv - Column 0 (starting from 0) found at index 1.

	Please visit https://neurostars.org/search?q=TSV_COLUMN_ORDER_INCORRECT for existing conversations about this issue.

	[ERROR] TSV_INDEX_VALUE_NOT_UNIQUE An index column(s) was specified for the tsv file and not all of the values for it are unique.
		/sessions.tsv - Row: 4, Value: 01
		/sessions.tsv - Row: 5, Value: 02

	Please visit https://neurostars.org/search?q=TSV_INDEX_VALUE_NOT_UNIQUE for existing conversations about this issue.

	[ERROR] JSON_INVALID Not a valid JSON file.
		/sub-01/ses-01/anat/sub-01_ses-01_T1w.json
		/sub-01/ses-01/anat/sub-01_ses-01_T1w.json

		6 more files with the same issue

	Please visit https://neurostars.org/search?q=JSON_INVALID for existing conversations about this issue.

# pheno003

	[ERROR] MISSING_DATASET_DESCRIPTION A dataset_description.json file is required in the root of the dataset
		

	Please visit https://neurostars.org/search?q=MISSING_DATASET_DESCRIPTION for existing conversations about this issue.

	[ERROR] TSV_COLUMN_ORDER_INCORRECT Some TSV columns are in the incorrect order
		session_id
		/sessions.tsv - Column 0 (starting from 0) found at index 1.

	Please visit https://neurostars.org/search?q=TSV_COLUMN_ORDER_INCORRECT for existing conversations about this issue.

	[ERROR] TSV_INDEX_VALUE_NOT_UNIQUE An index column(s) was specified for the tsv file and not all of the values for it are unique.
		/sessions.tsv - Row: 4, Value: baseline

	Please visit https://neurostars.org/search?q=TSV_INDEX_VALUE_NOT_UNIQUE for existing conversations about this issue.

	[ERROR] JSON_INVALID Not a valid JSON file.
		/sub-01/ses-baseline/anat/sub-01_ses-baseline_T1w.json
		/sub-01/ses-baseline/anat/sub-01_ses-baseline_T1w.json

		2 more files with the same issue

	Please visit https://neurostars.org/search?q=JSON_INVALID for existing conversations about this issue.
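
For quick local triage, here is a rough pre-check sketch that looks for the same four kinds of problem reported in the logs above. It is not the bids-validator's implementation and does not consult the schema; the helper name `precheck`, the message wording, and the throwaway dataset it builds are all hypothetical. Whether `session_id` must come first or be unique in the final BEP036 design is not settled by this thread; the sketch simply mirrors what the logs flag.

```python
import csv
import json
import tempfile
from collections import Counter
from pathlib import Path


def precheck(dataset_root):
    """Return human-readable problems found under dataset_root."""
    root = Path(dataset_root)
    problems = []

    # Mirrors MISSING_DATASET_DESCRIPTION.
    if not (root / "dataset_description.json").is_file():
        problems.append("missing dataset_description.json")

    # Mirrors JSON_INVALID: every .json file must parse (an empty file does not).
    for json_path in sorted(root.rglob("*.json")):
        try:
            json.loads(json_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"invalid JSON: /{json_path.relative_to(root).as_posix()}")

    # Mirrors TSV_COLUMN_ORDER_INCORRECT and TSV_INDEX_VALUE_NOT_UNIQUE
    # for a root-level sessions.tsv.
    sessions = root / "sessions.tsv"
    if sessions.is_file():
        with sessions.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle, delimiter="\t"))
        if rows:
            header, body = rows[0], rows[1:]
            if header[0] != "session_id":
                problems.append("sessions.tsv: session_id is not the first column")
            if "session_id" in header:
                index = header.index("session_id")
                counts = Counter(row[index] for row in body)
                for value in sorted(v for v, n in counts.items() if n > 1):
                    problems.append(f"sessions.tsv: repeated session_id {value}")
    return problems


# Build a small, invented dataset that triggers each check once.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub-01" / "anat").mkdir(parents=True)
    (root / "sub-01" / "anat" / "sub-01_T1w.json").write_text("")  # empty sidecar
    (root / "sessions.tsv").write_text(
        "participant_id\tsession_id\n"
        "sub-01\t01\n"
        "sub-01\t02\n"
        "sub-02\t01\n"
    )
    problems = precheck(root)

for problem in problems:
    print(problem)
```

A script like this only narrows down where to look; the validator run in CI remains the authority on whether a dataset passes.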
