This is the GitHub project page for the BIONETS project / hackathon in the summer term 2024.
This wiki may eventually be made public.
- Pick next available project and assign it to yourself (see supplied list of references)
- Read paper and update summary table with missing information
- Find the code or software online
- Check whether tutorials by the authors exist and update the summary sheet.
- Check whether tutorials by external authors exist
- Find which data has been used by the authors and how it was preprocessed.
- Update the data set information sheet with the relevant information.
- If available, find the code used for the preprocessing of the data.
- Find the settings the authors used in the publication to generate the figures.
- Create a new subfolder using the tool name (all lower case, hyphenated) and implement the tool
- Install software and create log file according to this file: https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-019-09406-4/MediaObjects/41467_2019_9406_MOESM1_ESM.pdf
- Check if Minimal working example exists and run if available
- Check if code exists for creating the figures in the article
- Create script allowing the execution of the tool using all reference data sets.
- Document method parameters, inputs and outputs.
- Create Docker container
- Supply yaml file for a conda environment.
- Push code to github repo
- Create markdown README in project folder
- Wiki page with methodology, rationale, parameters, etc.
- Update secondary evaluation criteria list
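For the conda environment step above, a minimal `environment.yml` could look like this. This is only a sketch: the environment name, channels, and pinned packages are placeholders — list whatever the tool actually requires.

```yaml
# Hypothetical environment.yml for a tool named my-grn-tool;
# replace the name, channels and pinned versions with the tool's
# real requirements.
name: my-grn-tool
channels:
  - conda-forge
  - bioconda
dependencies:
  - r-base=4.3
  - r-optparse
  - r-data.table
```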
- Identify a suitable downstream analysis that helps users find relevant information in the network, and apply it to your networks.
- The input to the downstream analysis should be the output of the GRN tool (or a subset of the files, if not all are relevant)
- Add your downstream analysis to the downstream analysis file.
- Document input and output of your programs.
- Attempt to replicate the examples/figures shown in the article, if no code for them is available, using the data set(s) supplied in the study.
- Make a script which can be run on multiple data sets, similar to the files in the Implementation section.
- Follow the supplied specifications regarding parameters, output folder structure, etc.
- A script should allow you to execute a tool using one dataset with one parameter setting
- If you want to test multiple parameter settings please create a wrapper script which calls the script with the relevant parameters.
- If the tool is a command-line tool itself, it is not necessary to wrap it again.
- If the tool/library is written in R, the script should be callable using R:
  ```shell
  Rscript tool-name.R -p p1 -q p2
  ```
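The wrapper idea above can be sketched in Python. This is only a sketch: `tool-name.R` and the `-p`/`-q` flags are the placeholder names from the example, and the parameter values are made up.

```python
# Hypothetical wrapper: calls the per-run script once for each parameter
# setting. "tool-name.R" and the -p/-q flags are placeholders.
import subprocess

def run_sweep(p_values, q="default", dry_run=False):
    """Build (and optionally execute) one Rscript call per value of -p."""
    commands = []
    for p in p_values:
        cmd = ["Rscript", "tool-name.R", "-p", str(p), "-q", q]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # fail fast on a non-zero exit
    return commands

# Dry run: print the calls instead of executing them
for cmd in run_sweep([0.1, 0.5, 0.9], dry_run=True):
    print(" ".join(cmd))
```

Keeping the per-run script dumb (one dataset, one parameter setting) and putting the sweep logic in a wrapper like this makes individual runs easy to reproduce.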
- If the installation fails, troubleshoot the issue and double-check whether someone else is able to install it on their computer. (Especially with R, you sometimes need to install additional system libraries; conda or Docker can help here.)
- If the execution of the software fails, troubleshoot the issue (e.g. a memory error) and try to fix it. Otherwise, report the issue.
- If running the tool takes an unreasonably long time (e.g. >2h for a small example dataset), try running it overnight and report the run time.
- If other problems occur, please document your issues.
For every tool there should be a README with
- Brief description of the tool
- Reference to the publication
- Installation instructions, or links to the relevant instructions if you encountered no issues.
- Copy-and-pastable execution instructions using example data.
- Explanation of the relevant parameters
- Input file format specification
- Output file format specification
- Explanation and interpretation of the output
- Recommended hyperparameters by the authors
- Hyperparameter recommendations for optimization (more instructions will follow)
- Other necessary information
In general, the more difficult it was to install, execute or interpret the results of the tool, the more information needs to be supplied in the README.md file.
Below are the input and output specifications that every tool MUST use in the submitted script for the reference data. If you do not use these specifications, we will mark it as an error.
All tools must allow for the following inputs:
- Input file 1: Path to a tab-separated file that contains the normalized gene expression for condition 1
- First column is named 'Gene' and contains the gene names
- All following columns are named after a sample/cell and contain the normalized gene expression for each respective gene for the given sample/cell
- Input file 2: Path to a tab-separated file that contains the normalized gene expression for condition 2
- First column is named 'Gene' and contains the gene names
- All following columns are named after a sample/cell and contain the normalized gene expression for each respective gene for the given sample/cell
- Output path: String containing the path to the output directory. The directory must exist before the script is executed!
Note:
- If your tool requires additional inputs other than the ones listed above: Document what is needed and how you obtain it. If it is additional data-dependent information, talk to us!
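As a sanity check for the input format above, a minimal Python reader might look like this. The gene and sample names are dummy values; the check on the `Gene` column mirrors the specification.

```python
# Minimal sketch of the required input format and a reader for it.
# The file contents below are illustrative dummy values.
import csv
import io

EXAMPLE = "Gene\tcell_1\tcell_2\nTP53\t0.8\t1.2\nMYC\t2.1\t0.0\n"

def read_expression(handle):
    """Parse one condition file; returns (sample names, {gene: values})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[0] != "Gene":
        raise ValueError("first column must be named 'Gene'")
    samples = header[1:]
    expression = {row[0]: [float(v) for v in row[1:]] for row in reader}
    return samples, expression

# In a real script, pass open(path, newline="") instead of a StringIO.
samples, expr = read_expression(io.StringIO(EXAMPLE))
print(samples)        # ['cell_1', 'cell_2']
print(expr["TP53"])   # [0.8, 1.2]
```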
All tools must produce the following outputs in the given output path directory:
- `network.tsv`: tab-separated file that contains all edges (row-wise) with the following columns:
  - First column `target`: Target of the edge
  - Second column `regulator`: Source of the edge
  - Third column `condition`: Condition that the edge belongs to
  - Fourth column `weight`: Weight of the edge
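A minimal sketch of writing `network.tsv` in this layout (the edge shown is made up; the column order follows the specification above):

```python
# Write edges in the required column order:
# target, regulator, condition, weight.
import csv
import io

def write_network(handle, edges):
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["target", "regulator", "condition", "weight"])
    writer.writerows(edges)

# In a real script, open <output path>/network.tsv for writing instead.
buf = io.StringIO()
write_network(buf, [("MYC", "TP53", "condition_1", 0.73)])
print(buf.getvalue())
```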
Note:
- If your tool produces additional node weights: Store them in a second tab-separated file named `nodes_weights.tsv` with the following columns:
  - First column `id`: Name of the node (must match the names in the `network.tsv` file)
  - Second column `weight`: Weight of the node
- If your tool produces additional information except for edge/node weights: Save them in another tab-separated file and document how you name them!
- If your tool produces more than one weight per edge: Add them as fifth, sixth, ..., nth column and change the name of the weight columns to weight_1, weight_2, ..., weight_n
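The weight-column naming rule can be captured in a small helper (a sketch; the function name is our own):

```python
def network_header(n_weights):
    """Header columns for network.tsv given the number of weights per edge."""
    base = ["target", "regulator", "condition"]
    if n_weights == 1:
        return base + ["weight"]  # a single weight keeps the plain name
    return base + [f"weight_{i}" for i in range(1, n_weights + 1)]

print(network_header(1))  # ['target', 'regulator', 'condition', 'weight']
print(network_header(3))
```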