BIONETS Hackathon repository

This is the GitHub project page for the BIONETS project/hackathon in the summer term 2024.

This wiki may eventually be made public.

How to contribute a project

Preparation

  1. Pick the next available project and assign it to yourself (see the supplied list of references).
  2. Read the paper and update the summary table with any missing information.
  3. Find the code or software online.
  4. Check whether tutorials by the authors exist and update the summary sheet.
  5. Check whether tutorials by external authors exist.
  6. Find out which data the authors used and how it was preprocessed.
  7. Update the data set information sheet with the relevant information.
  8. If available, find the code used to preprocess the data.
  9. Find the settings the authors used in the publication to generate the figures.
  10. Create a new subfolder named after the tool (all lowercase, hyphenated) and implement the tool there.

Implementation

  1. Install the software and create a log file according to this document: https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-019-09406-4/MediaObjects/41467_2019_9406_MOESM1_ESM.pdf
  2. Check whether a minimal working example exists and run it if available.
  3. Check whether code exists for creating the figures in the article.
  4. Create a script that allows executing the tool on all reference data sets.
  5. Document the method's parameters, inputs, and outputs.
  6. Create a Docker container.
  7. Supply a YAML file for a conda environment (see the sketch after this list).
  8. Push the code to the GitHub repository.
  9. Create a Markdown README in the project folder.
  10. Create a wiki page with methodology, rationale, parameters, etc.
  11. Update the secondary evaluation criteria list.
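
A minimal sketch of such a conda environment file; the environment name, channels, and packages below are placeholders, not a prescribed setup:

```yaml
# environment.yml -- illustrative only; replace name and dependencies
# with whatever your tool actually requires.
name: tool-name-env
channels:
  - conda-forge
  - bioconda
dependencies:
  - r-base=4.3        # placeholder R version
  - r-optparse        # command-line parsing for the wrapper script
  - r-data.table      # fast TSV reading/writing
```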

Downstream analysis

  1. Identify a suitable downstream analysis that helps users extract relevant information from the network, and apply it to your networks (an illustrative sketch follows this list).
  2. The input to the downstream analysis should be the output of the GRN tool (or a subset of the files, if not all are relevant).
  3. Add your downstream analysis to the downstream analysis file.
  4. Document the input and output of your programs.
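
As an illustration only (not a prescribed analysis), the sketch below reads the standardized network.tsv described under the output specifications and computes a node degree per condition; the file paths are placeholders.

```r
#!/usr/bin/env Rscript
# Illustrative downstream analysis: node degree per condition.
# Assumes the standardized network.tsv format (target, regulator, condition, weight);
# the file paths below are placeholders.

network <- read.delim("output/network.tsv", stringsAsFactors = FALSE)

# Collect every gene occurrence, whether it appears as target or as regulator.
nodes <- rbind(
  data.frame(gene = network$target,    condition = network$condition),
  data.frame(gene = network$regulator, condition = network$condition)
)

# Count, per gene and condition, how many edges the gene participates in.
degree <- aggregate(rep(1, nrow(nodes)),
                    by = list(gene = nodes$gene, condition = nodes$condition),
                    FUN = sum)
colnames(degree)[3] <- "degree"

# Store the result next to the network output.
write.table(degree, file = "output/node_degree.tsv",
            sep = "\t", quote = FALSE, row.names = FALSE)
```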

Reproduction

  1. If no code for them is available, attempt to replicate the examples/figures shown in the publication using the data set(s) supplied in the study.
  2. Create a script that can be run on multiple data sets, similar to the scripts described in the Implementation section.

A note on the scripts

  1. Follow the supplied specifications regarding parameters, output folder structure, etc.
  2. A script should allow you to execute a tool on one data set with one parameter setting.
  3. If you want to test multiple parameter settings, please create a wrapper script that calls the script with the relevant parameters.
  4. If the tool is a command-line tool itself, it is not necessary to wrap the tool again.
  5. If the tool/library is written in R, the script should be callable via Rscript, e.g. Rscript tool-name.R -p p1 -q p2 (a skeleton is sketched after this list).
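
A minimal skeleton of such a script, assuming the optparse package; the option names and the extra parameter are placeholders, not a prescribed interface:

```r
#!/usr/bin/env Rscript
# tool-name.R -- illustrative wrapper skeleton; option names are placeholders.
suppressPackageStartupMessages(library(optparse))

option_list <- list(
  make_option(c("-a", "--input_1"), type = "character",
              help = "TSV with normalized gene expression for condition 1"),
  make_option(c("-b", "--input_2"), type = "character",
              help = "TSV with normalized gene expression for condition 2"),
  make_option(c("-o", "--output"), type = "character",
              help = "Existing output directory"),
  make_option(c("-p", "--some_parameter"), type = "double", default = 0.5,
              help = "Placeholder for a tool-specific parameter [default %default]")
)
opt <- parse_args(OptionParser(option_list = option_list))

expr1 <- read.delim(opt$input_1, check.names = FALSE)
expr2 <- read.delim(opt$input_2, check.names = FALSE)

# ... run the actual GRN inference for both conditions here ...

# Write the inferred edges in the required network.tsv format, e.g.:
# network <- data.frame(target = ..., regulator = ..., condition = ..., weight = ...)
# write.table(network, file.path(opt$output, "network.tsv"),
#             sep = "\t", quote = FALSE, row.names = FALSE)
```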

Troubleshooting

  1. If the installation fails, troubleshoot the issues and double-check whether someone else is able to install it on their computer. (Especially with R, you sometimes need to install additional system libraries; in that case you can use conda, Docker, ...)
  2. If the execution of the software fails, troubleshoot the issue (e.g. a memory error) and try to fix it. Otherwise, report the issue.
  3. If running the tool takes an unreasonable amount of time (e.g. >2 h for a small example data set), try running it overnight and report the run time.
  4. If other problems occur, please document your issues.

README

For every tool, there should be a README with:

  1. A brief description of the tool
  2. A reference to the publication
  3. Installation instructions, or links to the relevant instructions if you did not encounter any issues
  4. Copy-and-pastable execution instructions using example data
  5. An explanation of the relevant parameters
  6. The input file format specification
  7. The output file format specification
  8. An explanation and interpretation of the output
  9. The hyperparameters recommended by the authors
  10. Hyperparameter recommendations for optimization (more instructions will follow)
  11. Any other necessary information

In general, the more difficult it was to install, execute or interpret the results of the tool, the more information needs to be supplied in the README.md file.
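
A possible README.md skeleton covering these points (the section names are suggestions only, not a prescribed layout):

```markdown
# Tool name

Brief description of the tool and reference to the publication.

## Installation
Instructions or links, plus any issues encountered.

## Usage
Copy-and-pastable command using the example data, and an explanation of the relevant parameters.

## Input and output formats
Input file format, output file format, and how to interpret the output.

## Hyperparameters
Values recommended by the authors and recommendations for optimization.
```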

Input/Output Specifications

Below are the input and output specifications that every submitted script for the reference data MUST adhere to. If you do not follow these specifications, we will mark it as an error.

Input specifications

All tools must allow for the following inputs:

  1. Input file 1: Path to a tab-separated file that contains the normalized gene expression for condition 1 (see the example after this list)
    • First column is named 'Gene' and contains the gene names
    • All following columns are named after a sample/cell and contain the normalized gene expression for each respective gene for the given sample/cell
  2. Input file 2: Path to a tab-separated file that contains the normalized gene expression for condition 2
    • First column is named 'Gene' and contains the gene names
    • All following columns are named after a sample/cell and contain the normalized gene expression for each respective gene for the given sample/cell
  3. Output path: Path to the output directory, given as a string. The directory must exist prior to execution of the script!
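
For illustration, an input file in this format might look as follows (the gene and sample names are made up; columns are tab-separated):

```
Gene	sample_1	sample_2	sample_3
GATA3	2.31	0.00	1.87
FOXP3	0.45	1.02	0.00
STAT1	3.10	2.76	2.95
```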

Note:

  • If your tool requires additional inputs other than the ones listed above: Document what is needed and how you obtain it. If it is additional data-dependent information, talk to us!

Output specifications

All tools must produce the following outputs in the given output path directory:

  • network.tsv: tab-separated file that contains all edges (row-wise) with the following columns:
    • First column target: Target of the edge
    • Second column regulator: Source of the edge
    • Third column condition: Condition that the edge belongs to
    • Fourth column weight: Weight of the edge
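
For illustration, a network.tsv with a single weight per edge might look as follows (the values are made up; columns are tab-separated):

```
target	regulator	condition	weight
GATA3	STAT1	condition_1	0.83
FOXP3	GATA3	condition_1	0.41
GATA3	STAT1	condition_2	0.12
```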

Note:

  • If your tool produces additional node weights: Store them in a second tab-separated file named nodes_weights.tsv with the following columns:
    • First column id: Name of the node (must match the names in the network.tsv file)
    • Second column weight: Weight of the node
  • If your tool produces additional information other than edge/node weights: Save it in another tab-separated file and document how you name it!
  • If your tool produces more than one weight per edge: Add them as the fifth, sixth, ..., nth columns and rename the weight columns to weight_1, weight_2, ..., weight_n
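
Likewise, an optional nodes_weights.tsv might look like this (the values are made up; columns are tab-separated):

```
id	weight
GATA3	0.92
FOXP3	0.37
STAT1	0.55
```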