nf-imputationserver

This repository includes the Michigan Imputation Server Workflow ported to Nextflow.

Run with test data

The pipeline provides small test data to verify installation:

nextflow run main.nf -c conf/test_single_vcf.config

Run with custom configuration

job.config:

params {
    project                 = "my-test-project"
    build                   = "hg19"
    files                   = "tests/input/three/*.vcf.gz"
    population              = "eur"
    mode                    = "imputation"
    refpanel_yaml           = "tests/hapmap-2/2.0.0/imputation-hapmap2.yaml"
    output                  = "output"
}

Run pipeline with job.config configuration:

nextflow run main.nf -c job.config

Parameters

Parameter	Default Value	Description
`project`	`null`	Project name
`project_date`	`date`	Project date
`files`	`null`	List of input files
`population`	`null`	Population information
`refpanel_yaml`	`null`	Reference panel YAML file
`mode`	`imputation`	Processing mode (e.g., `imputation` or `qc-only`)
`phasing`	`eagle`	Phasing method (e.g., `eagle` or `beagle`)
`minimac_window`	`500000`	Minimac window size
`minimac_min_ratio`	`0.00001`	Minimac minimum ratio
`chunksize`	`20000000`	Chunk size for processing
`phasing_window`	`5000000`	Phasing window size
`cpus`	`1`	Number of CPUs to use
`min_samples`	`20`	Minimum number of samples needed
`max_samples`	`50000`	Maximum number of samples allowed
`imputation.enabled`	`true`	Enable or disable imputation
`ancestry.enabled`	`false`	Enable or disable ancestry analysis
`ancestry.dim`	`10`	Ancestry analysis dimension
`ancestry.dim_high`	`20`	High dimension for ancestry analysis
`ancestry.batch_size`	`50`	Batch size for ancestry analysis
`ancestry.reference`	`null`	Ancestry reference data
`ancestry.max_pcs`	`8`	Maximum principal components for ancestry
`ancestry.k`	`10`	K value for ancestry analysis
`ancestry.threshold`	`0.75`	Ancestry threshold
`r2Filter`	`0`	R2 filter value
`password`	`null`	Password for encryption
`config.send_mail`	`false`	Enable or disable email notifications
`user.name`	`null`	User's name
`user.email`	`null`	User's email
`service.name`	`nf-imputationserver`	Service name
`service.email`	`null`	Service email
`service.url`	`null`	Service URL

Reference Panel Configuration

This document describes the structure of a YAML file used to configure a reference panel for the Michigan Imputation Server. Reference panels are essential for genotype imputation, allowing the server to infer missing genotype data accurately.

YAML Structure

Field	Description
`name`	The name of the reference panel.
`description`	A brief description of the reference panel.
`version`	The version of the reference panel.
`website`	The website where more information about the panel can be found.
`category`	The category to which the reference panel belongs. Must be `RefPanel`.
`properties`	A section containing specific properties of the reference panel.

Properties

The properties section contains the following key-value pairs:

Property	Description	Required
`id`	An identifier for the reference panel. TODO: needed??	yes
`genotypes`	The location of the genotype files for the reference panel data.	yes
`legend`	The location of the legend files for the reference panel data.	yes
`mapEagle`	The location of the genetic map file used for phasing with Eagle.	yes
`refEagle`	The location of the BCF file for the reference panel data for Eagle.	yes
`mapBeagle`	The location of the genetic map file used for phasing with Beagle.	no
`refBeagle`	The location of the BCF file for the reference panel data for Beagle.	no
`build`	The genome build version used for the reference panel (e.g., hg19 or hg38).	yes
`range`	A range that is used for imputation (e.g., HLA).	no
`mapMinimac`	The location of the map file for Minimac.	no
`populations`	A dictionary mapping population identifiers to their names.	yes
`qcFilter`	A dictionary mapping quality filters to their values.	no
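
As a rough illustration of the "Required" column above, the following Python sketch checks an already-parsed reference panel document for missing required properties. It is not part of the pipeline; the function name `missing_properties` and the example panel (including its shortened file names) are hypothetical.

```python
# Required property names, taken from the "Required" column of the table above.
REQUIRED_PROPERTIES = (
    "id", "genotypes", "legend", "mapEagle", "refEagle", "build", "populations",
)


def missing_properties(panel):
    """Return the required property names absent from a parsed panel document."""
    properties = panel.get("properties") or {}
    return [key for key in REQUIRED_PROPERTIES if key not in properties]


# Hypothetical panel (as a dict, e.g. after loading the YAML file) that
# lacks refEagle and populations.
panel = {
    "name": "HapMap 2",
    "category": "RefPanel",
    "properties": {
        "id": "hapmap2",
        "genotypes": "genotypes.chr$chr.m3vcf.gz",
        "legend": "legend.chr$chr.legend.gz",
        "mapEagle": "genetic_map.txt.gz",
        "build": "hg19",
    },
}

print(missing_properties(panel))  # prints ['refEagle', 'populations']
```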

Populations

The populations section contains a dictionary mapping population identifiers to their names and sample size. This mapping helps categorize and label the populations represented in the reference panel.

Field	Description
`id`	The id of the population (e.g., eur).
`name`	The label of the population (e.g., EUR).
`samples`	The number of samples in the reference panel.

Note: the population id has to be the same as in the legend files.

Quality Filters

Filter	Description	Default
`overlap`	Minimal overlap between GWAS data and reference panel	0.5
`minSnps`	Minimal number of SNPs per chunk	3
`sampleCallrate`	Minimal sample call rate	0.5
`mixedGenotypeschrX`	-	0.1
`strandFlips`	Maximal allowed strand flips	100
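
One way to read the table above is that values given under `qcFilter` replace the listed defaults, and any filter left out keeps its default. The Python sketch below illustrates that reading only; it is an assumption about the behaviour, not the pipeline's actual code, and the function name `effective_qc_filter` is hypothetical.

```python
# Defaults copied from the Quality Filters table above.
DEFAULT_QC_FILTER = {
    "overlap": 0.5,
    "minSnps": 3,
    "sampleCallrate": 0.5,
    "mixedGenotypeschrX": 0.1,
    "strandFlips": 100,
}


def effective_qc_filter(panel_qc_filter=None):
    """Merge a panel's qcFilter section over the documented defaults.

    Assumption: panel values override defaults, and unknown filter names
    are treated as errors so that typos do not pass silently.
    """
    overrides = dict(panel_qc_filter or {})
    unknown = sorted(set(overrides) - set(DEFAULT_QC_FILTER))
    if unknown:
        raise ValueError("unknown quality filters: " + ", ".join(unknown))
    return {**DEFAULT_QC_FILTER, **overrides}


filters = effective_qc_filter({"overlap": 0.8})
print(filters["overlap"], filters["minSnps"])  # prints 0.8 3
```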

Example YAML

Here's an example YAML configuration for a reference panel. This configuration describes a reference panel named "HapMap 2" for the Michigan Imputation Server, including details about its version, data sources, and populations represented. The files are stored on AWS S3 and are directly consumed by the pipeline from there.

name: HapMap 2
description: HapMap2 Reference Panel for Michigan Imputation Server
version: 2.0.0
website: http://imputationserver.sph.umich.edu
category: RefPanel

properties:
  id: hapmap2
  genotypes: s3://cloudgene/refpanels/hapmap/m3vcfs/hapmap_r22.chr$chr.CEU.hg19.recode.m3vcf.gz
  legend: s3://cloudgene/refpanels/hapmap/legends/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz
  mapEagle: s3://cloudgene/refpanels/hapmap/map/genetic_map_hg19_withX.txt.gz
  refEagle: s3://cloudgene/refpanels/hapmap/bcfs/hapmap_r22.chr$chr.CEU.hg19.recode.bcf
  build: hg19
  populations:
    - id: eur
      name: EUR
      samples: 60
    - id: mixed
      name: Mixed
      samples: -1

Note on `$chr` Variable

In the example YAML configuration provided, you may have noticed the presence of the $chr variable in some URLs. This variable is a placeholder for the chromosome number and will be replaced by the Nextflow pipeline.
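
The substitution itself happens inside the Nextflow pipeline. Purely to illustrate the effect, the Python sketch below replaces `$chr` in a location string using the standard library's `string.Template`; the helper name `resolve_chromosome` is hypothetical.

```python
from string import Template


def resolve_chromosome(location, chromosome):
    """Replace the $chr placeholder in a location with a chromosome name.

    safe_substitute leaves the string unchanged when it contains no $chr,
    so locations such as the genetic map file pass through untouched.
    """
    return Template(location).safe_substitute(chr=chromosome)


genotypes = "s3://cloudgene/refpanels/hapmap/m3vcfs/hapmap_r22.chr$chr.CEU.hg19.recode.m3vcf.gz"
map_eagle = "s3://cloudgene/refpanels/hapmap/map/genetic_map_hg19_withX.txt.gz"

print(resolve_chromosome(genotypes, "20"))
# prints s3://cloudgene/refpanels/hapmap/m3vcfs/hapmap_r22.chr20.CEU.hg19.recode.m3vcf.gz
print(resolve_chromosome(map_eagle, "20") == map_eagle)  # prints True
```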

Legend Files

A legend file is a tab-delimited file consisting of 5 columns (id, position, a0, a1, all.aaf).
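
A minimal Python sketch of reading such a file is shown below. It assumes the first line is a header carrying exactly the five column names listed above; the function name `read_legend` and the sample row are made up for illustration.

```python
import csv
import io

# Column names as listed in the description above.
LEGEND_COLUMNS = ["id", "position", "a0", "a1", "all.aaf"]


def read_legend(handle):
    """Parse a tab-delimited legend file from an open text handle.

    Assumption: the first line is a header with the five column names.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != LEGEND_COLUMNS:
        raise ValueError("unexpected legend header: %r" % (reader.fieldnames,))
    rows = []
    for row in reader:
        rows.append({
            "id": row["id"],
            "position": int(row["position"]),
            "a0": row["a0"],
            "a1": row["a1"],
            "all.aaf": float(row["all.aaf"]),
        })
    return rows


# Made-up single-variant example; real legend files are usually gzipped
# and could be opened with gzip.open(path, "rt") instead.
example = "id\tposition\ta0\ta1\tall.aaf\n20:61098:C:T\t61098\tC\tT\t0.25\n"
rows = read_legend(io.StringIO(example))
print(rows[0]["position"], rows[0]["all.aaf"])  # prints 61098 0.25
```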

Run with Cloudgene

Requirements:

Install Nextflow
Docker or Singularity
Java 14

Installation

Install cloudgene3: curl -s install.cloudgene.io | bash -s 3.0.0-beta4
Download the latest source code zip file from the releases page
Install the nf-imputationserver app: ./cloudgene install nf-imputationserver-2.0.0-beta1.zip
Install the hapmap2 reference panel: ./cloudgene install https://genepi.i-med.ac.at/downloads/imputation/imputation-hapmap2.zip
Start cloudgene server: ./cloudgene server
Open http://localhost:8082
Login with default admin account: username admin and password admin1978
Imputation can be tested with the following test file

Default Configuration

The default configuration runs with Docker and uses Nextflow's local executor.

Running on SLURM

Configure via web interface (Applications -> imputationserver -> Settings) or adapt/create file apps/imputationserver/nextflow.config and add the following:

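process {
  executor = 'slurm'
  queue = 'QueueName'  // replace with your queue name

  // process directives belong inside the process scope:
  // retry tasks that exit with status 143 (SIGTERM), terminate on any other error
  errorStrategy = { task.exitStatus == 143 ? 'retry' : 'terminate' }
  maxErrors = '-1'
  maxRetries = 3
}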

See the Nextflow documentation for more about SLURM.

Running on AWS Batch

Create an AWS Batch queue and an IAM role (see the Nextflow documentation)
Configure via web interface (Applications -> imputationserver -> Settings) or adapt/create file apps/imputationserver/nextflow.config and add the following:

aws {
  region = 'eu-central-1'
  client {
    uploadChunkSize = 10485760
  }
  batch {
    cliPath = '/home/ec2-user/miniconda/bin/aws'
    executionRole = 'arn:aws:iam::***' // replace with your IAM role
  }
}

process {
  executor = 'awsbatch'
  queue = 'QueueName'  // replace with your Queue name
  scratch = false
}

Go to Settings -> General, set Workspace to "S3" and enter the location of a subfolder in an S3 bucket. Currently, it must be a subfolder; a bucket alone won't work (example: s3://cloudgene/workspace).

Optionally, add Wave and Fusion support to improve performance:

wave {
  enabled = true
  endpoint = 'https://wave.seqera.io'
}

fusion {
  enabled = true
}

Activate mail support

Configure mail server in Settings -> General -> Mail
Configure Nextflow to use Cloudgene's mail settings by adding the following to the global configuration (Settings -> General -> Nextflow) or by adapting/creating the file config/nextflow.config (see the Nextflow documentation for all available mail settings):

mail {
    smtp.host = "${CLOUDGENE_SMTP_HOST}"
    smtp.port = "${CLOUDGENE_SMTP_PORT}"
    smtp.user = "${CLOUDGENE_SMTP_USER}"
    smtp.password = "${CLOUDGENE_SMTP_PASSWORD}"
    smtp.auth = true
    smtp.starttls.enable = true
    smtp.ssl.protocols = 'TLSv1.2'
}

Add params.config.send_mail = true to the application-specific configuration to activate mail notifications in the nf-imputationserver pipeline.

Adapt default parameters

Parameters can be changed in the `nextflow.config` file of the application. Example:

params.chunksize = 500000
params.minimac_window = 100000

Development

Build docker image locally

docker build -t genepi/imputation-docker:latest .

Run testcases

nf-test test

License

nf-imputationserver is MIT Licensed and was developed at the Institute of Genetic Epidemiology, Medical University of Innsbruck, Austria.
