
Commit 43af3e2

committed
Updated the usage instructions
1 parent 124f7e5 commit 43af3e2

File tree

2 files changed (+104, -50 lines)


Lipidomics/README.md

+36 -2
@@ -1,5 +1,39 @@
 # Introduction
 
+### What do I wanna perform?
+
+As a pipeline, I plan to organize the working directory like this:
+
+```
+./                  # output directory
+|-- pip.work.sh     # The script containing the full set of functions to run: the MAIN SWITCH.
+|-- Shell           # A directory containing all scripts, organized in step order.
+|-- Results         # A directory containing all results (including intermediate results).
+```
+
+All a user needs to do is prepare a config file and write it into a script that builds the workspace above.
+Here is an example:
+
+`/ifs1/ST_MD/PMO/F13ZOOYJSY1389_T2D_MultiOmics/Users/fangchao/lipidomics.20151118/pip.config.sh`
+
+For a better understanding of the pipeline's logic, the following tree shows how pip.work.sh works:
+
+```
+./pip.work.sh
+|--> sh step1.sh
+     |--> sh function1.sh
+          |--> sh/perl sub-function scripts/software/program [parameters]
+     |--> sh function2.sh
+          |--> sh/perl sub-function scripts/software/program [parameters]
+...
+```
+
+As you can see, the sub-function tools may come from websites, come from packages, or be written by yourself. All you need to do is set the script paths and make sure the parameters tolerate common naming conventions, such as being able to read and locate an absolute path. You can then leave the rest to the pipeline.
+
+In the following step, I'll add your scripts into the pipeline and supply the unified input parameters, a proper output directory, and any additional options for your part's function.
+
+# Usage
+
 This workflow is based on the dataset produced by Waters and polished via metaX (https://www.bioconductor.org/packages/release/bioc/html/metaX.html).
 
 Here we start our pipeline with the input data after it has been processed with metaX.
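The dispatch tree above can be sketched as a miniature main switch. This is a hypothetical toy, not the actual pip.work.sh: the temporary working directory, the generated step scripts, and the `Results/log` file are all invented here just to show the Shell/-to-Results/ flow.

```shell
#!/bin/sh
# Toy sketch of a pip.work.sh-style main switch (hypothetical names throughout).
set -e
WORKDIR=$(mktemp -d)                      # stand-in for the pipeline output directory
mkdir -p "$WORKDIR/Shell" "$WORKDIR/Results"

# Create two placeholder step scripts, as the Shell/ directory would hold.
for step in step1 step2; do
  printf 'echo "%s done" >> "%s/Results/log"\n' "$step" "$WORKDIR" \
    > "$WORKDIR/Shell/$step.sh"
done

# The main switch simply runs every step script in order,
# each of which writes its results under Results/.
for s in "$WORKDIR"/Shell/step*.sh; do
  sh "$s"
done

cat "$WORKDIR/Results/log"
```

Note that the steps run in lexical glob order here; the real pipeline decides ordering via the `-s|step` functions instead.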
@@ -30,7 +64,7 @@ This command will produce its results in each directory:
 
 ```
 00.data/DemoAnalyst.comm.phenotype.tab
-```
+```
 
 contains the phenotype table with the samples kept for further analysis. Outliers in either positive or negative ion mode will be discarded.
 
@@ -109,7 +143,7 @@ This command will produce its results in each directory:
 
 ```
 05.correlation/DemoAnalyst_selected.spearman.xls.spearman.tab
-```
+```
 
 ## GLM
 
README.md

+68 -48
@@ -1,31 +1,45 @@
 # Omics_pipeline
-This pipeline is built to ease my pressure for multi-omics analysis. In this version, I focus on the process of data polishing.
+This pipeline is built to ease my pressure for multi-omics analysis. In this version, I focus on the process of data polishing for metagenome-wide analysis.
 
 # Get Started & Usage
 
-This project currently contains only the **Metagenomics standard pipeline** and the **Lipidomics analysis pipeline**.
+This project currently contains the **Metagenomics standard pipeline** and the **Lipidomics analysis pipeline (maintenance paused)**.
 
-### Metagenomics pipeline
+## Metagenomics pipeline
 
-`clone` this repository into your working directory on the cluster and add the main program to your PATH.
+### Install
+
+**Method 1 (Recommended):** load the cluster's [binboard](https://biogit.cn/META/Binboard#usage) configuration, after which the command can be called directly.
+
+```
+echo '. /ldfssz1/ST_META/share/flow/metaEnv' >> ~/.bashrc
+```
+
+**Method 2:** configure the pipeline yourself:
 
 ```
 cd /your/dir/
 git clone [email protected]:Fangchao/Omics_pipeline.git
 ln -s /your/dir/Omics_pipeline/MetaGenomics/cOMG ~/bin/
 ```
 
-#### Usage:
+> :warning: Currently only the **login nodes and install nodes** of the cluster can access **biogit.cn**, and a **domain-forwarding entry** must be added (as below):
+>
+> `echo -e "Host biogit.cn\nHostName 172.17.11.248" >> ~/.ssh/config`
+
+### Usage:
 
 ```
 cOMG # run with no arguments to print this usage message
 usage:
-cOMG <pe|se> [options]
-pattern
+cOMG <pe|se|config|cmd> [options]
+mode
   pe|se   pair end | single end
+  config  generate a config file template
+  cmd     directly call a sub-script under bin/ or util/
 options:
   -p|path :[essential] sample path file (SampleID|fqID|fqPath)
-  -i|ins  :[essential for pair-end seq] insert info file
+  -i|ins  :[essential for pe mode] insert info file, or a number giving the insert size
   -s|step :functions, default 1234
     1 trim+filter, OA method
     2 remove host genomic reads
@@ -39,26 +53,61 @@ options:
 
 **path file**: a file recording the raw-data locations and ID information; each line has three columns: the **sample ID**, the **data ID**, and the **fq file path**.
 
-- `sample ID`: the sample in the biological and statistical sense, the basic unit for downstream analysis
-- `data ID`: if the same sample is sequenced several times, several data sets are produced, and the data ID distinguishes them (it can be a library ID, date, batch, etc.). `Note`: multiple data sets sharing the same `sample ID` will eventually be merged when computing relative abundance.
+- `sample ID`: the sample in the biological and statistical sense, the basic unit for downstream analysis.
+
+> :warning: In the relative-abundance step, data sharing the same `sample ID` are merged into one result file, prefixed with the `sample ID`;
+>
+> :warning: In pe mode, the \*1.fq and \*2.fq of the same sample should use the same `sample ID`;
+>
+> :warning: Avoid IDs starting with a digit
+
+- `data ID`: if the same sample is sequenced several times, several data sets are produced, and the data ID distinguishes them (it can be a library ID, date, batch, etc.).
+
+> :warning: Multiple data sets sharing the same `data ID` are treated as coming from the same batch;
+>
+> :warning: The result files of the first three steps are prefixed with the `data ID`;
+> :warning: In pe mode, the \*1.fq and \*2.fq of the same data set should use the same `data ID`;
+>
+> :warning: Avoid IDs starting with a digit
+
 - `fastq path`: a path that must be accessible from the working environment
 
+> :warning: For each data set, list the input paths in the order read1, read2, single read (if any)
+
+e.g.:
+
+```
+column -t test.5samples.path
+t01       ERR260132 ./fastq/ERR260132_1.fastq.gz
+t01       ERR260132 ./fastq/ERR260132_2.fastq.gz
+t02.sth   ERR260133 ./fastq/ERR260133_1.fastq.gz
+t02.sth   ERR260133 ./fastq/ERR260133_2.fastq.gz
+t03_rep   ERR260134 ./fastq/ERR260134_1.fastq.gz
+t03_rep   ERR260134 ./fastq/ERR260134_2.fastq.gz
+t04_rep_2 ERR260135 ./fastq/ERR260135_1.fastq.gz
+t04_rep_2 ERR260135 ./fastq/ERR260135_2.fastq.gz
+t05       ERR260136 ./fastq/ERR260136_1.fastq.gz
+t05       ERR260136 ./fastq/ERR260136_2.fastq.gz
+```
+
 **config file**: since the workflow involves many analysis steps, the parameters of every individual tool are collected in a single config file:
 
 ```
 ###### configuration
 
 ### Database location
 db_host = $META_DB/human/hg19/Hg19.fa.index # host reference db prefix, used to remove host-derived reads
-db_meta = $META_DB/1267sample_ICG_db/4Group_uniqGene.div_1.fa.index,$META_DB/1267sample_ICG_db/4Group_uniqGene.div_2.fa.index # reference gene-set db prefix; multiple files can be separated by commas
+db_meta = $META_DB/1267sample_ICG_db/4Group_uniqGene.div_1.fa.index,$META_DB/1267sample_ICG_db/4Group_uniqGene.div_2.fa.index # reference gene-set db prefix; multiple index sets can be separated by commas
 
 ### reference gene length file
 RGL = $META_DB/IGC.annotation/IGC_9.9M_update.fa.len # per-gene length of the reference gene set, used to compute relative abundance
 ### pipeline parameters
-PhQ = 33 #reads Phred quality system: 33 or 64.
-mLen= 30 #minimal read length allowed
-seedOA=0.9
-fragOA=0.8
+PhQ = 33 # reads Phred quality system: 33 or 64.
+mLen= 30 # minimal read length allowed
+seedOA=0.9 # in the OA filtering method, the OA (accuracy) threshold for the seed part, in [0,1]
+fragOA=0.8 # in the OA filtering method, the OA (accuracy) threshold for the trimmed full length, in [0,1]
 
 qsub = 1234 #The following arguments take effect only if qsub=on; otherwise you can comment them out
 q = st.q #queue for qsub
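The path-file rules above (exactly three whitespace-separated columns; IDs should not begin with a digit) can be linted before launching cOMG. A minimal sketch: the file name `test.path` and its contents are invented for illustration, and the third line deliberately violates the leading-digit rule.

```shell
# Write a tiny example path file (hypothetical samples).
cat > test.path <<'EOF'
t01 ERR260132 ./fastq/ERR260132_1.fastq.gz
t01 ERR260132 ./fastq/ERR260132_2.fastq.gz
9bad ERR260137 ./fastq/ERR260137_1.fastq.gz
EOF

# Report any line without exactly 3 columns, or whose first
# column (the sample ID) starts with a digit.
awk 'NF != 3       { print "line " NR ": expected 3 columns, got " NF }
     $1 ~ /^[0-9]/ { print "line " NR ": sample ID starts with a digit" }' \
    test.path > path.errors

cat path.errors   # line 3: sample ID starts with a digit
```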
@@ -79,15 +128,16 @@ r = 10 #repeat time when a job fails or is interrupted
 Once the config file above is ready, run the script to generate the working directory:
 
 ```
-cOMG se -p sample.path.file -o demo
+cd t
+cOMG se -p demo.input.lst -c demo.cfg -o demo.test
 ```
 
 Then enter the working directory, check the scripts, and launch one of the run scripts:
 
 ```
 sh RUN.batch.sh       # mode 1: all samples must finish the current step before the next step starts;
 sh RUN.linear.1234.sh # mode 2: each sample runs through the steps one by one, independently of the others;
-sh RUN.qsubM.sh       # mode 3: same as above, with an improved qsub manager that handles failures automatically
+sh RUN.qsubM.sh       # mode 3: same as above, with an improved qsub manager that handles failures automatically (recommended)
 ```
 
 Afterwards you can run `sh report.stat.sh` to print the report table.
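The comments above distinguish mode 1 (batch) from mode 2 (linear) by when each sample advances. The difference is just loop nesting order, which a toy sketch can make concrete (sample and step names invented):

```shell
samples="t01 t02"; steps="1 2"

# Batch mode: the step loop is outermost, so every sample finishes a
# step before any sample starts the next one.
batch=$(for step in $steps; do
  for s in $samples; do printf '%s:step%s ' "$s" "$step"; done
done)
echo "batch : $batch"    # t01:step1 t02:step1 t01:step2 t02:step2

# Linear mode: the sample loop is outermost, so each sample runs
# straight through all its steps independently of the others.
linear=$(for s in $samples; do
  for step in $steps; do printf '%s:step%s ' "$s" "$step"; done
done)
echo "linear: $linear"   # t01:step1 t01:step2 t02:step1 t02:step2
```

Mode 3 keeps the linear schedule but hands job submission and retries to the qsub manager.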
@@ -98,34 +148,4 @@ sh RUN.qsubM.sh       # mode 3: same as above, with an improved qsub manager
 
 ### Lipidomics pipeline
 
-### What do I wanna perform?
-As a pipeline, I plan to organize the working directory like this:
-```
-./                  # output directory
-|-- pip.work.sh     # The script containing the full set of functions to run: the MAIN SWITCH.
-|-- Shell           # A directory containing all scripts, organized in step order.
-|-- Results         # A directory containing all results (including intermediate results)
-```
-All a user needs to do is prepare a config file and write it into a script that builds the workspace above.
-Here is an example:
-
-`/ifs1/ST_MD/PMO/F13ZOOYJSY1389_T2D_MultiOmics/Users/fangchao/lipidomics.20151118/pip.config.sh`
-
-For a better understanding of the pipeline's logic, the following tree shows how pip.work.sh works:
-```
-./pip.work.sh
-|--> sh step1.sh
-     |--> sh function1.sh
-          |--> sh/perl sub-function scripts/software/program [parameters]
-     |--> sh function2.sh
-          |--> sh/perl sub-function scripts/software/program [parameters]
-...
-```
-As you can see, the sub-function tools may come from websites, come from packages, or be written by yourself. All you need to do is set the script paths and make sure the parameters tolerate common naming conventions, such as being able to read and locate an absolute path. You can then leave the rest to the pipeline.
-
-In the following step, I'll add your scripts into the pipeline and supply the unified input parameters, a proper output directory, and any additional options for your part's function.
-
-
-```
-
-```
+See details [here](/Lipidomics/README.md).

0 commit comments
