Experiment reproduction

Kang Hu edited this page Dec 25, 2024 · 36 revisions

Prerequisites:

1. Downloading Repbase Data

  • As Repbase is a paid database, you must download the Repbase data from https://www.girinst.org/server/RepBase/index.php before reproducing our experiments. For example, download RepBase26.05.fasta.tar.
  • After downloading and extracting the archive, copy the files athrep.ref, cbrrep.ref, drorep.ref, maize.ref, oryrep.ref, and zebrep.ref to the ${pathToHiTE}/library directory. For instance, HiTE/library/oryrep.ref is required to replicate the rice experiments. Before running, remove non-TE entries, such as satellite sequences, from the oryrep.ref file.
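The non-TE cleanup described above could be sketched roughly as follows. This is not HiTE's own procedure: the function name filter_non_te is hypothetical, the sketch assumes Repbase headers of the form ">name<TAB>class<TAB>species", and the list of non-TE class labels is an assumption that should be checked against your Repbase release.

```shell
# Sketch: print only the records of a Repbase FASTA file whose class label
# (assumed to be the second tab-separated header field) is not a non-TE label.
# The label list in "pat" is an assumption; adjust it for your Repbase release.
filter_non_te() {
    awk -F'\t' -v pat='^(SAT|MSAT|Satellite|Simple Repeat|tRNA|rRNA|snRNA)$' '
        /^>/ { keep = ($2 !~ pat) }   # decide once per record, from its header line
        keep                          # print header and sequence lines of kept records
    ' "$1"
}
```

For example, `filter_non_te oryrep.ref > oryrep.filtered.ref` writes the filtered library to a new file, which can be inspected before it replaces the original in the library directory.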

2. Downloading Genome Data

Download the rice reference genome (the commands below use GCF_001433935.1_IRGSP-1.0_genomic.fna).

3. Installing HiTE

Installation Guide

4. Downloading EDTA

cd ${pathToHiTE}
git clone https://github.com/oushujun/EDTA.git
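Before launching a run, it may help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of HiTE: check_prereqs is a hypothetical helper that takes the HiTE directory as its argument and assumes the layout described above (the six Repbase files under library/ and the EDTA clone inside the HiTE directory).

```shell
# Sketch: report missing prerequisite files; returns non-zero if anything is missing.
# Assumes the directory layout described in the prerequisites above.
check_prereqs() {
    hite_dir=$1
    missing=0
    for f in athrep.ref cbrrep.ref drorep.ref maize.ref oryrep.ref zebrep.ref; do
        if [ ! -f "$hite_dir/library/$f" ]; then
            echo "missing: $hite_dir/library/$f" >&2
            missing=1
        fi
    done
    if [ ! -d "$hite_dir/EDTA" ]; then
        echo "missing: $hite_dir/EDTA (EDTA clone)" >&2
        missing=1
    fi
    return "$missing"
}
```

Calling `check_prereqs "${pathToHiTE}"` prints one line per missing item to standard error and succeeds silently when everything is present.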

One-Step Execution and Replication of Benchmarking Results

# --coverage_threshold: switch to 0.99 if you prefer a more stringent threshold.
# --species: one of dmel, rice, cb, zebrafish, maize, ath; set --plant 0 for a non-plant species.
# Comments are kept above the command because text after a trailing "\" breaks the line continuation.
python main.py \
 --genome ${pathToGenome}/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 --thread ${thread} \
 --outdir ${output_dir} \
 --plant 1 \
 --BM_RM2 1 \
 --BM_EDTA 1 \
 --EDTA_home ${EDTA_home} \
 --BM_HiTE 1 \
 --coverage_threshold 0.95 \
 --species rice
 
 # Example command: python main.py \
 # --genome /home/hukang/EDTA/krf_test/rice/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 # --thread 40 \
 # --outdir /homeb/hukang/KmerRepFinder_test/library/rice/ \
 # --plant 1 \
 # --BM_RM2 1 \
 # --BM_EDTA 1 \
 # --EDTA_home /home/hukang/HiTE/EDTA \
 # --BM_HiTE 1 \
 # --coverage_threshold 0.95 \
 # --species rice

# The benchmarking results can be found at "${output_dir}/BM_RM2.log", "${output_dir}/BM_EDTA.log", and "${output_dir}/BM_HiTE.log".
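The --species and --plant options have to agree: --plant 1 for plant species and --plant 0 otherwise. A small helper can derive one from the other. This is only a sketch: plant_flag is a hypothetical function, and it assumes that rice, maize, and ath are the plant species among the six listed options, with dmel, cb, and zebrafish being the non-plant ones.

```shell
# Sketch: map a --species value to the matching --plant value.
# The plant/non-plant grouping below is an assumption based on the species names.
plant_flag() {
    case "$1" in
        rice|maize|ath) echo 1 ;;
        dmel|cb|zebrafish) echo 0 ;;
        *) echo "unknown species: $1" >&2; return 1 ;;
    esac
}
```

It can then be used inline, for example `--plant "$(plant_flag zebrafish)" --species zebrafish`.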

Step-by-Step Execution and Replication of Benchmarking Results

# 1. Run HiTE.
python main.py \
 --genome ${pathToGenome}/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 --thread ${thread} \
 --outdir ${output_dir} \
 --plant 1 # Set --plant 0 if you choose a non-plant species

 # Example command: python main.py \
 # --genome /home/hukang/EDTA/krf_test/rice/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 # --thread 40 \
 # --outdir /homeb/hukang/KmerRepFinder_test/library/rice/ \
 # --plant 1

# 2. Skip HiTE and run the RepeatModeler2 benchmarking method (BM_RM2).
# --coverage_threshold: switch to 0.99 if you prefer a more stringent threshold.
# --species: one of dmel, rice, cb, zebrafish, maize, ath; set --plant 0 for a non-plant species.
python main.py \
 --genome ${pathToGenome}/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 --thread ${thread} \
 --outdir ${output_dir} \
 --plant 1 \
 --skip_HiTE 1 \
 --BM_RM2 1 \
 --coverage_threshold 0.95 \
 --species rice
 
 # Example command: python main.py \
 # --genome /home/hukang/EDTA/krf_test/rice/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 # --thread 40 \
 # --outdir /homeb/hukang/KmerRepFinder_test/library/rice/ \
 # --plant 1 \
 # --skip_HiTE 1 \
 # --BM_RM2 1 \
 # --coverage_threshold 0.95 \
 # --species rice

# The benchmarking results can be found at "${output_dir}/BM_RM2.log".


# 3. Skip HiTE and BM_RM2, and run the EDTA benchmarking method (BM_EDTA).
python main.py \
 --genome ${pathToGenome}/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 --thread ${thread} \
 --outdir ${output_dir} \
 --plant 1 \
 --skip_HiTE 1 \
 --BM_RM2 0 \
 --BM_EDTA 1 \
 --EDTA_home ${EDTA_home} \
 --species rice #[dmel, rice, cb, zebrafish, maize, ath], set --plant 0 if you choose the non-plant species
 
 # Example command: python main.py \
 # --genome /home/hukang/EDTA/krf_test/rice/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 # --thread 40 \
 # --outdir /homeb/hukang/KmerRepFinder_test/library/rice/ \
 # --plant 1 \
 # --skip_HiTE 1 \
 # --BM_RM2 0 \
 # --BM_EDTA 1 \
 # --EDTA_home /home/hukang/HiTE/EDTA \
 # --species rice

# The benchmarking results can be found at "${output_dir}/BM_EDTA.log".

# 4. Skip HiTE, BM_RM2, and BM_EDTA, and run the HiTE benchmarking method (BM_HiTE).
# --coverage_threshold: switch to 0.99 if you prefer a more stringent threshold.
# --species: one of dmel, rice, cb, zebrafish, maize, ath; set --plant 0 for a non-plant species.
python main.py \
 --genome ${pathToGenome}/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 --thread ${thread} \
 --outdir ${output_dir} \
 --plant 1 \
 --skip_HiTE 1 \
 --BM_RM2 0 \
 --BM_EDTA 0 \
 --EDTA_home ${EDTA_home} \
 --BM_HiTE 1 \
 --coverage_threshold 0.95 \
 --species rice
 
 # Example command: python main.py \
 # --genome /home/hukang/EDTA/krf_test/rice/GCF_001433935.1_IRGSP-1.0_genomic.fna \
 # --thread 40 \
 # --outdir /homeb/hukang/KmerRepFinder_test/library/rice/ \
 # --plant 1 \
 # --skip_HiTE 1 \
 # --BM_RM2 0 \
 # --BM_EDTA 0 \
 # --EDTA_home /home/hukang/HiTE/EDTA \
 # --BM_HiTE 1 \
 # --coverage_threshold 0.95 \
 # --species rice

# The benchmarking results can be found at "${output_dir}/BM_HiTE.log".
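After running some or all of the steps, it can be handy to see which benchmarking logs have been produced so far. The sketch below only checks for the three log files named in this page. list_bm_logs is a hypothetical helper, not a HiTE command, and it makes no assumption about the contents of the logs.

```shell
# Sketch: report which of the three benchmarking logs exist in an output directory.
list_bm_logs() {
    for name in BM_RM2 BM_EDTA BM_HiTE; do
        log="$1/$name.log"
        if [ -f "$log" ]; then
            echo "found: $log"
        else
            echo "absent: $log"
        fi
    done
}
```

Running `list_bm_logs "${output_dir}"` prints one line per log, which makes it easy to tell which benchmarking steps still need to be run.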