Quick Start
The following examples demonstrate metaWEPP on simulated and real-world metagenomic samples.
Example - 1: Simulated metagenomic sample
Step 1: Download the MAT for SARS-CoV-2 and RSV-A.
wget https://hgdownload.gi.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/2023/04/01/public-2023-04-01.all.masked.pb.gz
wget https://hgdownload.gi.ucsc.edu/hubs/GCF/002/815/475/GCF_002815475.1/UShER_RSV-A/2025/04/25/rsvA.2025-04-25.pb.gz
Step 2: Download the viral Kraken2 database.
wget https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20251015.tar.gz
mkdir -p viral_kraken_db
tar -xvzf k2_viral_20251015.tar.gz -C viral_kraken_db
rm k2_viral_20251015.tar.gz
Note
⚠️ You can skip this step entirely by passing the .tar.gz URL straight to KRAKEN_DB= in the next command. metaWEPP will download and extract it on first run, and reuse it on subsequent runs.
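If you extracted the database manually, a quick sanity check before launching the pipeline can save a failed run. The sketch below assumes the tarball unpacks the three standard Kraken2 index files (hash.k2d, opts.k2d, taxo.k2d) directly into the target directory. The helper name check_kraken_db is hypothetical and not part of metaWEPP.

```shell
# Hypothetical helper (not part of metaWEPP): verify that a directory
# contains the three index files a Kraken2 database is built from.
# Assumes the files sit at the top level of the directory.
check_kraken_db() {
  db=$1
  missing=0
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [ ! -s "$db/$f" ]; then
      echo "missing or empty: $db/$f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "Kraken2 database looks complete: $db"
  return "$missing"
}

# Example: check_kraken_db viral_kraken_db
```

The function returns a non-zero status when any file is missing, so it can gate the run command in a script.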
Step 3: Run the pipeline.
run-metawepp --config KRAKEN_DB=viral_kraken_db DIR=simulated_metagenomic_sample MIN_PROP=0.05 PATHOGENS=default,respiratory_syncytial_virus_a,sars_cov_2 CLADE_LIST=,nextstrain,nextstrain:pango CLADE_IDX=-1,0,1 CORES_PER_PATHOGEN=,8,24 --cores 32
When prompted to add a new species for haplotype-level analysis, press y. Then follow the steps below to add the SARS-CoV-2 and RSV-A mutation-annotated trees (MATs).
SARS-CoV-2:
a) Type "sars cov 2" as the virus of interest.
b) Select "Severe acute respiratory syndrome coronavirus 2" by entering "1" and pressing Enter.
c) Select "NC_045512.2" by entering "1" and pressing Enter.
d) Provide the MAT file path: "./public-2023-04-01.all.masked.pb.gz".
Press y when prompted again to add a new species for haplotype-level analysis. Next, follow the steps below to add the RSV-A MAT.
RSV-A:
a) Type "respiratory syncytial virus a" as the virus of interest.
b) Select "human respiratory syncytial virus" by entering "2" and pressing Enter.
c) Select "NC_038235.1" by entering "1" and pressing Enter.
d) Provide the MAT file path: "./rsvA.2025-04-25.pb.gz".
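The run command in Step 3 passes several comma-separated lists (PATHOGENS, CLADE_LIST, CLADE_IDX, CORES_PER_PATHOGEN) that all have the same number of entries. They appear to be matched by position, with an empty entry meaning "no value for this pathogen"; this reading is an inference from the command itself, not a documented guarantee. Under that assumption, the sketch below prints the settings each pathogen would receive. The function name show_pathogen_config is hypothetical.

```shell
# Hypothetical helper: line up the per-pathogen lists by position.
# Assumes entry k of each list applies to entry k of PATHOGENS,
# and prints "-" where an entry is empty.
show_pathogen_config() {
  awk -v p="$1" -v cl="$2" -v ci="$3" -v co="$4" 'BEGIN {
    n = split(p, P, ",")
    split(cl, CL, ","); split(ci, CI, ","); split(co, CO, ",")
    for (k = 1; k <= n; k++)
      printf "%s\tclade_list=%s\tclade_idx=%s\tcores=%s\n", P[k],
        (CL[k] == "" ? "-" : CL[k]),
        (CI[k] == "" ? "-" : CI[k]),
        (CO[k] == "" ? "-" : CO[k])
  }'
}

# Values copied from the Step 3 command
show_pathogen_config \
  "default,respiratory_syncytial_virus_a,sars_cov_2" \
  ",nextstrain,nextstrain:pango" \
  "-1,0,1" \
  ",8,24"
```

Read this way, the default entry has no clade list or core count of its own, RSV-A uses nextstrain with 8 cores, and SARS-CoV-2 uses nextstrain:pango with 24 cores.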
Step 4: Analyze Results.
Species proportions can be viewed in results/simulated_metagenomic_sample/classification_proportions.png, which shows the following proportions:
- 66.33% Severe acute respiratory syndrome coronavirus 2
- 31.38% Human respiratory syncytial virus A
- 1.95% human respiratory syncytial virus
Haplotype-level results generated by WEPP for SARS-CoV-2 and RSV-A are available in WEPP/results/simulated_metagenomic_sample_sars_cov_2/metaWEPP_run_haplotype_abundance.csv and WEPP/results/simulated_metagenomic_sample_respiratory_syncytial_virus_a/metaWEPP_run_haplotype_abundance.csv, respectively.
SARS-CoV-2 haplotype abundances:
RSV-A haplotype abundances:
Argentina/BA-HNRG-369/2017|ON237340.1|2017-06-26,A.D.2.2,0.500000
USA/MA-Broad_MGB-13815/2022|OQ171906.1|2022-07-27,A.D.1.5,0.500000
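To summarize a haplotype abundance file by lineage, a short awk pass is usually enough. The sketch below assumes each row has the layout seen in the RSV-A rows above (haplotype name, lineage, abundance, separated by commas) and skips any row whose last field is not a plain number, such as a header. The function name lineage_totals is hypothetical.

```shell
# Hypothetical helper: sum haplotype abundances per lineage.
# Assumes comma-separated rows ending in "<lineage>,<abundance>".
lineage_totals() {
  awk -F',' '
    $NF ~ /^[0-9]+(\.[0-9]+)?$/ { by[$(NF-1)] += $NF }
    END { for (l in by) printf "%s\t%.6f\n", l, by[l] }
  ' "$@" | sort
}

# Demo on the two RSV-A rows shown above
lineage_totals <<'EOF'
Argentina/BA-HNRG-369/2017|ON237340.1|2017-06-26,A.D.2.2,0.500000
USA/MA-Broad_MGB-13815/2022|OQ171906.1|2022-07-27,A.D.1.5,0.500000
EOF
```

The function also accepts a file path, for example the metaWEPP_run_haplotype_abundance.csv file for RSV-A listed above.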
Example - 2: Real-world metagenomic sample
Step 1: Download the real-world metagenomic sample.
mkdir -p data/real_metagenomic_sample
cd data/real_metagenomic_sample
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR108/074/ERR10812874/ERR10812874_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR108/074/ERR10812874/ERR10812874_2.fastq.gz
mv ERR10812874_1.fastq.gz ERR10812874_R1.fastq.gz
mv ERR10812874_2.fastq.gz ERR10812874_R2.fastq.gz
cd ../../
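The mv commands above rename the downloads from _1/_2 to _R1/_R2, which suggests the pipeline expects that suffix for paired-end reads; that expectation is an inference from this example. Under that assumption, the sketch below confirms that every *_R1.fastq.gz file in a sample directory has a matching *_R2.fastq.gz mate. The function name check_pairs is hypothetical.

```shell
# Hypothetical helper: confirm each *_R1.fastq.gz has an *_R2.fastq.gz mate.
# Assumes the _R1/_R2 naming used in the rename commands above.
check_pairs() {
  dir=$1
  status=0
  found=0
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue          # glob matched nothing
    found=1
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ -e "$r2" ]; then
      echo "paired: $(basename "$r1") + $(basename "$r2")"
    else
      echo "missing mate for: $(basename "$r1")" >&2
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no *_R1.fastq.gz files in $dir" >&2
    status=1
  fi
  return "$status"
}

# Example: check_pairs data/real_metagenomic_sample
```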
Step 2: Download the RSV-A MAT.
wget https://hgdownload.gi.ucsc.edu/hubs/GCF/002/815/475/GCF_002815475.1/UShER_RSV-A/2025/04/25/rsvA.2025-04-25.pb.gz
Step 3: Download the viral Kraken2 database.
wget https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20251015.tar.gz
mkdir -p viral_kraken_db
tar -xvzf k2_viral_20251015.tar.gz -C viral_kraken_db
rm k2_viral_20251015.tar.gz
Note
⚠️ You can skip this step entirely by passing the .tar.gz URL straight to KRAKEN_DB= in the next command. metaWEPP will download and extract it on first run, and reuse it on subsequent runs.
Step 4: Run the pipeline.
run-metawepp --config KRAKEN_DB=viral_kraken_db DIR=real_metagenomic_sample MIN_PROP=0.05 PATHOGENS=default,respiratory_syncytial_virus_a CLADE_LIST=,nextstrain CLADE_IDX=-1,0 --cores 32
When prompted to add a new species for haplotype-level analysis, press y. Then follow the steps below to add the RSV-A and Rhinovirus-A mutation-annotated trees (MATs).
RSV-A:
a) Type "respiratory syncytial virus a" as the virus of interest.
b) Select "human respiratory syncytial virus" by entering "2" and pressing Enter.
c) Select "NC_038235.1" by entering "1" and pressing Enter.
d) Provide the MAT file path: "./rsvA.2025-04-25.pb.gz".
Press y when prompted again to add a new species for haplotype-level analysis. Next, follow the steps below to build the Rhinovirus-A MAT using viral_usher. Note that this step does not work if you are using the metaWEPP Docker container.
Rhinovirus-A:
a) Type "rhinovirus a" as the virus of interest.
b) Select "Rhinovirus A" by entering "1" and pressing Enter.
c) Select "NC_001617.1" by entering "2" and pressing Enter.
d) Press Enter when asked to build a MAT with viral_usher.
e) Type "0.1" for minimum length proportion of the RefSeq length and press Enter.
f) Type "0.5" for maximum 'N' proportion and press Enter.
g) Use the default values for maximum private and branch substitutions by pressing Enter.
h) Press Enter when asked for more FASTA files.
i) Press Enter when asked for a title for your tree.
j) Enter the directory path that was displayed before the start of this questionnaire for downloading sequences and building trees. It should look similar to: "path_to_metaWEPP/data/pathogens_for_wepp/rhinovirus_a/viral_usher_build"
Step 5: Analyze Results.
Species proportions can be viewed in results/real_metagenomic_sample/classification_proportions.png, which shows the following proportions:
- 74.74% Unclassified
- 12.95% Human respiratory syncytial virus A
- 6.96% Choristoneura fumiferana granulovirus
- 2.00% Others
- 1.83% Shamonda virus
- 1.53% human respiratory syncytial virus
Haplotype-level results generated by WEPP for RSV-A and Rhinovirus-A are available in WEPP/results/real_metagenomic_sample_respiratory_syncytial_virus_a/metaWEPP_run_haplotype_abundance.csv and WEPP/results/real_metagenomic_sample_rhinovirus_a/metaWEPP_run_haplotype_abundance.csv, respectively.
RSV-A haplotype abundances:
Rhinovirus-A Haplotype abundances: