RIVET Backend Configuration File
Below you will find explanations for each field in the following config.yaml file.
# GCP Credentials [LEAVE EMPTY FOR LOCAL JOB]
bucket_id:
project_id:
key_file: /tmp/keys/
# GCP Machine and Storage Bucket Config [LEAVE EMPTY FOR LOCAL JOB]
instances:
boot_disk_size: 50
machine_type:
# Ripples Parameters Config [REQUIRED]
version: ripples-fast
mat:
newick:
metadata:
date:
# Local results output directory, or name of folder on GCP storage bucket
results:
reference: reference.fa
# Additional Parameters
num_descendants: 5
public_tree: True
verbose: False
# Default to all available threads if left empty
threads:
docker_image: mrkylesmith/ripples_pipeline:latest
generate_taxonium: False
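Because the file above is a flat list of `key: value` pairs, it is easy to inspect programmatically before launching a job. The following is a minimal sketch using only the Python standard library; it is not how RIVET itself reads the file (the pipeline's own loader is not described here), and the function name `parse_flat_config` is hypothetical. It handles only this flat layout: no nesting, no lists, no inline comments.

```python
def parse_flat_config(text):
    """Parse flat 'key: value' lines into a dict (sketch; flat layout only)."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as
        # 'mrkylesmith/ripples_pipeline:latest' stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "":
            config[key] = None            # field left empty
        elif value in ("True", "False"):
            config[key] = (value == "True")
        elif value.isdigit():
            config[key] = int(value)
        else:
            config[key] = value
    return config


example = """\
# Ripples Parameters Config [REQUIRED]
version: ripples-fast
date:
num_descendants: 5
public_tree: True
docker_image: mrkylesmith/ripples_pipeline:latest
"""
config = parse_flat_config(example)
print(config["num_descendants"], config["date"], config["public_tree"])
# prints: 5 None True
```

Fields that are left empty come back as `None`, which makes it straightforward to tell a blank field from one that was filled in.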
RIVET GCP Job Parameters
Warning
If you are running your RIVET backend job on GCP, you must fill out all of the fields in this subsection. Otherwise, if you are running your RIVET job locally on your machine, just leave these fields blank.
- bucket_id: The name of the GCP Storage Bucket where RIVET will find your pipeline inputs and write the pipeline outputs.
- project_id: The name of your GCP project, where your Storage Bucket can be found.
- key_file: The path to a GCP authentication keys JSON file that gives RIVET the necessary permissions to access your GCP account and storage bucket.
- instances: The number of GCP instances (machines) to parallelize your RIVET job across. RIVET will automatically partition the long branches in the given MAT across the n instances given by this field, then search for recombination events and perform filtration checks in parallel on the n machines.
- boot_disk_size: 50
  This field should be left as 50, and pertains only to GCP machines.
- machine_type: n2d-highcpu-32
  The type of GCP machine to use for the RIVET job. We recommend leaving this field as n2d-highcpu-32, since RIVET is optimized to take advantage of GCP compute-optimized instances, but this field can be changed if desired. The list of available machines can be found at the following page: Machine families resource and comparison guide
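To build intuition for what the instances field controls, the sketch below splits a list of work items into n near-equal chunks, one per machine. This is only an illustration: the exact scheme RIVET uses to partition long branches is not described here, so the even, contiguous split and the name `partition_evenly` are assumptions, and the integer branch IDs are placeholders.

```python
def partition_evenly(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


long_branches = list(range(10))  # hypothetical long-branch IDs
for machine, chunk in enumerate(partition_evenly(long_branches, 4)):
    print(machine, chunk)
# prints:
# 0 [0, 1, 2]
# 1 [3, 4, 5]
# 2 [6, 7]
# 3 [8, 9]
```

With a split like this, each machine receives roughly the same number of branches to search, so adding instances shortens the run until per-machine overhead dominates.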
Info
For more information on GCP account setup, including obtaining the necessary key_file, please see the GCP Setup Docs
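The rule in the warning above (fill in every GCP field for a GCP job, leave them blank for a local job) can be checked before a job is submitted. The sketch below is a hypothetical pre-flight check, not part of RIVET. Since the template ships with key_file and boot_disk_size already filled in, it assumes that a job counts as a GCP job only when at least one of the fields that ship empty (bucket_id, project_id, instances, machine_type) has a value.

```python
GCP_FIELDS = ("bucket_id", "project_id", "key_file",
              "instances", "boot_disk_size", "machine_type")
# Assumption: fields that ship empty in the template signal a GCP job when set.
GCP_MARKERS = ("bucket_id", "project_id", "instances", "machine_type")


def is_blank(value):
    """True for a missing field or one containing only whitespace."""
    return value is None or str(value).strip() == ""


def missing_gcp_fields(config):
    """Return GCP fields that still need values, or [] for a local job."""
    if all(is_blank(config.get(name)) for name in GCP_MARKERS):
        return []  # no marker field set: treated as a local job
    return [name for name in GCP_FIELDS if is_blank(config.get(name))]


# Placeholder values for illustration only.
local_job = {"key_file": "/tmp/keys/", "boot_disk_size": 50}
partial_gcp = dict(local_job, bucket_id="my-bucket", project_id="my-project")

print(missing_gcp_fields(local_job))    # prints: []
print(missing_gcp_fields(partial_gcp))  # prints: ['instances', 'machine_type']
```

A non-empty result means the config mixes filled and blank GCP fields, which is the case the warning asks you to avoid.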
RIVET Specific Parameters
- version: ripples-fast
  Do not change this field. We recommend using ripples-fast, which is a new implementation of the RIPPLES algorithm that produces identical results with considerable speedup.
- mat: The mutation-annotated tree (MAT) input phylogeny, generated by UShER, in which to search for recombination. A daily-updated database of SARS-CoV-2 mutation-annotated trees has been made available through matUtils and can be found here: https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/.
- newick: The name of the Newick tree file that will be used by the RIVET backend pipeline, for example <DATE>_tree.nwk. No actual input file is required for this field; just provide the name of the file, and RIVET will convert the MAT to the Newick file format internally.
- metadata: Provide the name of the sequence metadata file you obtained here: metadata. This is a TSV file containing information about each sample in the MAT, including its name, date sequenced, country sequenced, and clade/lineage information. This information is used throughout the RIVET backend pipeline, for example to infer the emergence date of a recombinant ancestor.
- date: The date corresponding to the input MAT and metadata files used, in year-month-day format, e.g. 2023-06-01.
- results: The name of the directory to write all RIVET output files to, both locally and in the GCP storage bucket if running a remote job.
- reference: reference.fa
  The name of the SARS-CoV-2 reference file, which will be automatically downloaded by the RIVET pipeline. For SARS-CoV-2 recombination inference, we recommend not changing this field.
- num_descendants: 5
  The minimum number of leaves that a node should have to be considered for recombination.
- public_tree: True
  This field should be set to True if the MAT was obtained at the following link: https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/.
- verbose: False
  If set to False, most standard output will be written to log files instead of being printed to the console during pipeline execution.
- threads: As many stages of RIVET and RIPPLES are multithreaded, this field sets the number of threads to use when running RIVET locally. If this field is left blank, the number of threads will automatically equal the number of available cores on the machine.
- docker_image: mrkylesmith/ripples_pipeline:latest
  The public Docker image for RIVET that will be used when executing the pipeline on GCP. Do not change this field.
- generate_taxonium: False
  When set to True, RIVET will generate a Taxonium jsonl file that can be loaded into the Taxonium web interface or desktop app to view the global phylogeny for the given input MAT.
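Two of the fields above, date and threads, have rules that are easy to get wrong by hand: the date must be in year-month-day order, and a blank threads value falls back to the machine's core count. The sketch below shows one way to check both using the Python standard library. The helper names are hypothetical and this is not RIVET's own code; in particular, `os.cpu_count()` reports logical CPUs, which is assumed here to match what the pipeline means by "available cores".

```python
import os
from datetime import datetime


def parse_config_date(value):
    """Parse a year-month-day string; raises ValueError on any other layout."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def resolve_threads(value):
    """Return the thread count to use, defaulting to the machine's CPU count."""
    if value is None or str(value).strip() == "":
        # os.cpu_count() can return None on some platforms; fall back to 1.
        return os.cpu_count() or 1
    threads = int(value)
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return threads


print(parse_config_date("2023-06-01"))  # prints: 2023-06-01
print(resolve_threads("8"))             # prints: 8
print(resolve_threads(None) >= 1)       # prints: True
```

Running a check like this before submitting a job surfaces a malformed date (for example 06/01/2023) immediately, rather than partway through the pipeline.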