Building a tree from consensus sequences

11/15/2023

Several programs and pipelines are available to create a consensus sequence based on existing reference sequences (Krehenwinkel et al., 2019; Maloney et al., 2020; Moore et al., 2020; Sikolenko & Valentovich, 2021; Strassert et al., 2021). Reads of mixed samples (soil, water, food, feces…) containing sequences of species not yet included in databases can be difficult to assign to a species or genus (Wei et al., 2020) with standard Operational Taxonomic Unit (OTU) clustering programs (Bolyen et al., 2019; Rognes et al., 2016a; Schloss et al., 2009). Unknown species may be assigned to incorrect genera because of the high error rate in the reads and low similarity with available sequences. This may result in the generation of a consensus sequence based on a mixture of the sequences of two or more species.

To analyze amplicons and come to a consensus without the availability of reference sequences, several steps have to be performed. Reference‐free consensus sequences have been made before to identify bacteria (Calus et al., 2018; Davidov et al., 2020; Karst et al., 2021; Rodríguez‐Pérez et al., 2021), viruses (Chan et al., 2020), fungi (Morrison et al., 2020; Simmons et al., 2020), invertebrates (Chang et al., 2020; Knot et al., 2020), and vertebrates (Pomerantz et al., 2018; Seah et al., 2020) or to replace Sanger sequencing by ONT consensus methods (Simmons et al., 2020). Most of these analyses perform four steps. Barcoded reads are demultiplexed while basecalling in Guppy or afterward with the guppy_barcoder in the Guppy suite, Porechop, Minibar (Krehenwinkel, Pomerantz, Henderson, et al., 2019), qcat, or by using UMIs (Karst et al., 2021). A quality filtering step based on quality scores and length can be added by using NanoFilt (de Coster et al., 2018), seqtk, PRINSEQ ( ‐linux/prinseq), or fastp (Chen et al., 2018).
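
The quality and length filtering mentioned above can be illustrated with a minimal Python sketch. It is not how NanoFilt, seqtk, PRINSEQ, or fastp are implemented. The function names (`parse_fastq`, `mean_quality`, `filter_reads`) and the thresholds are hypothetical, and the sketch assumes 4-line FASTQ records with Phred+33 quality encoding. The mean quality is derived from the mean error probability rather than from the arithmetic mean of the Phred scores, which is an assumption about what a sensible filter would do.

```python
import math
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # the '+' separator line
        qual = next(it, "").strip()
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read, computed from the mean error probability."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[Record], min_q: float,
                 min_len: int, max_len: int) -> List[Record]:
    """Keep reads inside the length window with mean quality >= min_q."""
    return [
        rec for rec in records
        if min_len <= len(rec[1]) <= max_len and mean_quality(rec[2]) >= min_q
    ]


# Toy data (hypothetical reads, thresholds chosen only for the demo).
demo = [
    "@read1", "ACGTACGTACGT", "+", "IIIIIIIIIIII",  # Q40, 12 bases
    "@read2", "ACGTACGTACGT", "+", "############",  # Q2, 12 bases
    "@read3", "ACG", "+", "III",                    # Q40 but too short
]
kept = filter_reads(parse_fastq(demo), min_q=10.0, min_len=10, max_len=50)
print([header for header, _, _ in kept])  # ['@read1']
```

For real amplicon data the length window would be set around the expected amplicon size, and the quality threshold would depend on the flow cell and basecaller used.
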
To date, the main disadvantage of ONT is the relatively low read quality, which most recently reached a modal read accuracy of 99.3% with the new Q20+ technology and an R10.4 flow cell. Many ONT applications and tools exist (Wang et al., 2021), but specific tools for processing and consensus calling of amplicon sequences are limited.
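
To relate the 99.3% figure to the "Q20+" label, an accuracy can be converted to a Phred quality score with Q = -10 · log10(1 - accuracy). The short sketch below assumes that the 99.3% value is a per-base read accuracy. The helper names are hypothetical.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (0 <= accuracy < 1) to a Phred score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1 - accuracy)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred score back to a per-base accuracy."""
    return 1 - 10 ** (-q / 10)


print(round(accuracy_to_phred(0.993), 1))  # 21.5
print(round(phred_to_accuracy(20), 4))     # 0.99
```

Under that assumption, a modal accuracy of 99.3% corresponds to roughly Q21.5, slightly above the Q20 (99% accuracy) mark.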

Long‐read sequencing methods from Oxford Nanopore Technologies (ONT) (Eisenstein, 2012) can also be used to mass sequence amplicons. In comparison with short‐read sequencers such as Illumina (2 × 300 bp) and IonTorrent (600 bp) (Slatko et al., 2018), there is virtually no limit to the amplicon length for ONT.
