chine (SVM) algorithm was used to build a binary classification model to separate lncRNAs from mRNAs. The classification model achieved high accuracy (95.6%) on training data with 10-fold cross-validation. PLEK also performed well on data from other vertebrates, using ...
One run of this experimental setup is described by Algorithm 1. (Algorithm 1: Pseudocode of our method used to estimate proportions of sources in sink s.) Additionally, we performed a 5-fold cross-validation experiment by splitting the collection of metagenomic samples into 5 stratified folds with non...
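As a minimal sketch of how a collection of samples can be split into 5 stratified folds, the Python below deals the samples of each class round-robin across the folds so that class proportions stay roughly equal. It is not the authors' procedure: the function name `stratified_folds`, the one-label-per-sample input, and the fixed seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Return n_folds lists of sample indices with similar class proportions.

    `labels` holds one class label per sample (hypothetical input format).
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into folds.
    folds = [[] for _ in range(n_folds)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % n_folds].append(idx)
    return folds


def train_test_indices(folds, test_fold):
    """Use one fold as the test set and the remaining folds as training data."""
    test = list(folds[test_fold])
    train = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
    return train, test
```

Each sample index lands in exactly one fold, so every sample is used for testing once across the 5 runs.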
a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller’s internal score. Merfin increased th...
Methods We present the algorithm design for SKESA, some important implementation details, design of test sets used for running time and assembly quality comparisons, and command lines used for doing the runs. We compare SKESA to SPAdes v3.11.1 and MegaHit v1.1.2. Assessment of ...
as it is commonly used in k-mer-based methods. While for larger genomes, larger k are used as well, we use the value k = 31 throughout the main matter to allow for easier comparison between results. Furthermore, for all data sets but the C. elegans reference, the matchtigs algorithm ran out of t...
are D = 4^K possible words of length K in the DNA (RNA) alphabet, and in our study we tested word lengths from two to eight. The methods tested differ in the way they represent a sequence as K-mers and how this information is utilized in a statistical learning algorithm to achieve best possible...
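To make the D = 4^K feature space concrete, the following Python sketch builds one possible K-mer representation: a vector of overlapping word counts of length 4^K. The excerpt notes that the tested methods differ in how they represent a sequence, so this is only one illustrative choice; the names `kmer_index` and `kmer_counts` and the rule of skipping windows with non-ACGT characters are assumptions.

```python
from itertools import product

ALPHABET = "ACGT"  # DNA alphabet; substitute "ACGU" for RNA


def kmer_index(k):
    """Map each of the 4**k possible words of length k to a vector position."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq as a vector of length 4**k.

    Windows containing characters outside ALPHABET (e.g. N) are skipped;
    this is an assumption of the sketch, not a rule from the source.
    """
    index = kmer_index(k)
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    return counts


# Feature-space size for the word lengths tested (two to eight).
sizes = {k: 4 ** k for k in range(2, 9)}
```

The dimension grows from 16 at K = 2 to 65,536 at K = 8, which is why the choice of word length interacts with how much training data a statistical learning method needs.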
but it has an added advantage that it uses a manually curated database of virus reference genomes augmented with metagenomic viral (virome) sequences sampled from freshwater, seawater, and human gut, lung and saliva. Another advantage is the use of the strand switching and short gene criteria ...
Transposable elements were identified using RepeatMasker [6] and MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) [27,28]. This database provides a hierarchical classification of plant transposable elements and other repeat types. Before use, the database was screened for non...
Eastlake D, Hansen T, Fowler G, Vo K, Noll L. The FNV Non-Cryptographic Hash Algorithm [Internet]. 2019. Available from: https://datatracker.ietf.org/doc/html/draft-eastlake-fnv-17.html. Brister JR, Ako-Adjei D, Bao Y, Blinkova O. NCBI viral genomes resource. Nucleic Acids Res. 201...
The trivial features also underestimate the associations between the microbiome and related factors, such as environmental data and host phenotypes. In this paper, we present an algorithm—KTU (K-mer Taxonomic Unit)—for re-clustering ASVs that improves the biological relevance of microbiomes that ...