Using unique molecular identifiers to improve allele calling in low-template mixtures.

Crysup, Benjamin; Mandape, Sammed; King, Jonathan L; Muenzler, Melissa; Kapema, Kapema Bupe; Woerner, August E

Affiliation

Crysup B; Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA. Electronic address: Benjamin.Crysup@unthsc.edu.
Mandape S; Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA.
King JL; Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA.
Muenzler M; Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA.
Kapema KB; Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA.
Woerner AE; Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA; Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA.

Forensic Sci Int Genet; 63: 102807, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36462297

ABSTRACT

PCR artifacts are an ever-present challenge in sequencing applications. These artifacts can seriously limit the analysis and interpretation of low-template samples and mixtures, especially with respect to a minor contributor. In medicine, molecular barcoding techniques have been employed to decrease the impact of PCR error and to allow the examination of low-abundance somatic variation. In principle, it should be possible to apply the same techniques to the forensic analysis of mixtures. To that end, several short tandem repeat loci were selected for targeted sequencing, and a bioinformatic pipeline for analyzing the sequence data was developed. The pipeline records the unique molecular identifiers (UMIs) attached to each read and, using machine learning, filters the noise products out of the set of potential alleles. To evaluate this pipeline, DNA from pairs of individuals was mixed at different ratios (1:1, 1:9) and sequenced with different starting amounts of DNA (10, 1 and 0.1 ng). Naïvely using the information in the molecular barcodes improved performance, and the machine learning step provided an additional benefit. In concrete terms, using the UMI data results in less noise for a given amount of dropout. For instance, if thresholds are selected that filter out a quarter of the true alleles, using read counts accepts 2381 noise alleles and using raw UMI counts accepts 1726 noise alleles, while the machine learning approach accepts only 307.
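
The contrast between thresholding on read counts and thresholding on raw UMI counts can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`count_support`, `call_alleles`), the locus and allele labels, the UMI strings, and the thresholds are all hypothetical, and the machine learning filter described in the abstract is not represented. The sketch only assumes that each read has already been assigned a locus, a candidate allele, and a UMI.

```python
from collections import defaultdict

def count_support(reads):
    """Tally reads and distinct UMIs for each (locus, allele) pair.

    `reads` is an iterable of (locus, allele, umi) tuples. This input
    format is an assumption made for illustration only.
    """
    read_counts = defaultdict(int)
    umi_sets = defaultdict(set)
    for locus, allele, umi in reads:
        key = (locus, allele)
        read_counts[key] += 1       # every read counts, including PCR duplicates
        umi_sets[key].add(umi)      # PCR duplicates share a UMI and collapse here
    umi_counts = {key: len(umis) for key, umis in umi_sets.items()}
    return dict(read_counts), umi_counts

def call_alleles(counts, threshold):
    """Accept every candidate allele whose support meets the threshold."""
    return {key for key, count in counts.items() if count >= threshold}

# Hypothetical data: a true allele seen from three template molecules
# (two reads each), and a noise product copied five times from one molecule.
reads = (
    [("LOCUS_A", "allele_15", umi) for umi in ("AAAC", "GGTT", "CTCA") for _ in range(2)]
    + [("LOCUS_A", "allele_14", "TTGA")] * 5
)

read_counts, umi_counts = count_support(reads)
by_reads = call_alleles(read_counts, threshold=4)   # hypothetical read threshold
by_umis = call_alleles(umi_counts, threshold=2)     # hypothetical UMI threshold
```

In this toy input the noise product passes the read-count threshold because one molecule was amplified heavily, but it fails the UMI threshold because all of its reads trace back to a single template. Real data are less clean than this, which is consistent with the abstract's report that raw UMI counts alone still accept many noise alleles.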

Subject(s)

DNA; High-Throughput Nucleotide Sequencing; Humans; Alleles; DNA/analysis; DNA Fingerprinting/methods; Sequence Analysis, DNA; Microsatellite Repeats

Keywords

DNA Mixtures; Machine Learning; Massively Parallel Sequencing; Molecular Barcodes; Stutter

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: DNA / High-Throughput Nucleotide Sequencing Limit: Humans Language: En Journal: Forensic Sci Int Genet Journal subject: GENETICS / JURISPRUDENCE Year: 2023 Document type: Article
