A machine learning approach for accurate and real-time DNA sequence identification.
Wang, Yiren; Alangari, Mashari; Hihath, Joshua; Das, Arindam K; Anantram, M P.
  • Wang Y; Department of Electrical and Computer Engineering, University of Washington, 98195, Seattle, WA, USA. ethanwyr@uw.edu.
  • Alangari M; Electrical and Computer Engineering Department, University of California Davis, 95616, Davis, CA, USA.
  • Hihath J; Electrical and Computer Engineering Department, University of California Davis, 95616, Davis, CA, USA.
  • Das AK; Department of Electrical Engineering, Eastern Washington University, 99004, Cheney, WA, USA.
  • Anantram MP; Department of Electrical and Computer Engineering, University of Washington, 98195, Seattle, WA, USA. anantmp@uw.edu.
BMC Genomics ; 22(1): 525, 2021 Jul 09.
Article in En | MEDLINE | ID: mdl-34243709
ABSTRACT

BACKGROUND:

The all-electronic Single Molecule Break Junction (SMBJ) method is an emerging alternative to traditional polymerase chain reaction (PCR) techniques for genetic sequencing and identification. Existing work indicates that the current spectra recorded from SMBJ experiments contain unique signatures that can identify known sequences in a dataset. However, the spectra are typically extremely noisy because of stochastic and complex interactions among the substrate, sample, environment, and measuring system, so hundreds or thousands of measurements are needed to obtain reliable and accurate results.

RESULTS:

This article presents a DNA sequence identification system based on the current spectra of ten short strand sequences, including a pair that differs by a single mismatch. By employing a gradient boosted tree classifier trained on conductance histograms, we demonstrate that extremely high accuracy is possible, ranging from approximately 96% for molecules differing by a single mismatch to 99.5% otherwise. Further, such accuracy is achievable in near real-time with just twenty or thirty SMBJ measurements instead of hundreds or thousands. We also demonstrate that a tandem classifier architecture, in which the first stage is a multiclass classifier and the second stage is a binary classifier, can boost the identification accuracy for the single-mismatch pair to 99.5%.
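
The abstract does not say how the conductance histograms are built, so the following is only a minimal sketch of one plausible way to turn a small batch of SMBJ measurements into a fixed-length feature vector for a classifier. The function name `conductance_histogram`, the logarithmic binning, the conductance range, and the bin count are all illustrative assumptions, not details taken from the paper.

```python
import math

def conductance_histogram(traces, g_min=1e-6, g_max=1e-1, n_bins=50):
    """Pool conductance samples from several traces into one histogram.

    Assumptions (hypothetical, not from the paper): conductance values are
    positive numbers in a common unit, bins are evenly spaced in log10, and
    the histogram is normalized so that its entries sum to 1.
    """
    lo, hi = math.log10(g_min), math.log10(g_max)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for trace in traces:              # e.g. twenty or thirty measurements
        for g in trace:
            if g <= 0:
                continue              # log10 is undefined for non-positive values
            x = math.log10(g)
            if lo <= x < hi:          # drop samples outside the chosen range
                idx = min(int((x - lo) / width), n_bins - 1)
                counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]
```

Pooling a few dozen traces into one normalized vector is what would let a classifier make a decision after a small number of measurements; the resulting list could be passed as a feature row to any gradient boosted tree implementation.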

CONCLUSIONS:

A monolithic classifier or, more generally, a multistage classifier with model-specific parameters that depend on experimental current spectra can be used to successfully identify DNA strands.
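
As a rough illustration of the multistage idea, the sketch below routes a sample through a multiclass first stage and hands it to a binary second stage only when the first stage predicts one of the two hard-to-separate classes. The function `tandem_predict`, its parameters, and the use of plain callables in place of trained models are hypothetical choices made for this example; the abstract does not describe the routing logic in this detail.

```python
from typing import Callable, Hashable, Sequence, Set

Features = Sequence[float]
Classifier = Callable[[Features], Hashable]

def tandem_predict(
    features: Features,
    multiclass: Classifier,
    binary: Classifier,
    confusable: Set[Hashable],
) -> Hashable:
    """Two-stage prediction (illustrative sketch, not the authors' code).

    Stage 1: the multiclass model proposes a label among all classes.
    Stage 2: if that label belongs to the confusable pair, a binary model
    trained only on that pair makes the final decision.
    """
    label = multiclass(features)
    if label in confusable:
        return binary(features)   # the specialist overrides stage 1
    return label                  # stage 1 is trusted for all other classes
```

In practice `multiclass` and `binary` would wrap the prediction methods of two separately trained models, and `confusable` would hold the labels of the single-mismatch pair.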

Full text: 1 Database: MEDLINE Main subject: DNA / Machine Learning Study type: Diagnostic_studies Language: En Year: 2021 Document type: Article