On Minimizers and Convolutional Filters: Theoretical Connections and Applications to Genome Analysis.
Yu, Yun William.
Affiliation
  • Yu YW; Department of Mathematics, University of Toronto, Toronto, Ontario, Canada.
J Comput Biol; 31(5): 381-395, 2024 May.
Article in English | MEDLINE | ID: mdl-38687333
ABSTRACT
Minimizers and convolutional neural networks (CNNs) are two quite distinct popular techniques that have both been employed to analyze categorical biological sequences. At face value, the methods seem entirely dissimilar. Minimizers use min-wise hashing on a rolling window to extract a single important k-mer feature per window. CNNs start with a wide array of randomly initialized convolutional filters, paired with a pooling operation, and then multiple additional neural layers to learn both the filters themselves and how they can be used to classify the sequence. In this study, our main result is a careful mathematical analysis of hash function properties showing that for sequences over a categorical alphabet, random Gaussian initialization of convolutional filters with max-pooling is equivalent to choosing a minimizer ordering such that selected k-mers are (in Hamming distance) far from the k-mers within the sequence but close to other minimizers. In empirical experiments, we find that this property manifests as decreased density in repetitive regions, both in simulation and on real human telomeres. We additionally train from scratch a CNN embedding of synthetic short-reads from the SARS-CoV-2 genome into 3D Euclidean space that locally recapitulates the linear sequence distance of the read origins, a modest step toward building a deep learning assembler, although it is at present too slow to be practical. In total, this article provides a partial explanation for the effectiveness of CNNs in categorical sequence analysis.
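The two selection schemes described in the abstract can be sketched side by side. The block below is an illustrative sketch only, not the author's implementation: the function names (`kmers`, `hash_order`, `minimizers`, `gaussian_filter_order`), the choice of BLAKE2b as the hash, the leftmost tie-breaking rule, and the DNA alphabet are all assumptions made for the example. It shows a window minimizer under a hash ordering, and then the same routine driven by an ordering derived from a single randomly initialized Gaussian filter, where max-pooling over a window is expressed as taking the minimum of the negated filter response.

```python
import hashlib
import random


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hash_order(kmer):
    """Deterministic hash-based ordering (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq, k, w, key=hash_order):
    """Select, for every window of w consecutive k-mers, the k-mer that is
    smallest under `key` (ties broken by leftmost position).
    Returns a sorted list of unique (position, k-mer) pairs."""
    ks = kmers(seq, k)
    selected = set()
    for start in range(len(ks) - w + 1):
        best = min(range(start, start + w), key=lambda i: (key(ks[i]), i))
        selected.add((best, ks[best]))
    return sorted(selected)


def gaussian_filter_order(k, alphabet="ACGT", seed=0):
    """Build an ordering from one random Gaussian convolutional filter.

    The filter has one weight per (position, letter); applied to a one-hot
    encoded k-mer, its response is the sum of the weights of the letters
    present. Max-pooling picks the largest response in a window, so the
    response is negated to reuse the minimum-based selection above."""
    rng = random.Random(seed)
    weights = [{c: rng.gauss(0.0, 1.0) for c in alphabet} for _ in range(k)]

    def key(kmer):
        return -sum(weights[j][c] for j, c in enumerate(kmer))

    return key


if __name__ == "__main__":
    seq = "ACGTACGTTTGACCAGGATTACA"
    print("hash ordering:    ", minimizers(seq, k=3, w=4))
    print("gaussian ordering:", minimizers(seq, k=3, w=4,
                                           key=gaussian_filter_order(3)))
```

Because both schemes go through the same `minimizers` routine and differ only in the `key` argument, the sketch makes the abstract's framing concrete: a filter followed by max-pooling behaves like a minimizer scheme with a particular k-mer ordering. It does not reproduce the paper's analysis of Hamming distances or density.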

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Neural Networks, Computer / SARS-CoV-2 / COVID-19 Limits: Humans Language: English Journal: J Comput Biol Journal subject: MOLECULAR BIOLOGY / MEDICAL INFORMATICS Year: 2024 Document type: Article Affiliation country: Canada
