1.
Pac Symp Biocomput ; 27: 313-324, 2022.
Article in English | MEDLINE | ID: mdl-34890159

ABSTRACT

As the last decade of human genomics research begins to bear the fruit of advancements in precision medicine, it is important to ensure that genomics' improvements in human health are distributed globally and equitably. An important step to ensuring health equity is to improve the human reference genome to capture global diversity by including a wide variety of alternative haplotypes, sequences that are not currently captured on the reference genome. We present a method that localizes 100 base pair (bp) long sequences extracted from short-read sequencing that can ultimately be used to identify which regions of the human genome non-reference sequences belong to. We extract reads that don't align to the reference genome, and compute the population's distribution of 100-mers found within the unmapped reads. We use genetic data from families to identify shared genetic material between siblings and match the distribution of unmapped k-mers to these inheritance patterns to determine the most likely genomic region of a k-mer. We perform this localization with two highly interpretable methods of artificial intelligence: a computationally tractable Hidden Markov Model coupled to a Maximum Likelihood Estimator. Using a set of alternative haplotypes with known locations on the genome, we show that our algorithm is able to localize 96% of k-mers with over 90% accuracy and less than 1 Mb median resolution. As the collection of sequenced human genomes grows larger and more diverse, we hope that this method can be used to improve the human reference genome, a critical step in addressing precision medicine's diversity crisis.


Subjects
Artificial Intelligence , Genome, Human , Computational Biology , Genomics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
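
The abstract above describes computing the population's distribution of 100-mers found in unmapped reads. The snippet below is only a minimal sketch of that counting step, not the authors' implementation: the function names, the dictionary-of-reads input format, and the choice to skip k-mers containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter


def kmers(read, k):
    """Yield every k-length substring of a read.

    Assumption: k-mers containing anything other than A, C, G, T are skipped.
    """
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def kmer_population_counts(unmapped_reads_by_sample, k=100):
    """Count, for each k-mer, how many samples carry it in their unmapped reads.

    `unmapped_reads_by_sample` is a hypothetical input: a dict mapping a
    sample ID to a list of read sequences that failed to align.
    """
    counts = Counter()
    for reads in unmapped_reads_by_sample.values():
        seen = set()  # each sample contributes at most once per k-mer
        for read in reads:
            seen.update(kmers(read, k))
        counts.update(seen)
    return counts


# Toy usage with k=4 so the example stays readable; the abstract uses k=100.
toy_reads = {
    "child1": ["ACGTAC", "TTTTT"],
    "child2": ["CGTACG"],
    "mother": ["ACGNAC"],
}
toy_counts = kmer_population_counts(toy_reads, k=4)
```

Counting presence per sample, rather than raw occurrences, is one plausible way to obtain a distribution that can later be compared against sibling sharing patterns; the abstract does not specify which form the authors use.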
2.
BMC Med Genomics ; 8: 64, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26470712

ABSTRACT

BACKGROUND: While next-generation sequencing (NGS) costs have plummeted in recent years, cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered to medically actionable reports within a time window of hours and at a cost in the tens of dollars. RESULTS: We take a step towards addressing this challenge by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows, making optimal use of high-performance compute clusters. Here we show that the Amazon Web Services (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. CONCLUSIONS: Our systematic benchmarking reveals important new insights and considerations for achieving clinical turnaround of whole genome analysis through optimization and workflow management, including strategic batching of individual genomes and efficient cluster resource configuration.


Subjects
Cloud Computing/economics , Cost-Benefit Analysis , Genotyping Techniques/economics , High-Throughput Nucleotide Sequencing/economics , Benchmarking , Genomics , Humans