Results 1 - 9 of 9
1.
Bioinformatics ; 34(20): 3488-3495, 2018 10 15.
Article in English | MEDLINE | ID: mdl-29850774

ABSTRACT

Motivation: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes are previously determined with high confidence. An alternative way of performing benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information, and enable benchmarking less-studied variants and diverse populations. Results: We introduce a statistical mixture model for comparing two variant calling pipelines from genotype data they produce after running on individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10⁻³ uncertainty. Availability and implementation: The Python library geck and usage examples are available at the following URL: https://github.com/sbg/geck, under the GNU General Public License v3. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods, Benchmarking, Genome, Genotype, Single Nucleotide Polymorphism
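
The Mendelian constraint that the abstract above relies on can be illustrated with a short Python sketch. This is not the geck mixture model and does not use the geck API; the function names `is_mendelian_consistent` and `violation_rate` are hypothetical, and genotypes are assumed to be diploid calls written as tuples of allele codes (0 for reference, 1 for alternate).

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one from the mother.

    Genotypes are assumed to be diploid tuples of allele codes,
    e.g. (0, 1) for a heterozygous call. Phase is ignored.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def violation_rate(trios):
    """Fraction of (father, mother, child) genotype triples that
    violate Mendelian inheritance. Returns 0.0 for an empty input."""
    trios = list(trios)
    if not trios:
        return 0.0
    violations = sum(not is_mendelian_consistent(*t) for t in trios)
    return violations / len(trios)
```

A raw violation rate like this only counts inconsistent sites. The abstract describes going further, with a statistical model that turns such observations into estimates of differential precision and recall between two pipelines.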
2.
Bioinformatics ; 34(24): 4241-4247, 2018 12 15.
Article in English | MEDLINE | ID: mdl-29868720

ABSTRACT

Motivation: Several tools exist to count Mendelian violations in family trios by comparing variants at the same genomic positions. This naive variant comparison, however, fails to assess regions where multiple variants need to be examined together, resulting in reduced accuracy of existing Mendelian violation checking tools. Results: We introduce VBT, a trio concordance analysis tool, which identifies Mendelian violations by approximately solving the 3-way variant matching problem to resolve variant representation differences in family trios. We show that VBT outperforms previous trio comparison methods in accuracy. Availability and implementation: VBT is implemented in C++ and source code is available under the GNU GPLv3 license at the following URL: https://github.com/sbg/VBT-TrioAnalysis.git. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Genomics, Software, Genome/genetics, Genomics/methods
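
The representation problem described in the abstract above can be shown with a small Python sketch: two variant sets that differ position by position may still produce the same haplotype. This is an illustration only, not VBT's 3-way matching algorithm. The helper `apply_variants` is hypothetical, and it assumes variants are given as 0-based, non-overlapping `(position, ref, alt)` tuples rather than VCF records.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping variants to a reference string and
    return the resulting haplotype sequence.

    Each variant is an assumed (position, ref, alt) tuple with a
    0-based position; alt may be empty to express a deletion.
    """
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("variants overlap")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele mismatch at position {pos}")
        pieces.append(reference[cursor:pos])  # unchanged stretch
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def same_haplotype(reference, variants_a, variants_b):
    """True if both variant sets yield the same sequence."""
    return apply_variants(reference, variants_a) == apply_variants(reference, variants_b)


# A deletion of "CA" inside a CA repeat can be placed at either copy.
left_aligned = [(1, "CA", "")]
right_aligned = [(3, "CA", "")]
print(same_haplotype("GCACAT", left_aligned, right_aligned))
```

A position-based comparison would treat `left_aligned` and `right_aligned` as different calls, which is the kind of disagreement that can be miscounted as a Mendelian violation when trio members are called separately.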
3.
Nature ; 470(7332): 59-65, 2011 Feb 03.
Article in English | MEDLINE | ID: mdl-21293372

ABSTRACT

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.


Subjects
DNA Copy Number Variations/genetics, Population Genetics, Human Genome/genetics, Genomics, Gene Duplication/genetics, Genetic Predisposition to Disease/genetics, Genotype, Humans, Insertional Mutagenesis/genetics, Reproducibility of Results, DNA Sequence Analysis, Sequence Deletion/genetics
4.
PLoS Genet ; 7(8): e1002236, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21876680

ABSTRACT

As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.


Subjects
DNA Transposable Elements, Human Genome, Single Nucleotide Polymorphism, Gene Frequency, Genotype, Heterozygote, Humans, Insertional Mutagenesis, Mutation Rate
5.
Nat Genet ; 51(2): 354-362, 2019 02.
Article in English | MEDLINE | ID: mdl-30643257

ABSTRACT

The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.


Subjects
Human Genome/genetics, Genomics/methods, Humans, Single Nucleotide Polymorphism/genetics, Sequence Alignment/methods, DNA Sequence Analysis/methods, Sequence Deletion/genetics, Whole Genome Sequencing/methods
6.
Cancer Inform ; 17: 1176935118774787, 2018.
Article in English | MEDLINE | ID: mdl-30283230

ABSTRACT

Increased efforts in cancer genomics research and bioinformatics are producing tremendous amounts of data. These data are diverse in origin, format, and content. As the amount of available sequencing data increases, technologies that make them discoverable and usable are critically needed. In response, we have developed a Semantic Web-based Data Browser, a tool allowing users to visually build and execute ontology-driven queries. This approach simplifies access to available data and improves the process of using them in analyses on the Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org). The Data Browser makes large data sets easily explorable and simplifies the retrieval of specific data of interest. Although initially implemented on top of The Cancer Genome Atlas (TCGA) data set, the Data Browser's architecture allows for seamless integration of other data sets. By deploying it on the CGC, we have enabled remote researchers to access data and perform collaborative investigations.

7.
Pac Symp Biocomput ; 22: 154-165, 2017.
Article in English | MEDLINE | ID: mdl-27896971

ABSTRACT

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.


Subjects
Software, Workflow, Computational Biology, Humans, Statistical Models, Reproducibility of Results
8.
Cancer Res ; 77(21): e3-e6, 2017 11 01.
Article in English | MEDLINE | ID: mdl-29092927

ABSTRACT

The Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org) enables researchers to rapidly access and collaborate on massive public cancer genomic datasets, including The Cancer Genome Atlas. It provides secure on-demand access to data, analysis tools, and computing resources. Researchers from diverse backgrounds can easily visualize, query, and explore cancer genomic datasets visually or programmatically. Data of interest can be immediately analyzed in the cloud using more than 200 preinstalled, curated bioinformatics tools and workflows. Researchers can also extend the functionality of the platform by adding their own data and tools via an intuitive software development kit. By colocalizing these resources in the cloud, the CGC enables scalable, reproducible analyses. Researchers worldwide can use the CGC to investigate key questions in cancer genomics. Cancer Res; 77(21); e3-6. ©2017 AACR.


Subjects
Computational Biology, Genomics, Neoplasms/genetics, Human Genome, Humans, Internet, Research, Software
9.
Genome Biol ; 10(11): R133, 2009.
Article in English | MEDLINE | ID: mdl-19930548

ABSTRACT

Coding nucleotide sequences contain myriad functions independent of their encoded protein sequences. We present the COMIT algorithm to detect functional noncoding motifs in coding regions using sequence conservation, explicitly separating nucleotide from amino acid effects. COMIT concurs with diverse experimental datasets, including splicing enhancers, silencers, replication motifs, and microRNA targets, and predicts many novel functional motifs. Intriguingly, COMIT scores are well-correlated to scores uncalibrated for amino acids, suggesting that nucleotide motifs often override peptide-level constraints.


Subjects
Genetic Models, Algorithms, Amino Acid Motifs, Animals, Arabidopsis/genetics, Codon, Computational Biology/methods, DNA/genetics, Humans, Likelihood Functions, Mice, MicroRNAs/genetics, Statistical Models, Nucleotides/genetics, Oryza/genetics