Results 1 - 20 of 83
1.
Genome Res ; 34(2): 179-188, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38355308

ABSTRACT

A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA measurements were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of mRNA and pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct for this bias. As an alternative, we repurpose an existing post hoc gene length-based correction method from conventional RNA-seq gene set enrichment analysis. Finally, we show that inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than assay choice itself, which is pivotal to the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.


Subjects
Cell Nucleus, RNA Precursors, Humans, Animals, Mice, RNA-Seq, Messenger RNA/genetics, RNA Sequence Analysis/methods, Cell Nucleus/genetics, Gene Expression Profiling/methods, Single-Cell Analysis
2.
Nat Methods ; 19(4): 445-448, 2022 04.
Article in English | MEDLINE | ID: mdl-35396485

ABSTRACT

Structural variants are associated with cancers and developmental disorders, but challenges with estimating population frequency remain a barrier to prioritizing mutations over inherited variants. In particular, variability in variant calling heuristics and filtering limits the use of current structural variant catalogs. We present STIX, a method that, instead of relying on variant calls, indexes and searches the raw alignments from thousands of samples to enable more comprehensive allele frequency estimation.


Subjects
Genome, Genomic Structural Variation, Neoplasms, Algorithms, Genomic Structural Variation/genetics, High-Throughput Nucleotide Sequencing, Humans, Neoplasms/genetics, Software
3.
Am J Hum Genet ; 108(4): 597-607, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33675682

ABSTRACT

Each human genome includes de novo mutations that arose during gametogenesis. While these germline mutations represent a fundamental source of new genetic diversity, they can also create deleterious alleles that impact fitness. Whereas the rate and patterns of point mutations in the human germline are now well understood, far less is known about the frequency and features that impact de novo structural variants (dnSVs). We report a family-based study of germline mutations among 9,599 human genomes from 33 multigenerational CEPH-Utah families and 2,384 families from the Simons Foundation Autism Research Initiative. We find that de novo structural mutations detected by alignment-based, short-read WGS occur at an overall rate of at least 0.160 events per genome in unaffected individuals, and we observe a significantly higher rate (0.206 per genome) in ASD-affected individuals. In both probands and unaffected samples, nearly 73% of de novo structural mutations arose in paternal gametes, and we predict most de novo structural mutations to be caused by mutational mechanisms that do not require sequence homology. After multiple testing correction, we did not observe a statistically significant correlation between parental age and the rate of de novo structural variation in offspring. These results highlight that a spectrum of mutational mechanisms contribute to germline structural mutations and that these mechanisms most likely have markedly different rates and selective pressures than those leading to point mutations.


Subjects
Family, Human Genome/genetics, Germ Cells, Germ-Line Mutation/genetics, Mutation Rate, Aging/genetics, Autistic Disorder/genetics, Bias, DNA Copy Number Variations/genetics, DNA Mutational Analysis, Female, Humans, Male, Paternal Age, Point Mutation/genetics
4.
Bioinformatics ; 38(5): 1231-1234, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34864893

ABSTRACT

SUMMARY: We present trfermikit, a software tool designed to detect deletions larger than 50 bp occurring in Variable Number Tandem Repeats using Illumina DNA sequencing reads. In such regions, it achieves a better tradeoff between sensitivity and false discovery than a state-of-the-art structural variation caller, Manta, and complements it by recovering a significant number of deletions that Manta missed. trfermikit is based upon the fermikit pipeline, which performs read assembly, maps the assembly to the reference genome and calls variants from the alignment. AVAILABILITY AND IMPLEMENTATION: https://github.com/petermchale/trfermikit. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Software, DNA Sequence Analysis, High-Throughput Nucleotide Sequencing
5.
BMC Bioinformatics ; 23(1): 482, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36376793

ABSTRACT

BACKGROUND: Despite numerous molecular and computational advances, roughly half of patients with a rare disease remain undiagnosed after exome or genome sequencing. A particularly challenging barrier to diagnosis is identifying variants that cause deleterious alternative splicing at intronic or exonic loci outside of canonical donor or acceptor splice sites. RESULTS: Several existing tools predict the likelihood that a genetic variant causes alternative splicing. We sought to extend such methods by developing a new metric that aids in discerning whether a genetic variant leads to deleterious alternative splicing. Our metric combines genetic variation in the Genome Aggregation Database with alternative splicing predictions from SpliceAI to compare observed and expected levels of splice-altering genetic variation. We infer genic regions with significantly less splice-altering variation than expected to be constrained. The resulting model of regional splicing constraint captures differential splicing constraint across gene and exon categories, and the most constrained genic regions are enriched for pathogenic splice-altering variants. Building from this model, we developed ConSpliceML. This ensemble machine learning approach combines regional splicing constraint with multiple per-nucleotide alternative splicing scores to guide the prediction of deleterious splicing variants in protein-coding genes. ConSpliceML more accurately distinguishes deleterious and benign splicing variants than state-of-the-art splicing prediction methods, especially in "cryptic" splicing regions beyond canonical donor or acceptor splice sites. CONCLUSION: Integrating a model of genetic constraint with annotations from existing alternative splicing tools allows ConSpliceML to prioritize potentially deleterious splice-altering variants in studies of rare human diseases.


Subjects
Alternative Splicing, Rare Diseases, Humans, Rare Diseases/genetics, RNA Splicing, Introns, Exons, Mutation, RNA Splice Sites
6.
BMC Bioinformatics ; 23(1): 490, 2022 Nov 16.
Article in English | MEDLINE | ID: mdl-36384437

ABSTRACT

BACKGROUND: Identification of deleterious genetic variants using DNA sequencing data relies on increasingly detailed filtering strategies to isolate the small subset of variants that are more likely to underlie a disease phenotype. Datasets reflecting population allele frequencies of different types of variants serve as powerful filtering tools, especially in the context of rare disease analysis. While such population-scale allele frequency datasets now exist for structural variants (SVs), it remains a challenge to match SV calls between multiple datasets, thereby complicating estimates of a putative SV's population allele frequency. RESULTS: We introduce SVAFotate, a software tool that enables the annotation of SVs with variant allele frequency and related information from existing SV datasets. As a result, VCF files annotated by SVAFotate offer a variety of metrics to aid in the stratification of SVs as common or rare in the broader human population. CONCLUSIONS: Here we demonstrate the use of SVAFotate in the classification of SVs with regards to their population frequency and illustrate how SVAFotate's annotations can be used to filter and prioritize SVs. Lastly, we detail how best to utilize these SV annotations in the analysis of genetic variation in studies of rare disease.


Subjects
Gene Frequency, High-Throughput Nucleotide Sequencing, Software, Humans, Rare Diseases
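
Editorial illustration for entry 6: the abstract describes the difficulty of matching SV calls between datasets in order to estimate a population allele frequency. One common way to decide whether two SV calls describe the same event is reciprocal overlap, and the minimal sketch below uses it. The abstract does not say that SVAFotate works this way; the function names, the tuple layout, and the 0.8 threshold are all assumptions made for illustration.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open intervals [start, end)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def best_matching_af(query, catalog, min_overlap=0.8):
    """Highest allele frequency among catalog SVs that match the query SV.

    query:   (chrom, start, end, svtype)
    catalog: iterable of (chrom, start, end, svtype, allele_frequency)
    The 0.8 default is an illustrative assumption, not a recommended setting.
    """
    chrom, start, end, svtype = query
    matches = [
        af
        for c_chrom, c_start, c_end, c_type, af in catalog
        if c_chrom == chrom
        and c_type == svtype
        and reciprocal_overlap(start, end, c_start, c_end) >= min_overlap
    ]
    # None means "no match in this catalog", which is not the same as AF = 0.
    return max(matches) if matches else None
```

Taking the maximum frequency across matches is a conservative choice when the goal is to rule out common variants; other summaries are equally defensible.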
7.
Genome Res ; 29(4): 532-542, 2019 04.
Article in English | MEDLINE | ID: mdl-30858344

ABSTRACT

Coding variants in epigenetic regulators are emerging as causes of neurological dysfunction and cancer. However, a comprehensive effort to identify disease candidates within the human epigenetic machinery (EM) has not been performed; it is unclear whether features exist that distinguish between variation-intolerant and variation-tolerant EM genes, and between EM genes associated with neurological dysfunction versus cancer. Here, we rigorously define 295 genes with a direct role in epigenetic regulation (writers, erasers, remodelers, readers). Systematic exploration of these genes reveals that although individual enzymatic functions are always mutually exclusive, readers often also exhibit enzymatic activity (dual-function EM genes). We find that the majority of EM genes are very intolerant to loss-of-function variation, even when compared to the dosage sensitive transcription factors, and we identify 102 novel EM disease candidates. We show that this variation intolerance is driven by the protein domains encoding the epigenetic function, suggesting that disease is caused by a perturbed chromatin state. We then describe a large subset of EM genes that are coexpressed within multiple tissues. This subset is almost exclusively populated by extremely variation-intolerant genes and shows enrichment for dual-function EM genes. It is also highly enriched for genes associated with neurological dysfunction, even when accounting for dosage sensitivity, but not for cancer-associated EM genes. Finally, we show that regulatory regions near epigenetic regulators are genetically important for common neurological traits. These findings prioritize novel disease candidate EM genes and suggest that this coexpression plays a functional role in normal neurological homeostasis.


Subjects
Genetic Epigenesis, Nervous System Diseases/genetics, Genetic Polymorphism, Chromatin Assembly and Disassembly, Humans, Loss of Function Mutation, Transcription Factors/genetics, Transcription Factors/metabolism
8.
Bioinformatics ; 37(24): 4860-4861, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34146087

ABSTRACT

SUMMARY: Unfazed is a command-line tool to determine the parental gamete of origin for de novo mutations from paired-end Illumina DNA sequencing reads. Unfazed uses variant information for a sequenced trio to identify the parental gamete of origin by linking phase-informative inherited variants to de novo mutations using read-based phasing. It achieves a high success rate by chaining reads into haplotype groups, thus increasing the search space for informative sites. Unfazed provides a simple command-line interface and scales well to large inputs, determining parent-of-origin for nearly 30 000 de novo variants in under 60 h. AVAILABILITY AND IMPLEMENTATION: Unfazed is available at https://github.com/jbelyeu/unfazed. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Software, DNA Sequence Analysis, Haplotypes, High-Throughput Nucleotide Sequencing
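
Editorial illustration for entry 8: the abstract describes linking phase-informative inherited variants to de novo mutations by read-based phasing. The sketch below shows that idea in its simplest form: reads that carry the de novo allele and also cover a nearby informative site vote for the parent whose allele they carry. It is not Unfazed's implementation and does not chain reads into haplotype groups; the data layout and all names are hypothetical.

```python
from collections import Counter


def phase_de_novo(reads, dnm_pos, dnm_alt, informative_pos, allele_to_parent):
    """Vote on the parent of origin for one de novo mutation.

    reads:            iterable of dicts mapping position -> base seen on that
                      read (or read pair); a hypothetical, simplified layout
    allele_to_parent: maps each base at the informative site to the parent
                      it was inherited from, e.g. {"G": "paternal"}
    Returns (parent or None, vote counts).
    """
    votes = Counter()
    for read in reads:
        if read.get(dnm_pos) != dnm_alt:
            continue  # this read does not carry the de novo allele
        parent = allele_to_parent.get(read.get(informative_pos))
        if parent is not None:
            votes[parent] += 1
    ranked = votes.most_common()
    if not ranked:
        return None, votes  # no read spans both sites
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, votes  # tied evidence, leave unphased
    return ranked[0][0], votes
```

A call with no spanning reads returns `None`, which is the case the abstract's read chaining is meant to reduce by enlarging the search space for informative sites.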
9.
Nucleic Acids Res ; 48(12): 6597-6610, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32479598

ABSTRACT

The human genome encodes an order of magnitude more gene expression enhancers than promoters, suggesting that most genes are regulated by the combined action of multiple enhancers. We have previously shown that neighboring estrogen-responsive enhancers exhibit complex synergistic contributions to the production of an estrogenic transcriptional response. Here we sought to determine the molecular underpinnings of this enhancer cooperativity. We generated genetic deletions of four estrogen receptor α (ER) bound enhancers that regulate two genes and found that enhancers containing full estrogen response element (ERE) motifs control ER binding at neighboring sites, while enhancers with pre-existing histone acetylation/accessibility confer a permissible chromatin environment to the neighboring enhancers. Genome engineering revealed that two enhancers with half EREs could not compensate for the lack of a full ERE site within the cluster. In contrast, two enhancers with full EREs produced a transcriptional response greater than the wild-type locus. By swapping genomic sequences, we found that the genomic location of a full ERE strongly influences enhancer activity. Our results lead to a model in which a full ERE is required for ER recruitment, but the presence of a pre-existing permissible chromatin environment can also be needed for estrogen-driven gene regulation to occur.


Subjects
Genetic Enhancer Elements/genetics, Estrogen Receptor alpha/genetics, Nucleotide Motifs/genetics, Genetic Transcription, Acetylation, Chromatin/genetics, DNA-Binding Proteins/genetics, Gene Expression Regulation/genetics, Human Genome/genetics, Humans, Genetic Promoter Regions/genetics
10.
Proc Natl Acad Sci U S A ; 116(19): 9491-9500, 2019 05 07.
Article in English | MEDLINE | ID: mdl-31019089

ABSTRACT

The textbook view that most germline mutations in mammals arise from replication errors is indirectly supported by the fact that there are both more mutations and more cell divisions in the male than in the female germline. When analyzing large de novo mutation datasets in humans, we find multiple lines of evidence that call that view into question. Notably, despite the drastic increase in the ratio of male to female germ cell divisions after the onset of spermatogenesis, even young fathers contribute three times more mutations than young mothers, and this ratio barely increases with parental age. This surprising finding points to a substantial contribution of damage-induced mutations. Indeed, C-to-G transversions and CpG transitions, which together constitute over one-fourth of all base substitution mutations, show genomic distributions and sex-specific age dependencies indicative of double-strand break repair and methylation-associated damage, respectively. Moreover, we find evidence that maternal age at conception influences the mutation rate both because of the accumulation of damage in oocytes and potentially through an influence on the number of postzygotic mutations in the embryo. These findings reveal underappreciated roles of DNA damage and maternal age in the genesis of human germline mutations.


Subjects
Double-Stranded DNA Breaks, DNA Repair, Nucleic Acid Databases, Germ-Line Mutation, Maternal Age, Adolescent, Adult, Female, Humans, Male, Middle Aged, Oocytes, Pregnancy, Spermatogenesis/genetics
11.
Nat Methods ; 15(2): 123-126, 2018 02.
Article in English | MEDLINE | ID: mdl-29309061

ABSTRACT

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.


Subjects
Breast Neoplasms/genetics, Human Genome, Genomics/methods, Search Engine/methods, DNA Sequence Analysis/methods, Software, Genetic Databases, Female, Humans, Internet
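
Editorial illustration for entry 11: the abstract describes searching many interval files for loci shared with a set of query features and ranking the files. The sketch below shows one simple way to count overlaps quickly, using two sorted arrays and binary search, and then ranks files by raw overlap count. It is not GIGGLE's index and it does not compute any significance statistic; the class and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class IntervalCounter:
    """Counts overlaps against non-empty half-open intervals [start, end) on one chromosome."""

    def __init__(self, intervals):
        self.starts = sorted(s for s, _ in intervals)
        self.ends = sorted(e for _, e in intervals)

    def count_overlaps(self, q_start, q_end):
        # Intervals that begin before the query ends...
        started = bisect_left(self.starts, q_end)
        # ...minus those that already ended at or before the query start.
        finished = bisect_right(self.ends, q_start)
        return started - finished


def rank_files(index, query):
    """Rank files by total overlaps with the query intervals, highest first.

    index: dict mapping file name -> IntervalCounter
    query: list of (start, end) intervals
    """
    scores = {
        name: sum(counter.count_overlaps(s, e) for s, e in query)
        for name, counter in index.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Each query costs two binary searches per file, so the count is obtained without visiting the overlapping intervals themselves.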
12.
PLoS Comput Biol ; 16(1): e1007625, 2020 01.
Article in English | MEDLINE | ID: mdl-32004313

ABSTRACT

Ribosome profiling, an application of nucleic acid sequencing for monitoring ribosome activity, has revolutionized our understanding of protein translation dynamics. This technique has been available for a decade, yet the current state and standardization of publicly available computational tools for these data is bleak. We introduce XPRESSyourself, an analytical toolkit that eliminates barriers and bottlenecks associated with this specialized data type by filling gaps in the computational toolset for both experts and non-experts of ribosome profiling. XPRESSyourself automates and standardizes analysis procedures, decreasing time-to-discovery and increasing reproducibility. This toolkit acts as a reference implementation of current best practices in ribosome profiling analysis. We demonstrate this toolkit's performance on publicly available ribosome profiling data by rapidly identifying hypothetical mechanisms related to neurodegenerative phenotypes and neuroprotective mechanisms of the small-molecule ISRIB during acute cellular stress. XPRESSyourself brings robust, rapid analysis of ribosome-profiling data to a broad and ever-expanding audience and will lead to more reproducible and accessible measurements of translation regulation. XPRESSyourself software is perpetually open-source under the GPL-3.0 license and is hosted at https://github.com/XPRESSyourself, where users can access additional documentation and report software issues.


Subjects
Computational Biology/methods, RNA/genetics, Ribosomes/genetics, RNA Sequence Analysis/methods, Software, Genetic Databases, HEK293 Cells, High-Throughput Nucleotide Sequencing/methods, Humans, Internet, Protein Biosynthesis/genetics, Reproducibility of Results
13.
Am J Hum Genet ; 100(3): 406-413, 2017 Mar 02.
Article in English | MEDLINE | ID: mdl-28190455

ABSTRACT

The potential for genetic discovery in human DNA sequencing studies is greatly diminished if DNA samples from a cohort are mislabeled, swapped, or contaminated or if they include unintended individuals. Unfortunately, the potential for such errors is significant since DNA samples are often manipulated by several protocols, labs, or scientists in the process of sequencing. We have developed a software package, peddy, to identify and facilitate the remediation of such errors via interactive visualizations and reports comparing the stated sex, relatedness, and ancestry to what is inferred from the individual genotypes derived from whole-genome (WGS) or whole-exome (WES) sequencing. Peddy predicts a sample's ancestry using a machine learning model trained on individuals of diverse ancestries from the 1000 Genomes Project reference panel. Peddy facilitates both automated and interactive, visual detection of sample swaps, poor sequencing quality, and other indicators of sample problems that, if left undetected, would inhibit discovery.


Subjects
Human Genome, Machine Learning, DNA Sequence Analysis/methods, Software, Human X Chromosomes/genetics, Exome, Female, Genetic Association Studies, Genetic Loci, Genotype, Humans, Male, Pedigree, Single Nucleotide Polymorphism
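
Editorial illustration for entry 13: the abstract describes comparing the stated sex of each sample with what is inferred from its genotypes. A common heuristic for that inference is the fraction of heterozygous calls on the X chromosome outside the pseudoautosomal regions, which is near zero for samples with a single X. The sketch below applies that heuristic; it is not peddy's code, and the genotype encoding, function names, and 0.1 threshold are illustrative assumptions.

```python
def infer_sex(x_genotypes, het_threshold=0.1):
    """Infer sex from chrX (non-pseudoautosomal) genotypes.

    x_genotypes: list of 0 (hom-ref), 1 (het), 2 (hom-alt), or None (missing)
    het_threshold: assumed cutoff on the heterozygous fraction
    Returns (inferred sex, heterozygous fraction).
    """
    called = [g for g in x_genotypes if g is not None]
    if not called:
        return "unknown", None
    het_fraction = sum(1 for g in called if g == 1) / len(called)
    sex = "male" if het_fraction < het_threshold else "female"
    return sex, het_fraction


def flag_sex_mismatches(samples):
    """Return sample IDs whose stated sex disagrees with the inferred sex.

    samples: dict mapping sample ID -> (stated_sex, x_genotypes)
    """
    flagged = []
    for sample_id, (stated, genotypes) in samples.items():
        inferred, _ = infer_sex(genotypes)
        if inferred != "unknown" and inferred != stated:
            flagged.append(sample_id)
    return flagged
```

A flagged sample is a prompt for manual review (possible swap or mislabeling), not a verdict.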
15.
Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.


Subjects
Genomics/methods, Software, Chromatin Immunoprecipitation, GATA1 Transcription Factor/metabolism, Internet, DNA Sequence Analysis, User-Computer Interface
16.
Nat Methods ; 13(1): 63-5, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26550772

ABSTRACT

Genotype Query Tools (GQT) is an indexing strategy that expedites analyses of genome-variation data sets in Variant Call Format based on sample genotypes, phenotypes and relationships. GQT's compressed genotype index minimizes decompression for analysis, and its performance relative to that of existing methods improves with cohort size. We show substantial (up to 443-fold) gains in performance over existing methods and demonstrate GQT's utility for exploring massive data sets involving thousands to millions of genomes. GQT can be accessed at https://github.com/ryanlayer/gqt.


Subjects
Genetic Variation, Genotype, Datasets as Topic
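
Editorial illustration for entry 16: the abstract describes an index organized around sample genotypes so that queries over sets of samples are fast. The sketch below conveys the general idea with one uncompressed bitmap per sample (a Python integer whose bit i marks variant i), so that a query across samples reduces to bitwise AND and NOT. GQT's actual index is compressed and considerably more elaborate; every name here is hypothetical.

```python
def build_index(genotypes):
    """Build one bitmap per sample.

    genotypes: dict mapping sample -> list of alt-allele counts (0, 1, 2),
               one entry per variant, in the same variant order for all samples
    Returns dict mapping sample -> int with bit i set if the sample carries
    at least one alt allele at variant i.
    """
    index = {}
    for sample, calls in genotypes.items():
        bits = 0
        for i, alt_count in enumerate(calls):
            if alt_count > 0:
                bits |= 1 << i
        index[sample] = bits
    return index


def variants_private_to(index, carriers, n_variants):
    """Indices of variants carried by every sample in `carriers` and by no one else."""
    result = (1 << n_variants) - 1  # start with every variant as a candidate
    for sample, bits in index.items():
        if sample in carriers:
            result &= bits   # must be present in carriers
        else:
            result &= ~bits  # must be absent everywhere else
    return [i for i in range(n_variants) if (result >> i) & 1]
```

The work per query grows with the number of samples rather than with the number of individual genotypes inspected, which is the property that makes a sample-oriented layout attractive for large cohorts.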
17.
Bioinformatics ; 34(5): 867-868, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29096012

ABSTRACT

Summary: Mosdepth is a new command-line tool for rapidly calculating genome-wide sequencing coverage. It measures depth from BAM or CRAM files at either each nucleotide position in a genome or for sets of genomic regions. Genomic regions may be specified as either a BED file to evaluate coverage across capture regions, or as a fixed-size window as required for copy-number calling. Mosdepth uses a simple algorithm that is computationally efficient and enables it to quickly produce coverage summaries. We demonstrate that mosdepth is faster than existing tools and provides flexibility in the types of coverage profiles produced. Availability and implementation: mosdepth is available from https://github.com/brentp/mosdepth under the MIT license. Contact: bpederse@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Algorithms, Human Genome, Genomics/methods, Humans, Exome Sequencing/methods
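
Editorial illustration for entry 17: the abstract mentions a simple, computationally efficient algorithm for per-base and per-window coverage but does not spell it out. One simple approach to this problem, shown here as an assumption rather than as mosdepth's code, is a difference array: add 1 at each alignment start, subtract 1 at each alignment end, and take a cumulative sum. Function names and the input layout are hypothetical.

```python
from itertools import accumulate


def per_base_depth(alignments, chrom_length):
    """Depth at every position of one chromosome.

    alignments: iterable of (start, end), 0-based half-open, start < chrom_length
    """
    delta = [0] * (chrom_length + 1)
    for start, end in alignments:
        delta[start] += 1
        delta[min(end, chrom_length)] -= 1  # clamp alignments running off the end
    # The running sum of the deltas is the depth at each position.
    return list(accumulate(delta[:chrom_length]))


def window_means(depth, window):
    """Mean depth in consecutive fixed-size windows (the last one may be shorter)."""
    means = []
    for i in range(0, len(depth), window):
        chunk = depth[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means
```

Each alignment costs two array updates regardless of its length, so total work is proportional to the number of alignments plus the chromosome length.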
18.
Bioinformatics ; 34(19): 3387-3389, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29718142

ABSTRACT

Motivation: Extracting biological insight from genomic data inevitably requires custom software. In many cases, this is accomplished with scripting languages, owing to their accessibility and brevity. Unfortunately, the ease of scripting languages typically comes at a substantial performance cost that is especially acute with the scale of modern genomics datasets. Results: We present hts-nim, a high-performance library written in the Nim programming language that provides a simple, scripting-like syntax without sacrificing performance. Availability and implementation: hts-nim is available at https://github.com/brentp/hts-nim and the example tools are at https://github.com/brentp/hts-nim-tools both under the MIT license.


Subjects
Genomics, Programming Languages, Software, Computational Biology
19.
Nat Methods ; 12(10): 966-8, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26258291

ABSTRACT

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.


Subjects
Human Genome, High-Throughput Nucleotide Sequencing/methods, Molecular Sequence Annotation/methods, Software, Genetic Variation, Humans, Neoplasms/genetics, Single Nucleotide Polymorphism, Precision Medicine/methods, Workflow
20.
Bioinformatics ; 33(12): 1867-1869, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28165109

ABSTRACT

MOTIVATION: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods enabling intuitive analysis of the variant data within VCF and BCF files. RESULTS: We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files and illustrate its speed, simplicity and utility. CONTACT: bpederse@gmail.com or aaronquinlan@gmail.com. AVAILABILITY AND IMPLEMENTATION: cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common Python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/.


Subjects
Genetic Variation, Genotyping Techniques/methods, DNA Sequence Analysis/methods, Software, Humans, Metadata
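
Editorial illustration for entry 20: the abstract refers to the complexity of VCF records, which combine fixed fields, key-value annotations, and per-sample genotype columns. The sketch below parses a single tab-delimited VCF data line with the Python standard library only, to show what such a record contains. It deliberately does not use or imitate the cyvcf2 API, it skips header handling and type conversion of INFO values, and the function name and returned dictionary layout are hypothetical.

```python
def parse_vcf_line(line, samples):
    """Parse one VCF data line into a plain dictionary.

    line:    a tab-delimited VCF record (not a header line)
    samples: sample names in column order, as given on the #CHROM header line
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for item in info.split(";"):
        if item == ".":
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    # FORMAT names the ':'-separated subfields of each sample column.
    genotypes = {}
    if len(fields) > 9:
        gt_index = fields[8].split(":").index("GT")
        for name, column in zip(samples, fields[9:]):
            genotypes[name] = column.split(":")[gt_index]

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),
        "qual": qual,
        "filter": filt,
        "info": info_dict,
        "genotypes": genotypes,
    }
```

A line-by-line text parser like this is easy to read but slow on cohort-scale files, which is the gap the abstract says a dedicated library is meant to fill.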