Results 1 - 20 of 84
1.
Genome Res ; 34(2): 179-188, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38355308

ABSTRACT

A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA measurements were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of mRNA and pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct for this bias. As an alternative, we repurpose an existing post hoc gene length-based correction method from conventional RNA-seq gene set enrichment analysis. Finally, we show that inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than assay choice itself, which is pivotal to the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.


Subject(s)
Cell Nucleus , RNA Precursors , Humans , Animals , Mice , RNA-Seq , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Cell Nucleus/genetics , Gene Expression Profiling/methods , Single-Cell Analysis
2.
Nat Methods ; 19(4): 445-448, 2022 04.
Article in English | MEDLINE | ID: mdl-35396485

ABSTRACT

Structural variants are associated with cancers and developmental disorders, but challenges with estimating population frequency remain a barrier to prioritizing mutations over inherited variants. In particular, variability in variant calling heuristics and filtering limits the use of current structural variant catalogs. We present STIX, a method that, instead of relying on variant calls, indexes and searches the raw alignments from thousands of samples to enable more comprehensive allele frequency estimation.


Subject(s)
Genome , Genomic Structural Variation , Neoplasms , Algorithms , Genomic Structural Variation/genetics , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Software
3.
Am J Hum Genet ; 108(4): 597-607, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33675682

ABSTRACT

Each human genome includes de novo mutations that arose during gametogenesis. While these germline mutations represent a fundamental source of new genetic diversity, they can also create deleterious alleles that impact fitness. Whereas the rate and patterns of point mutations in the human germline are now well understood, far less is known about the frequency and features that impact de novo structural variants (dnSVs). We report a family-based study of germline mutations among 9,599 human genomes from 33 multigenerational CEPH-Utah families and 2,384 families from the Simons Foundation Autism Research Initiative. We find that de novo structural mutations detected by alignment-based, short-read WGS occur at an overall rate of at least 0.160 events per genome in unaffected individuals, and we observe a significantly higher rate (0.206 per genome) in ASD-affected individuals. In both probands and unaffected samples, nearly 73% of de novo structural mutations arose in paternal gametes, and we predict most de novo structural mutations to be caused by mutational mechanisms that do not require sequence homology. After multiple testing correction, we did not observe a statistically significant correlation between parental age and the rate of de novo structural variation in offspring. These results highlight that a spectrum of mutational mechanisms contribute to germline structural mutations and that these mechanisms most likely have markedly different rates and selective pressures than those leading to point mutations.


Subject(s)
Family , Genome, Human/genetics , Germ Cells , Germ-Line Mutation/genetics , Mutation Rate , Aging/genetics , Autistic Disorder/genetics , Bias , DNA Copy Number Variations/genetics , DNA Mutational Analysis , Female , Humans , Male , Paternal Age , Point Mutation/genetics
4.
Bioinformatics ; 38(5): 1231-1234, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34864893

ABSTRACT

SUMMARY: We present trfermikit, a software tool designed to detect deletions larger than 50 bp occurring in Variable Number Tandem Repeats using Illumina DNA sequencing reads. In such regions, it achieves a better tradeoff between sensitivity and false discovery than a state-of-the-art structural variation caller, Manta, and complements it by recovering a significant number of deletions that Manta missed. trfermikit is based upon the fermikit pipeline, which performs read assembly, maps the assembly to the reference genome and calls variants from the alignment. AVAILABILITY AND IMPLEMENTATION: https://github.com/petermchale/trfermikit. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Software , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing
5.
BMC Bioinformatics ; 23(1): 482, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36376793

ABSTRACT

BACKGROUND: Despite numerous molecular and computational advances, roughly half of patients with a rare disease remain undiagnosed after exome or genome sequencing. A particularly challenging barrier to diagnosis is identifying variants that cause deleterious alternative splicing at intronic or exonic loci outside of canonical donor or acceptor splice sites. RESULTS: Several existing tools predict the likelihood that a genetic variant causes alternative splicing. We sought to extend such methods by developing a new metric that aids in discerning whether a genetic variant leads to deleterious alternative splicing. Our metric combines genetic variation in the Genome Aggregate Database with alternative splicing predictions from SpliceAI to compare observed and expected levels of splice-altering genetic variation. We infer genic regions with significantly less splice-altering variation than expected to be constrained. The resulting model of regional splicing constraint captures differential splicing constraint across gene and exon categories, and the most constrained genic regions are enriched for pathogenic splice-altering variants. Building from this model, we developed ConSpliceML. This ensemble machine learning approach combines regional splicing constraint with multiple per-nucleotide alternative splicing scores to guide the prediction of deleterious splicing variants in protein-coding genes. ConSpliceML more accurately distinguishes deleterious and benign splicing variants than state-of-the-art splicing prediction methods, especially in "cryptic" splicing regions beyond canonical donor or acceptor splice sites. CONCLUSION: Integrating a model of genetic constraint with annotations from existing alternative splicing tools allows ConSpliceML to prioritize potentially deleterious splice-altering variants in studies of rare human diseases.


Subject(s)
Alternative Splicing , Rare Diseases , Humans , Rare Diseases/genetics , RNA Splicing , Introns , Exons , Mutation , RNA Splice Sites
6.
BMC Bioinformatics ; 23(1): 490, 2022 Nov 16.
Article in English | MEDLINE | ID: mdl-36384437

ABSTRACT

BACKGROUND: Identification of deleterious genetic variants using DNA sequencing data relies on increasingly detailed filtering strategies to isolate the small subset of variants that are more likely to underlie a disease phenotype. Datasets reflecting population allele frequencies of different types of variants serve as powerful filtering tools, especially in the context of rare disease analysis. While such population-scale allele frequency datasets now exist for structural variants (SVs), it remains a challenge to match SV calls between multiple datasets, thereby complicating estimates of a putative SV's population allele frequency. RESULTS: We introduce SVAFotate, a software tool that enables the annotation of SVs with variant allele frequency and related information from existing SV datasets. As a result, VCF files annotated by SVAFotate offer a variety of metrics to aid in the stratification of SVs as common or rare in the broader human population. CONCLUSIONS: Here we demonstrate the use of SVAFotate in the classification of SVs with regard to their population frequency and illustrate how SVAFotate's annotations can be used to filter and prioritize SVs. Lastly, we detail how best to utilize these SV annotations in the analysis of genetic variation in studies of rare disease.
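The abstract above does not say how SV calls are matched between datasets. As a rough illustration only, the sketch below assumes a reciprocal-overlap criterion, which is a commonly used way to decide whether two SV calls describe the same event. The function names, the tuple layout, and the 0.8 threshold are hypothetical and are not taken from SVAFotate.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smallest fraction of either interval covered by their intersection."""
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def max_matching_af(query, catalog, min_overlap=0.8):
    """Return the highest allele frequency among catalog SVs matching the query.

    query:   (chrom, start, end, svtype)
    catalog: iterable of (chrom, start, end, svtype, allele_frequency)
    Returns None when nothing matches; how to interpret "no match" is left
    to the caller (this is an assumption of the sketch).
    """
    chrom, start, end, svtype = query
    matches = [
        af
        for c, s, e, t, af in catalog
        if c == chrom
        and t == svtype
        and reciprocal_overlap(start, end, s, e) >= min_overlap
    ]
    return max(matches, default=None)
```

A call could then be labeled rare when the returned frequency is missing or falls below a cutoff chosen for the study at hand.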


Subject(s)
Gene Frequency , High-Throughput Nucleotide Sequencing , Software , Humans , Rare Diseases
7.
Genome Res ; 29(4): 532-542, 2019 04.
Article in English | MEDLINE | ID: mdl-30858344

ABSTRACT

Coding variants in epigenetic regulators are emerging as causes of neurological dysfunction and cancer. However, a comprehensive effort to identify disease candidates within the human epigenetic machinery (EM) has not been performed; it is unclear whether features exist that distinguish between variation-intolerant and variation-tolerant EM genes, and between EM genes associated with neurological dysfunction versus cancer. Here, we rigorously define 295 genes with a direct role in epigenetic regulation (writers, erasers, remodelers, readers). Systematic exploration of these genes reveals that although individual enzymatic functions are always mutually exclusive, readers often also exhibit enzymatic activity (dual-function EM genes). We find that the majority of EM genes are very intolerant to loss-of-function variation, even when compared to the dosage sensitive transcription factors, and we identify 102 novel EM disease candidates. We show that this variation intolerance is driven by the protein domains encoding the epigenetic function, suggesting that disease is caused by a perturbed chromatin state. We then describe a large subset of EM genes that are coexpressed within multiple tissues. This subset is almost exclusively populated by extremely variation-intolerant genes and shows enrichment for dual-function EM genes. It is also highly enriched for genes associated with neurological dysfunction, even when accounting for dosage sensitivity, but not for cancer-associated EM genes. Finally, we show that regulatory regions near epigenetic regulators are genetically important for common neurological traits. These findings prioritize novel disease candidate EM genes and suggest that this coexpression plays a functional role in normal neurological homeostasis.


Subject(s)
Epigenesis, Genetic , Nervous System Diseases/genetics , Polymorphism, Genetic , Chromatin Assembly and Disassembly , Humans , Loss of Function Mutation , Transcription Factors/genetics , Transcription Factors/metabolism
8.
Bioinformatics ; 37(24): 4860-4861, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34146087

ABSTRACT

SUMMARY: Unfazed is a command-line tool to determine the parental gamete of origin for de novo mutations from paired-end Illumina DNA sequencing reads. Unfazed uses variant information for a sequenced trio to identify the parental gamete of origin by linking phase-informative inherited variants to de novo mutations using read-based phasing. It achieves a high success rate by chaining reads into haplotype groups, thus increasing the search space for informative sites. Unfazed provides a simple command-line interface and scales well to large inputs, determining parent-of-origin for nearly 30 000 de novo variants in under 60 h. AVAILABILITY AND IMPLEMENTATION: Unfazed is available at https://github.com/jbelyeu/unfazed. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Sequence Analysis, DNA , Haplotypes , High-Throughput Nucleotide Sequencing
9.
Nucleic Acids Res ; 48(12): 6597-6610, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32479598

ABSTRACT

The human genome encodes an order of magnitude more gene expression enhancers than promoters, suggesting that most genes are regulated by the combined action of multiple enhancers. We have previously shown that neighboring estrogen-responsive enhancers exhibit complex synergistic contributions to the production of an estrogenic transcriptional response. Here we sought to determine the molecular underpinnings of this enhancer cooperativity. We generated genetic deletions of four estrogen receptor α (ER) bound enhancers that regulate two genes and found that enhancers containing full estrogen response element (ERE) motifs control ER binding at neighboring sites, while enhancers with pre-existing histone acetylation/accessibility confer a permissible chromatin environment to the neighboring enhancers. Genome engineering revealed that two enhancers with half EREs could not compensate for the lack of a full ERE site within the cluster. In contrast, two enhancers with full EREs produced a transcriptional response greater than the wild-type locus. By swapping genomic sequences, we found that the genomic location of a full ERE strongly influences enhancer activity. Our results lead to a model in which a full ERE is required for ER recruitment, but the presence of a pre-existing permissible chromatin environment can also be needed for estrogen-driven gene regulation to occur.


Subject(s)
Enhancer Elements, Genetic/genetics , Estrogen Receptor alpha/genetics , Nucleotide Motifs/genetics , Transcription, Genetic , Acetylation , Chromatin/genetics , DNA-Binding Proteins/genetics , Gene Expression Regulation/genetics , Genome, Human/genetics , Humans , Promoter Regions, Genetic/genetics
10.
Proc Natl Acad Sci U S A ; 116(19): 9491-9500, 2019 05 07.
Article in English | MEDLINE | ID: mdl-31019089

ABSTRACT

The textbook view that most germline mutations in mammals arise from replication errors is indirectly supported by the fact that there are both more mutations and more cell divisions in the male than in the female germline. When analyzing large de novo mutation datasets in humans, we find multiple lines of evidence that call that view into question. Notably, despite the drastic increase in the ratio of male to female germ cell divisions after the onset of spermatogenesis, even young fathers contribute three times more mutations than young mothers, and this ratio barely increases with parental age. This surprising finding points to a substantial contribution of damage-induced mutations. Indeed, C-to-G transversions and CpG transitions, which together constitute over one-fourth of all base substitution mutations, show genomic distributions and sex-specific age dependencies indicative of double-strand break repair and methylation-associated damage, respectively. Moreover, we find evidence that maternal age at conception influences the mutation rate both because of the accumulation of damage in oocytes and potentially through an influence on the number of postzygotic mutations in the embryo. These findings reveal underappreciated roles of DNA damage and maternal age in the genesis of human germline mutations.


Subject(s)
DNA Breaks, Double-Stranded , DNA Repair , Databases, Nucleic Acid , Germ-Line Mutation , Maternal Age , Adolescent , Adult , Female , Humans , Male , Middle Aged , Oocytes , Pregnancy , Spermatogenesis/genetics
11.
Nat Methods ; 15(2): 123-126, 2018 02.
Article in English | MEDLINE | ID: mdl-29309061

ABSTRACT

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.


Subject(s)
Breast Neoplasms/genetics , Genome, Human , Genomics/methods , Search Engine/methods , Sequence Analysis, DNA/methods , Software , Databases, Genetic , Female , Humans , Internet
12.
PLoS Comput Biol ; 16(1): e1007625, 2020 01.
Article in English | MEDLINE | ID: mdl-32004313

ABSTRACT

Ribosome profiling, an application of nucleic acid sequencing for monitoring ribosome activity, has revolutionized our understanding of protein translation dynamics. This technique has been available for a decade, yet the current state and standardization of publicly available computational tools for these data is bleak. We introduce XPRESSyourself, an analytical toolkit that eliminates barriers and bottlenecks associated with this specialized data type by filling gaps in the computational toolset for both experts and non-experts of ribosome profiling. XPRESSyourself automates and standardizes analysis procedures, decreasing time-to-discovery and increasing reproducibility. This toolkit acts as a reference implementation of current best practices in ribosome profiling analysis. We demonstrate this toolkit's performance on publicly available ribosome profiling data by rapidly identifying hypothetical mechanisms related to neurodegenerative phenotypes and neuroprotective mechanisms of the small-molecule ISRIB during acute cellular stress. XPRESSyourself brings robust, rapid analysis of ribosome-profiling data to a broad and ever-expanding audience and will lead to more reproducible and accessible measurements of translation regulation. XPRESSyourself software is perpetually open-source under the GPL-3.0 license and is hosted at https://github.com/XPRESSyourself, where users can access additional documentation and report software issues.


Subject(s)
Computational Biology/methods , RNA/genetics , Ribosomes/genetics , Sequence Analysis, RNA/methods , Software , Databases, Genetic , HEK293 Cells , High-Throughput Nucleotide Sequencing/methods , Humans , Internet , Protein Biosynthesis/genetics , Reproducibility of Results
13.
Am J Hum Genet ; 100(3): 406-413, 2017 Mar 02.
Article in English | MEDLINE | ID: mdl-28190455

ABSTRACT

The potential for genetic discovery in human DNA sequencing studies is greatly diminished if DNA samples from a cohort are mislabeled, swapped, or contaminated or if they include unintended individuals. Unfortunately, the potential for such errors is significant since DNA samples are often manipulated by several protocols, labs, or scientists in the process of sequencing. We have developed a software package, peddy, to identify and facilitate the remediation of such errors via interactive visualizations and reports comparing the stated sex, relatedness, and ancestry to what is inferred from the individual genotypes derived from whole-genome (WGS) or whole-exome (WES) sequencing. Peddy predicts a sample's ancestry using a machine learning model trained on individuals of diverse ancestries from the 1000 Genomes Project reference panel. Peddy facilitates both automated and interactive, visual detection of sample swaps, poor sequencing quality, and other indicators of sample problems that, if left undetected, would inhibit discovery.
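The abstract above describes comparing the stated sex of each sample with what is inferred from genotypes. A minimal sketch of one such check follows, assuming that sex can be approximated from the fraction of heterozygous calls on the X chromosome (near zero in samples with a single X). The function names and the 0.05 threshold are hypothetical, and this is not peddy's implementation.

```python
def x_het_fraction(alt_counts):
    """Fraction of called X-chromosome sites that are heterozygous.

    alt_counts holds alternate-allele counts per site: 0, 1, 2, or None
    for a missing call.
    """
    called = [g for g in alt_counts if g is not None]
    if not called:
        return None
    return sum(g == 1 for g in called) / len(called)


def infer_sex(alt_counts, het_threshold=0.05):
    """Guess sex from X heterozygosity; the threshold is an assumed value."""
    fraction = x_het_fraction(alt_counts)
    if fraction is None:
        return "unknown"
    return "female" if fraction > het_threshold else "male"


def sex_mismatches(stated_sex, x_genotypes):
    """Map each sample whose stated sex disagrees with the inferred sex
    to a (stated, inferred) pair. Samples without calls are skipped."""
    flagged = {}
    for sample, stated in stated_sex.items():
        inferred = infer_sex(x_genotypes.get(sample, []))
        if inferred != "unknown" and inferred != stated:
            flagged[sample] = (stated, inferred)
    return flagged
```

Flagged samples are candidates for a swap or mislabeling and would still need manual review, since a fixed threshold can misclassify low-quality samples.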


Subject(s)
Genome, Human , Machine Learning , Sequence Analysis, DNA/methods , Software , Chromosomes, Human, X/genetics , Exome , Female , Genetic Association Studies , Genetic Loci , Genotype , Humans , Male , Pedigree , Polymorphism, Single Nucleotide
15.
Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.
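To make the idea of testing colocalization against a null model concrete, here is a small sketch of one possible approach: a permutation test on base-pair overlap, with query regions relocated uniformly at random on a single chromosome. The choice of statistic, the null model, and all names are assumptions for illustration; they do not describe any specific method offered by Coloc-stats.

```python
import random


def overlap_bp(regions_a, regions_b):
    """Total bases shared between two lists of (start, end) intervals.

    Assumes half-open coordinates and no overlaps within each list.
    """
    total = 0
    for a_start, a_end in regions_a:
        for b_start, b_end in regions_b:
            total += max(0, min(a_end, b_end) - max(a_start, b_start))
    return total


def shuffle_regions(regions, genome_length, rng):
    """Move each region to a uniformly random position, keeping its length."""
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_p_value(query, reference, genome_length, n_perm=1000, seed=0):
    """Return (observed overlap, empirical one-sided p-value)."""
    rng = random.Random(seed)
    observed = overlap_bp(query, reference)
    at_least_as_extreme = sum(
        overlap_bp(shuffle_regions(query, genome_length, rng), reference) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

Swapping in a different statistic (for example, distance to the nearest region) or a different null model can change the conclusion, which is the motivation the abstract gives for comparing several methods side by side.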


Subject(s)
Genomics/methods , Software , Chromatin Immunoprecipitation , GATA1 Transcription Factor/metabolism , Internet , Sequence Analysis, DNA , User-Computer Interface
16.
Nat Methods ; 13(1): 63-5, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26550772

ABSTRACT

Genotype Query Tools (GQT) is an indexing strategy that expedites analyses of genome-variation data sets in Variant Call Format based on sample genotypes, phenotypes and relationships. GQT's compressed genotype index minimizes decompression for analysis, and its performance relative to that of existing methods improves with cohort size. We show substantial (up to 443-fold) gains in performance over existing methods and demonstrate GQT's utility for exploring massive data sets involving thousands to millions of genomes. GQT can be accessed at https://github.com/ryanlayer/gqt.


Subject(s)
Genetic Variation , Genotype , Datasets as Topic
17.
Bioinformatics ; 34(5): 867-868, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29096012

ABSTRACT

Summary: Mosdepth is a new command-line tool for rapidly calculating genome-wide sequencing coverage. It measures depth from BAM or CRAM files at either each nucleotide position in a genome or for sets of genomic regions. Genomic regions may be specified as either a BED file to evaluate coverage across capture regions, or as a fixed-size window as required for copy-number calling. Mosdepth uses a simple algorithm that is computationally efficient and enables it to quickly produce coverage summaries. We demonstrate that mosdepth is faster than existing tools and provides flexibility in the types of coverage profiles produced. Availability and implementation: mosdepth is available from https://github.com/brentp/mosdepth under the MIT license. Contact: bpederse@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.
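The abstract above does not spell out the algorithm. One simple and efficient way to compute per-base depth, sketched here as an assumption rather than as mosdepth's actual code, is to record +1 at each alignment start and -1 at each alignment end, then take a cumulative sum. The function names and the plain (start, end) tuples are hypothetical stand-ins for records read from a BAM or CRAM file.

```python
from itertools import accumulate


def per_base_depth(chrom_length, alignments):
    """Depth at every position for 0-based, half-open (start, end) alignments."""
    deltas = [0] * (chrom_length + 1)
    for start, end in alignments:
        deltas[start] += 1   # coverage begins at the start position
        deltas[end] -= 1     # and stops just past the last covered base
    # The running sum of the deltas is the depth at each position.
    return list(accumulate(deltas[:-1]))


def window_means(depth, window):
    """Mean depth in consecutive fixed-size windows (last one may be shorter)."""
    means = []
    for i in range(0, len(depth), window):
        chunk = depth[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means
```

The cost is one array update per alignment plus a single pass over the chromosome, independent of how deep the coverage is. The fixed-size window summary corresponds to the kind of output the abstract mentions for copy-number calling.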


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Genome, Human , Genomics/methods , Humans , Exome Sequencing/methods
18.
Bioinformatics ; 34(19): 3387-3389, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29718142

ABSTRACT

Motivation: Extracting biological insight from genomic data inevitably requires custom software. In many cases, this is accomplished with scripting languages, owing to their accessibility and brevity. Unfortunately, the ease of scripting languages typically comes at a substantial performance cost that is especially acute with the scale of modern genomics datasets. Results: We present hts-nim, a high-performance library written in the Nim programming language that provides a simple, scripting-like syntax without sacrificing performance. Availability and implementation: hts-nim is available at https://github.com/brentp/hts-nim and the example tools are at https://github.com/brentp/hts-nim-tools both under the MIT license.


Subject(s)
Genomics , Programming Languages , Software , Computational Biology
19.
Nat Methods ; 12(10): 966-8, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26258291

ABSTRACT

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Software , Genetic Variation , Humans , Neoplasms/genetics , Polymorphism, Single Nucleotide , Precision Medicine/methods , Workflow
20.
Bioinformatics ; 33(12): 1867-1869, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28165109

ABSTRACT

MOTIVATION: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods enabling intuitive analysis of the variant data within VCF and BCF files. RESULTS: We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files and illustrate its speed, simplicity and utility. CONTACT: bpederse@gmail.com or aaronquinlan@gmail.com. AVAILABILITY AND IMPLEMENTATION: cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common Python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/.
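To show the kind of structure a VCF record carries (fixed columns, INFO annotations, and per-sample genotype fields), here is a standard-library sketch that parses one tab-separated data line. It is a simplified illustration of the format only: it is not the cyvcf2 API, it ignores header metadata and type conversion of INFO values, and the function name is hypothetical.

```python
def parse_vcf_line(line, sample_names):
    """Split one VCF data line into fixed fields, INFO, and per-sample data."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for item in info.split(";"):
        if item == ".":
            continue
        key, _, value = item.partition("=")
        info_dict[key] = value if value else True

    # FORMAT (column 9) names the colon-separated keys used by each sample.
    genotypes = {}
    if len(fields) > 9:
        keys = fields[8].split(":")
        for name, sample in zip(sample_names, fields[9:]):
            genotypes[name] = dict(zip(keys, sample.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),
        "qual": qual,
        "filter": filt,
        "info": info_dict,
        "genotypes": genotypes,
    }
```

A dedicated library is still the better choice for real files, since this sketch handles neither compressed input nor BCF.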


Subject(s)
Genetic Variation , Genotyping Techniques/methods , Sequence Analysis, DNA/methods , Software , Humans , Metadata