Results 1-14 of 14
1.
Cell; 180(5): 915-927.e16, 2020 Mar 05.
Article in English | MEDLINE | ID: mdl-32084333

ABSTRACT

The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we also found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and different signatures encode for mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (∼12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.


Subject(s)
Genome, Human/genetics; Genomics/methods; Mutation/genetics; Neoplasms/genetics; DNA Mutational Analysis/methods; Disease Progression; Humans; Neoplasms/pathology; Whole Genome Sequencing
2.
BMC Bioinformatics; 21(1): 227, 2020 Jun 04.
Article in English | MEDLINE | ID: mdl-32498674

ABSTRACT

BACKGROUND: Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, time scales, chromatin structures, and environmental exposures, all of which affect the resulting distribution of substitutions. Nonetheless, many of the same single nucleotide variants (SNVs) are shared between germline and somatic mutation databases, such as between the gnomAD database of 120,000 germline exomes and the TCGA database of 10,000 somatic exomes. Here, we sought to explain this overlap. RESULTS: After strict filtering to exclude common germline polymorphisms and sites with poor coverage or mappability, we found 336,987 variants shared between the somatic and germline databases. A uniform statistical model explains 34% of these shared variants; a model that incorporates the varying mutation rates of the basic mutation types explains another 50%; and a model that includes extended nucleotide contexts (e.g., the 3 surrounding bases on either side) explains an additional 4%. Analysis of read depth finds mixed evidence that up to 4% of the shared variants may represent germline variants leaked into somatic call sets. Nine percent of the shared variants are not explained by any model; sequencing errors and convergent evolution did not account for these. We also surveyed other factors: cancers driven by endogenous mutational processes share a greater fraction of variants with the germline, and recently derived germline variants were more likely to be somatically shared than were ancient ones. CONCLUSIONS: Overall, we find that shared variants largely represent bona fide biological occurrences of the same variant in the germline and somatic settings and arise primarily because DNA has some of the same basic chemical vulnerabilities in either setting.
Moreover, we find mixed evidence that somatic call sets leak appreciable numbers of germline variants, which is relevant to genomic privacy regulations. In future studies, the similar chemical vulnerability of DNA in the somatic and germline settings might help identify disease-related genes by guiding the development of background-mutation models informed by both somatic and germline patterns of variation.
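To illustrate why a rate-aware model explains more sharing than a uniform one, here is a minimal toy sketch (not the study's model). It assumes mutations fall independently and uniformly at random within each mutation category, so the expected number of sites hit by both a germline set of size g and a somatic set of size s among N sites is g * s / N per category. The function names and all numbers are hypothetical, and the sketch works at the site level, ignoring which alternate base is observed.

```python
def expected_shared_uniform(germline, somatic, sites):
    """Expected sites mutated in both sets if every site is equally mutable.

    Each argument is a per-category list of counts; the uniform model pools them.
    """
    return sum(germline) * sum(somatic) / sum(sites)


def expected_shared_stratified(germline, somatic, sites):
    """Expected shared sites when each mutation category has its own rate.

    Within category k, each somatic hit lands on one of the g_k germline-mutated
    sites with probability g_k / n_k, giving g_k * s_k / n_k shared sites.
    """
    return sum(g * s / n for g, s, n in zip(germline, somatic, sites))


# Hypothetical toy numbers: a large, rarely mutated category and a small,
# highly mutable one that both settings hit at the same elevated rate.
sites = [900, 100]
germline = [90, 90]
somatic = [90, 90]

print(expected_shared_uniform(germline, somatic, sites))     # 32.4
print(expected_shared_stratified(germline, somatic, sites))  # 90.0
```

Concentrating the same number of mutations in a small, highly mutable category raises the expected overlap from 32.4 to 90 sites, a toy version of the extra sharing the abstract attributes to modeling the varying rates of the basic mutation types.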


Subject(s)
Databases, Genetic; Germ-Line Mutation/genetics; Alleles; Biological Evolution; Epigenesis, Genetic; Gene Frequency/genetics; Humans; Mutation Rate; Neoplasms/genetics; Nucleotides/genetics; Phylogeny; Sequence Analysis, DNA
3.
PLoS Comput Biol; 11(4): e1004132, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25884877

ABSTRACT

The topology of the gene-regulatory network has been extensively analyzed. Now, given the large amount of available functional genomic data, it is possible to go beyond this and systematically study regulatory circuits in terms of logic elements. To this end, we present Loregic, a computational method integrating gene expression and regulatory network data, to characterize the cooperativity of regulatory factors. Loregic uses all 16 possible two-input-one-output logic gates (e.g. AND or XOR) to describe triplets of two factors regulating a common target. We attempt to find the gate that best matches each triplet's observed gene expression pattern across many conditions. We make Loregic available as a general-purpose tool (github.com/gersteinlab/loregic). We validate it with known yeast transcription-factor knockout experiments. Next, using human ENCODE ChIP-Seq and TCGA RNA-Seq data, we are able to demonstrate how Loregic characterizes complex circuits involving both proximally and distally regulating transcription factors (TFs) and also miRNAs. Furthermore, we show that MYC, a well-known oncogenic driving TF, can be modeled as acting independently from other TFs (e.g., using OR gates) but antagonistically with repressing miRNAs. Finally, we inter-relate Loregic's gate logic with other aspects of regulation, such as indirect binding via protein-protein interactions, feed-forward loop motifs and global regulatory hierarchy.
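The idea of scoring all 16 two-input-one-output gates against a triplet can be sketched as follows. This is a hypothetical illustration, not Loregic's code: it assumes expression values have already been binarized (1 = high, 0 = low) for both regulators and the target in each condition, and it scores a gate simply by the number of conditions whose target state the gate mispredicts. Loregic's actual binarization and scoring may differ; the repository named above is the authoritative source.

```python
from itertools import product

# Input combinations (regulator 1, regulator 2), in truth-table order.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# All 2**4 = 16 two-input-one-output gates, each stored as its four outputs.
ALL_GATES = list(product((0, 1), repeat=4))

# Human-readable names for a few familiar gates; others print as truth tables.
NAMES = {
    (0, 0, 0, 1): "AND",
    (0, 1, 1, 1): "OR",
    (0, 1, 1, 0): "XOR",
    (1, 1, 1, 0): "NAND",
    (1, 0, 0, 0): "NOR",
    (1, 0, 0, 1): "XNOR",
}


def gate_output(table, a, b):
    """Output of a gate (truth table) for binary inputs a and b."""
    return table[INPUTS.index((a, b))]


def best_gate(observations):
    """Pick the gate with the fewest mispredicted target states.

    observations: list of (rf1, rf2, target) tuples, one per condition,
    with every value already binarized to 0 or 1 (hypothetical preprocessing).
    Returns (gate_name, mismatch_fraction); ties go to the first gate enumerated.
    """
    def mismatches(table):
        return sum(gate_output(table, a, b) != t for a, b, t in observations)

    best = min(ALL_GATES, key=mismatches)
    return NAMES.get(best, str(best)), mismatches(best) / len(observations)


obs = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1), (1, 1, 1)]
print(best_gate(obs))  # ('OR', 0.0)
```

Exhaustive enumeration is cheap here because there are only 16 candidate gates per triplet, so the cost scales with the number of triplets and conditions rather than with the gate space.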


Subject(s)
Gene Regulatory Networks/genetics; Genes, Regulator/genetics; Logistic Models; Models, Genetic; Transcription Factors/genetics; Transcriptional Activation/genetics; Algorithms; Animals; Computer Simulation; Gene Expression Regulation/genetics; Humans; Leukemia/genetics; MicroRNAs/genetics
4.
PLoS One; 18(9): e0291173, 2023.
Article in English | MEDLINE | ID: mdl-37682908

ABSTRACT

Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and long COVID share some clinical and social characteristics. We predicted that this would lead to increased interaction between pre-pandemic members of an ME/CFS online support community and a long COVID community. We performed a mixed-methods retrospective observational study of the Reddit activity of 7,544 users active on Reddit's long COVID forum. Among 1,600 forums, pre-pandemic activity specifically on an ME/CFS forum was the top predictor of later participation on the long COVID forum versus an acute COVID support forum. In the qualitative portion, motives for this co-participation included seeking mutual support and dual identification with both conditions. Some of this effect may be explained by pre-existing ME/CFS possibly being a risk factor for long COVID and/or SARS-CoV-2 infection being a cause of ME/CFS relapse. The high rate of ME/CFS patients seeking mutual support on a long COVID forum speaks to the long-suffering experience of these patients not feeling heard or respected, and to the hope of some ME/CFS patients to gain legitimacy through the public's growing recognition of long COVID.


Subject(s)
COVID-19; Fatigue Syndrome, Chronic; Humans; Fatigue Syndrome, Chronic/epidemiology; Post-Acute COVID-19 Syndrome; Pandemics; COVID-19/epidemiology; SARS-CoV-2
5.
JAMA Netw Open; 6(6): e2317714, 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-37294568

ABSTRACT

Importance: Major depressive disorder (MDD) is a leading cause of global distress and disability. Earlier studies have indicated that antidepressant therapy confers a modest reduction in depressive symptoms on average, but the distribution of this reduction requires more research. Objective: To estimate the distribution of antidepressant response by depression severity. Design, Setting, and Participants: In this secondary analysis of pooled trial data, quantile treatment effect (QTE) analysis was conducted on the US Food and Drug Administration (FDA) database of antidepressant monotherapy for patients with MDD, encompassing 232 positive and negative trials submitted to the FDA between 1979 and 2016. Analysis was restricted to participants with severe MDD (17-item Hamilton Rating Scale for Depression [HAMD-17] score ≥20). Data analysis was conducted from August 16, 2022, to April 16, 2023. Intervention: Antidepressant monotherapy compared with placebo. Main Outcomes and Measures: The distribution of percentage depression response was compared between the pooled treatment arm and the pooled placebo arm. Percentage depression response was defined as 1 minus the ratio of final depression severity to baseline depression severity, expressed as a percentage. Depression severity was reported in HAMD-17-equivalent units. Results: A total of 57,313 participants with severe depression were included in the analysis. There was no significant imbalance in baseline depression severity between the pooled treatment arm and the pooled placebo arm, with a mean HAMD-17 difference of 0.037 points (P = .11 by Wilcoxon rank sum test). An interaction term test for rank similarity did not reject the assumption of rank similarity governing percentage depression response (P > .99). The entire distribution of depression response was more favorable in the pooled treatment arm than in the pooled placebo arm.
The maximum separation between treatment and placebo occurred at the 55th quantile and corresponded to an absolute improvement in depression due to active drug of 13.5% (95% CI, 12.4%-14.4%). The separation between treatment and placebo diminished near the tails of the distribution. Conclusions and Relevance: In this QTE analysis of pooled clinical trial data from the FDA, antidepressants were found to confer a small reduction in depression severity that was broadly distributed across participants with severe depression. Alternatively, if the assumptions behind the QTE analysis are not met, the data are also compatible with antidepressants eliciting a more complete response in a smaller subset of participants than this QTE analysis suggests.
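The outcome definition and a quantile comparison can be written out in a few lines. The sketch below assumes per-participant baseline and final HAMD-17 scores and compares empirical quantiles of percentage response between arms at a chosen quantile; the function names, the linear-interpolation quantile rule, and the toy data are assumptions, and the study's actual estimator, confidence intervals, and rank-similarity test are not reproduced.

```python
def percent_response(baseline, final):
    """Percentage depression response: 100 * (1 - final / baseline)."""
    return 100.0 * (1.0 - final / baseline)


def empirical_quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics (0 <= q <= 1)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def quantile_treatment_effect(treated, placebo, q):
    """Difference between arm-level quantiles of percentage response at quantile q."""
    return empirical_quantile(treated, q) - empirical_quantile(placebo, q)


# Hypothetical HAMD-17 scores (baseline 24, final 12), for illustration only.
print(percent_response(24, 12))  # 50.0

treated = [10, 20, 30, 40, 50]   # toy percentage responses, active arm
placebo = [0, 10, 20, 30, 40]    # toy percentage responses, placebo arm
print(quantile_treatment_effect(treated, placebo, 0.5))  # 10.0
```

Reading such a quantile difference as the drug effect for participants at that quantile relies on the rank-similarity assumption discussed in the abstract; without it, the difference only compares the two marginal distributions.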


Subject(s)
Depressive Disorder, Major; Humans; Antidepressive Agents/therapeutic use; Depressive Disorder, Major/drug therapy; Depressive Disorder, Major/chemically induced; Randomized Controlled Trials as Topic; Remission Induction; United States; United States Food and Drug Administration
6.
JMIR Form Res; 7: e38112, 2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36649054

ABSTRACT

BACKGROUND: Individuals with later bedtimes have an increased risk of difficulties with mood and substances. To investigate the causes and consequences of late bedtimes and other sleep patterns, researchers are exploring social media as a data source. Pioneering studies inferred sleep patterns directly from social media data. While innovative, these efforts are variously unscalable, context dependent, confined to specific sleep parameters, or reliant on untested assumptions, and none of the reviewed studies apply to the popular Reddit platform or release software to the research community. OBJECTIVE: This study builds on this prior work. We estimate the bedtimes of Reddit users from the time stamps of their posts, test inference validity against survey data, and release our model as an R package (The R Foundation). METHODS: We included 159 sufficiently active Reddit users with known time zones and known, nonanomalous bedtimes, together with the time stamps of their 2.1 million posts. The model's form was chosen by visualizing the aggregate distribution of the timing of users' posts relative to their reported bedtimes. The chosen model represents a user's frequency of Reddit posting by time of day, with a flat portion before bedtime and a quadratic depletion that begins near the user's bedtime, with parameters fitted to the data. This model estimates the bedtimes of individual Reddit users from the time stamps of their posts. Model performance is assessed through k-fold cross-validation. We then apply the model to estimate the bedtimes of 51,372 sufficiently active, nonbot Reddit users with known time zones from the time stamps of their 140 million posts. RESULTS: The Pearson correlation between expected and observed Reddit posting frequencies in our model was 0.997 on aggregate data.
On average, posting starts declining 45 minutes before bedtime, reaches a nadir 4.75 hours after bedtime that is 87% lower than the daytime rate, and returns to baseline 10.25 hours after bedtime. The Pearson correlation between inferred and reported bedtimes for individual users was 0.61 (P<.001). In 90 of 159 cases (56.6%), our estimate was within 1 hour of the reported bedtime; 128 cases (80.5%) were within 2 hours. There was equivalent accuracy in hold-out sets versus training sets of k-fold cross-validation, arguing against overfitting. The model was more accurate than a random forest approach. CONCLUSIONS: We uncovered a simple, reproducible relationship between Reddit users' reported bedtimes and the time of day when high daytime posting rates transition to low nighttime posting rates. We captured this relationship in a model that estimates users' bedtimes from the time stamps of their posts. Limitations include applicability only to users who post frequently, the requirement for time zone data, and limits on generalizability. Nonetheless, it is a step forward for inferring the sleep parameters of social media users passively at scale. Our model and precomputed estimated bedtimes of 50,000 Reddit users are freely available.
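A rough sketch of the model's shape and of fitting a bedtime to one user's posting histogram follows. It plugs in the average values reported above (decline starting 45 minutes before bedtime, a nadir 87% below the daytime rate at 4.75 hours, recovery at 10.25 hours); because the nadir lies exactly midway between -0.75 h and 10.25 h, a symmetric parabola matches all three points. That parameterization, the 15-minute grid search, and the least-squares fit are assumptions for illustration, not the released R package, and the input is assumed to be 24 hourly post counts already converted to the user's local time.

```python
# Assumed constants, taken from the aggregate averages reported in the abstract.
DIP_START = -0.75   # hours relative to bedtime where posting starts to decline
DIP_END = 10.25     # hours relative to bedtime where posting returns to baseline
NADIR_DROP = 0.87   # the nadir is 87% below the daytime rate


def relative_hours(clock_hour, bedtime):
    """Signed hours from bedtime, wrapped into [-12, 12)."""
    return ((clock_hour - bedtime + 12.0) % 24.0) - 12.0


def activity_shape(clock_hour, bedtime):
    """Relative posting rate: 1.0 by day, a parabola dipping to 0.13 at the nadir."""
    t = relative_hours(clock_hour, bedtime)
    if DIP_START <= t <= DIP_END:
        mid = (DIP_START + DIP_END) / 2.0    # 4.75 h after bedtime
        half = (DIP_END - DIP_START) / 2.0   # 5.5 h
        return 1.0 - NADIR_DROP * (1.0 - ((t - mid) / half) ** 2)
    return 1.0


def estimate_bedtime(hourly_counts, step=0.25):
    """Grid-search the bedtime whose scaled shape best fits 24 hourly counts."""
    best_bedtime, best_sse = None, float("inf")
    for i in range(int(24 / step)):
        bedtime = i * step
        shape = [activity_shape(h, bedtime) for h in range(24)]
        # Closed-form least-squares estimate of the daytime rate for this shape.
        scale = sum(c * s for c, s in zip(hourly_counts, shape)) / sum(s * s for s in shape)
        sse = sum((c - scale * s) ** 2 for c, s in zip(hourly_counts, shape))
        if sse < best_sse:
            best_bedtime, best_sse = bedtime, sse
    return best_bedtime


# Synthetic user who goes to bed at 23:00 and posts ~100 times per daytime hour.
counts = [100 * activity_shape(h, 23.0) for h in range(24)]
print(estimate_bedtime(counts))
```

Because the daytime rate is re-estimated for each candidate bedtime, the search depends only on the shape of a user's day-to-night transition, not on how much that user posts overall.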

7.
Sleep Med; 107: 212-218, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37235891

ABSTRACT

Public health officials and clinicians routinely advise social media users to avoid nighttime social media use due to the perception that this delays the onset of sleep and predisposes to the health risks of insufficient sleep. With some exceptions, the evidence behind this advice mostly derives from surveys identifying an association between self-reported social media usage and self-reported sleep patterns. In principle, these associations could alternatively be explained by users turning to social media to pass the time when they are otherwise having difficulty sleeping, or by individual differences that draw some people to frequent social media use, or by offline activities that overlap with both social media use and delayed sleep. To attempt to distinguish among these explanations, we leveraged estimated bedtimes from 44,000 Reddit users reported in a recent study and their 120 million posts to test whether the relationship between sleep and social media has properties suggestive of a causal relationship. We find that users are especially likely to be active on Reddit after their bedtime (and therefore awake) on nights that they posted to Reddit shortly before bedtime, especially if they posted multiple times or in high-engagement forums that night. Overall, this study lends additional support to the notion that there likely is some causal effect of evening social media use on delayed sleep onset.


Subject(s)
Sleep Disorders, Circadian Rhythm; Social Media; Adult; Female; Humans; Male; Young Adult; Circadian Rhythm; Prevalence; Self Report; Sleep Disorders, Circadian Rhythm/epidemiology; Time Factors
8.
Sports Med; 51(11): 2237-2250, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34468950

ABSTRACT

Millions of consumer sport and fitness wearables (CSFWs) are used worldwide, and each device generates millions of data points. Moreover, these numbers are growing rapidly and span a heterogeneity of devices, data types, and data-collection contexts. Companies and consumers would benefit from guiding standards on device quality and data formats. To address this growing need, we convened a virtual panel of industry and academic stakeholders, and this manuscript summarizes the outcomes of the discussion. Our objectives were to identify (1) key facilitators of and barriers to participation by CSFW manufacturers in guiding standards and (2) stakeholder priorities. The venues were the Yale Center for Biomedical Data Science Digital Health Monthly Seminar Series (62 participants) and the New England Chapter of the American College of Sports Medicine Annual Meeting (59 participants). In the discussion, stakeholders outlined both facilitators of (e.g., commercial return on investment in device quality, lucrative research partnerships, and transparent and multilevel evaluation of device quality) and barriers to (e.g., competitive advantage conflict, lack of flexibility in previously developed devices) participation in guiding standards. There was general agreement to adopt Keadle et al.'s standard pathway for testing devices (i.e., benchtop, laboratory, field-based, implementation), without consensus on the prioritization of these steps. Overall, there was enthusiasm for not adding prescriptive or regulatory steps and instead creating a networking hub that connects companies with consumers and researchers for flexible guidance in navigating the heterogeneity, multi-tiered development, dynamicity, and nebulousness of the CSFW field.


Subject(s)
Sports Medicine; Sports; Wearable Electronic Devices; Consensus; Exercise; Humans
9.
Nat Commun; 11(1): 732, 2020 Feb 05.
Article in English | MEDLINE | ID: mdl-32024824

ABSTRACT

Tumors accumulate thousands of mutations, and sequencing them has given rise to methods for finding cancer drivers via mutational recurrence. However, these methods require large cohorts and underperform for low recurrence. Recently, ultra-deep sequencing has enabled accurate measurement of VAFs (variant-allele frequencies) for mutations, allowing the determination of evolutionary trajectories. Here, based solely on the VAF spectrum for an individual sample, we report on a method that identifies drivers and quantifies tumor growth. Drivers introduce perturbations into the spectrum, and our method uses the frequency of hitchhiking mutations preceding a driver to measure this. As validation, we use simulation models and 993 tumors from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium with previously identified drivers. Then we apply our method to an ultra-deep sequenced acute myeloid leukemia (AML) tumor and identify known cancer genes and additional driver candidates. In summary, our framework presents opportunities for personalized driver diagnosis using sequencing data from a single individual.


Subject(s)
Genes, Tumor Suppressor; Leukemia, Myeloid, Acute/genetics; Leukemia, Myeloid, Acute/pathology; Models, Genetic; Mutation; Algorithms; Gene Frequency; High-Throughput Nucleotide Sequencing; Humans; Mutation Rate; Mutation, Missense; Neoplasms/genetics; Neoplasms/pathology; Oncogenes; Precision Medicine; Stochastic Processes
10.
Nat Commun; 11(1): 4748, 2020 Sep 21.
Article in English | MEDLINE | ID: mdl-32958763

ABSTRACT

The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side by side for 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and unobserved mutations in loci with variable GC content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.


Subject(s)
Genome, Human/genetics; Mutation; Neoplasms/genetics; Base Composition; DNA, Intergenic; Databases, Genetic; Exome/genetics; Exons; Humans; Retrospective Studies; Exome Sequencing; Whole Genome Sequencing
12.
Nat Commun; 11(1): 3696, 2020 Jul 29.
Article in English | MEDLINE | ID: mdl-32728046

ABSTRACT

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq, on a number of data-rich ENCODE cell types. A key aspect of this annotation is its comprehensive, experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression, and GWAS. Targeted validations of the prioritized regulators, elements, and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.


Subject(s)
Databases, Genetic; Genomics; Neoplasms/genetics; Cell Line, Tumor; Cell Transformation, Neoplastic/genetics; Gene Regulatory Networks; Humans; Mutation/genetics; Reproducibility of Results; Transcription Factors/metabolism
13.
Genome Biol; 20(1): 109, 2019 May 29.
Article in English | MEDLINE | ID: mdl-31142351

ABSTRACT

Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of the well-known 3V data and 4M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural "exports" and "imports" between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications in general and are especially relevant to genomics, due to the persistent nature of DNA.


Subject(s)
Data Science; Genomics
14.
Structure; 27(9): 1469-1481.e3, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31279629

ABSTRACT

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.


Subject(s)
Computational Biology/methods; Polymorphism, Single Nucleotide; Proteins/chemistry; Proteins/genetics; Databases, Protein; Drug Design; Humans; Ligands; Machine Learning; Models, Statistical; Molecular Docking Simulation; Protein Binding; Protein Conformation; Proteins/metabolism