ABSTRACT
BACKGROUND: Individual cells from isogenic populations often display large cell-to-cell differences in gene expression. This "noise" in expression derives from several sources, including the genomic and cellular environment in which a gene resides. Large-scale maps of genomic environments have revealed the effects of epigenetic modifications and transcription factor occupancy on mean expression levels, but leveraging such maps to explain expression noise will require new methods to assay how expression noise changes at locations across the genome. RESULTS: To address this gap, we present Single-cell Analysis of Reporter Gene Expression Noise and Transcriptome (SARGENT), a method that simultaneously measures the noisiness of reporter genes integrated throughout the genome and the global mRNA profiles of individual reporter-gene-containing cells. Using SARGENT, we perform the first comprehensive genome-wide survey of how genomic locations impact gene expression noise. We find that the mean and noise of expression correlate with different histone modifications. We quantify the intrinsic and extrinsic components of reporter gene noise and, using the associated mRNA profiles, assign the extrinsic component to differences between the CD24+ "stem-like" substate and the more "differentiated" substate. SARGENT also reveals the effects of transgene integrations on endogenous gene expression, which will help guide the search for "safe-harbor" loci. CONCLUSIONS: Taken together, we show that SARGENT is a powerful tool to measure both the mean and noise of gene expression at locations across the genome and that the data generated by SARGENT reveal important insights into the regulation of gene expression noise genome-wide.
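To make the distinction between the mean and the noise of expression concrete, the following minimal sketch computes both for one integration site from per-cell reporter counts. The abstract does not say which noise statistic SARGENT uses; the squared coefficient of variation (CV²) and the Fano factor are common choices and are assumed here. The function name `expression_noise` and the counts are hypothetical.

```python
from statistics import mean, pvariance


def expression_noise(counts):
    """Return (mean, CV^2, Fano factor) for per-cell reporter counts.

    CV^2 = variance / mean^2 and Fano = variance / mean are assumed
    noise statistics; the source abstract does not name one.
    """
    if not counts:
        raise ValueError("need at least one cell")
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean expression is zero; noise is undefined")
    var = pvariance(counts, mu)  # population variance across cells
    return mu, var / mu**2, var / mu


# Hypothetical counts: two integration sites with the same mean expression
# but very different cell-to-cell variability.
quiet_site = [9, 10, 11, 10, 10]
noisy_site = [0, 25, 2, 20, 3]

for name, counts in (("quiet", quiet_site), ("noisy", noisy_site)):
    mu, cv2, fano = expression_noise(counts)
    print(f"{name}: mean={mu}, CV^2={cv2:.3f}, Fano={fano:.2f}")
```

Both toy sites have the same mean, so any difference between them shows up only in the noise statistics, which is why mean and noise need to be measured separately.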
Subject(s)
Single-Cell Analysis, Humans, Reporter Genes, Transcriptome, Genomics/methods
ABSTRACT
Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome. We address this problem using active machine learning to iteratively train models on multiple rounds of synthetic DNA sequences assayed in live mammalian retinas. During each round of training the model actively selects sequence perturbations to assay, thereby efficiently generating informative training data. We iteratively trained a model that predicts the activities of sequences containing binding motifs for the photoreceptor transcription factor Cone-rod homeobox (CRX) using an order of magnitude less training data than current approaches. The model's internal confidence estimates of its predictions are reliable guides for designing sequences with high activity. The model correctly identified critical sequence differences between active and inactive sequences with nearly identical transcription factor binding sites, and revealed order and spacing preferences for combinations of motifs. Our results establish active learning as an effective method to train accurate deep learning models of cis-regulatory function after exhausting naturally occurring training examples in the genome.
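The selection step of an active learning round can be sketched as follows. The abstract does not specify the acquisition rule, so this example assumes one common choice: score each candidate sequence by the disagreement (variance) among an ensemble of models and send the most uncertain candidates to the assay. The function `select_most_uncertain`, the toy models, and the candidate sequences are all hypothetical.

```python
from statistics import pvariance


def select_most_uncertain(candidates, ensemble, batch_size):
    """Pick the candidates on which the ensemble disagrees the most.

    `ensemble` is a list of callables mapping a sequence to a predicted
    activity. Ensemble variance is an assumed uncertainty measure.
    """
    scored = []
    for seq in candidates:
        predictions = [model(seq) for model in ensemble]
        scored.append((pvariance(predictions), seq))
    # Highest disagreement first; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seq for _, seq in scored[:batch_size]]


# Toy stand-ins for trained models: they agree on A content and
# disagree about how G content affects activity.
toy_ensemble = [
    lambda s: s.count("A"),
    lambda s: s.count("A") + s.count("G"),
    lambda s: s.count("A") - s.count("G"),
]
toy_candidates = ["AAAA", "AGGG", "AAGG", "GGGG"]

next_batch = select_most_uncertain(toy_candidates, toy_ensemble, batch_size=2)
print(next_batch)
```

In a full loop, the selected batch would be assayed, added to the training set, and the models retrained before the next round of selection.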
ABSTRACT
Stochastic differences among clonal cells can initiate cell fate decisions in development or cause cell-to-cell differences in the responses to drugs or extracellular ligands. One hypothesis is that some of this phenotypic variability is caused by stochastic fluctuations in the activities of transcription factors (TFs). We tested this hypothesis in NIH3T3-CG cells using the response to Hedgehog signaling as a model cellular response. Here, we present evidence for the existence of distinct fast- and slow-responding substates in NIH3T3-CG cells. These two substates have distinct expression profiles, and fluctuations in the Prrx1 TF underlie some of the differences in expression and responsiveness between fast and slow cells. Our results show that fluctuations in TFs can contribute to cell-to-cell differences in Hedgehog signaling.
Subject(s)
Hedgehog Proteins, Transcription Factors, Animals, Mice, Transcription Factors/genetics, Transcription Factors/metabolism, Hedgehog Proteins/genetics, Hedgehog Proteins/metabolism, NIH 3T3 Cells, Gene Expression Regulation, Signal Transduction
ABSTRACT
Postzygotic mutations (PZMs) begin to accrue in the human genome immediately after fertilization, but how and when PZMs affect development and lifetime health remain unclear. To study the origins and functional consequences of PZMs, we generated a multitissue atlas of PZMs spanning 54 tissue and cell types from 948 donors. Nearly half the variation in mutation burden among tissue samples can be explained by measured technical and biological effects, and 9% can be attributed to donor-specific effects. Through phylogenetic reconstruction of PZMs, we found that their type and predicted functional impact vary during prenatal development, across tissues, and through the germ cell life cycle. Thus, methods for interpreting effects across the body and the life span are needed to fully understand the consequences of genetic variants.
Subject(s)
DNA Mutational Analysis, Longevity, Zygote, Female, Humans, Longevity/genetics, Mutation, Phylogeny, RNA-Seq
ABSTRACT
Somatic mutations within non-coding regions and even exons may have unidentified regulatory consequences that are often overlooked in analysis workflows. Here we present RegTools (www.regtools.org), a computationally efficient, free, and open-source software package designed to integrate somatic variants from genomic data with splice junctions from bulk or single-cell transcriptomic data to identify variants that may cause aberrant splicing. We apply RegTools to over 9000 tumor samples with both tumor DNA and RNA sequence data. RegTools discovers 235,778 events where a splice-associated variant significantly increases the splicing of a particular junction, across 158,200 unique variants and 131,212 unique junctions. To characterize these somatic variants and their associated splice isoforms, we annotate them with the Variant Effect Predictor, SpliceAI, and Genotype-Tissue Expression junction counts and compare our results to other tools that integrate genomic and transcriptomic data. While many events are corroborated by the aforementioned tools, the flexibility of RegTools also allows us to identify splice-associated variants in known cancer drivers, such as TP53, CDKN2A, and B2M, and other genes.
Subject(s)
Neoplasms, Transcriptome, Humans, Transcriptome/genetics, Genomics, RNA Splicing/genetics, Genome, Neoplasms/genetics, Alternative Splicing/genetics
ABSTRACT
Pathogenic variants in surfactant proteins SP-B and SP-C cause surfactant deficiency and interstitial lung disease. Surfactant proteins are synthesized as precursors (proSP-B, proSP-C), trafficked, and processed via a vesicular-regulated secretion pathway; however, control of vesicular trafficking events is not fully understood. Through the Undiagnosed Diseases Network, we evaluated a child with interstitial lung disease suggestive of surfactant deficiency. Variants in known surfactant dysfunction disorder genes were not found in trio exome sequencing. Instead, a de novo heterozygous variant in RAB5B was identified in the Ras/Rab GTPases family nucleotide binding domain, p.Asp136His. Functional studies were performed in Caenorhabditis elegans by knocking the proband variant into the conserved position (Asp135) of the ortholog, rab-5. Genetic analysis demonstrated that rab-5[Asp135His] is damaging, producing a strong dominant negative gene product. rab-5[Asp135His] heterozygotes were also defective in endocytosis and early endosome (EE) fusion. Immunostaining studies of the proband's lung biopsy revealed that RAB5B and EE marker EEA1 were significantly reduced in alveolar type II cells and that mature SP-B and SP-C were significantly reduced, while proSP-B and proSP-C were normal. Furthermore, staining normal lung showed colocalization of RAB5B and EEA1 with proSP-B and proSP-C. These findings indicate that dominant negative-acting RAB5B Asp136His and EE dysfunction cause a defect in processing/trafficking to produce mature SP-B and SP-C, resulting in interstitial lung disease, and that RAB5B and EEs normally function in the surfactant secretion pathway. Together, the data suggest a noncanonical function for RAB5B and identify RAB5B p.Asp136His as a genetic mechanism for a surfactant dysfunction disorder.
Subject(s)
Genetic Variation/genetics, Protein Precursors/genetics, Pulmonary Surfactant-Associated Protein C/genetics, Pulmonary Surfactant-Associated Proteins/genetics, rab5 GTP-Binding Proteins/genetics, Alveolar Epithelial Cells/metabolism, Animals, Caenorhabditis elegans/genetics, Humans, Lung/metabolism, Interstitial Lung Diseases/genetics, Pulmonary Surfactants/metabolism
ABSTRACT
Miscarriage is a common, complex trait affecting ~15% of clinically confirmed pregnancies. Here we present the results of large-scale genetic association analyses with 69,054 cases from five different ancestries for sporadic miscarriage, 750 cases of European ancestry for multiple (≥3) consecutive miscarriage, and up to 359,469 female controls. We identify one genome-wide significant association (rs146350366, minor allele frequency (MAF) 1.2%, P = 3.2 × 10^-8, odds ratio (OR) = 1.4) for sporadic miscarriage in our European ancestry meta-analysis and three genome-wide significant associations for multiple consecutive miscarriage (rs7859844, MAF = 6.4%, P = 1.3 × 10^-8, OR = 1.7; rs143445068, MAF = 0.8%, P = 5.2 × 10^-9, OR = 3.4; rs183453668, MAF = 0.5%, P = 2.8 × 10^-8, OR = 3.8). We further investigate the genetic architecture of miscarriage with biobank-scale Mendelian randomization, heritability, and genetic correlation analyses. Our results show that miscarriage etiopathogenesis is partly driven by genetic variation potentially related to placental biology, and illustrate the utility of large-scale biobank data for understanding this pregnancy complication.
Subject(s)
Habitual Abortion/genetics, Spontaneous Abortion/genetics, Genetic Predisposition to Disease, Placenta/physiopathology, Habitual Abortion/epidemiology, Habitual Abortion/physiopathology, Spontaneous Abortion/epidemiology, Spontaneous Abortion/physiopathology, Adult, Aged, Case-Control Studies, Datasets as Topic, Female, Gene Frequency, Genome-Wide Association Study, Humans, Inheritance Patterns, Medical History Taking, Middle Aged, Single Nucleotide Polymorphism, Pregnancy, White People/genetics, Young Adult
ABSTRACT
Nearly all patients with small cell lung cancer (SCLC) eventually relapse with chemoresistant disease. The molecular mechanisms driving chemoresistance in SCLC remain uncharacterized. Here, we describe whole-exome sequencing of paired SCLC tumor samples procured at diagnosis and relapse from 12 patients, and unpaired relapse samples from 18 additional patients. Multiple somatic copy number alterations, including gains in ABCC1 and deletions in MYCL, MSH2, and MSH6, are identifiable in relapsed samples. Relapse samples also exhibit recurrent mutations and loss of heterozygosity in regulators of WNT signaling, including CHD8 and APC. Analysis of RNA-sequencing data shows enrichment for an ASCL1-low expression subtype and WNT activation in relapse samples. Activation of WNT signaling in chemosensitive human SCLC cell lines through APC knockdown induces chemoresistance. Additionally, in vitro-derived chemoresistant cell lines demonstrate increased WNT activity. Overall, our results suggest WNT signaling activation as a mechanism of chemoresistance in relapsed SCLC.
Subject(s)
Neoplasm Drug Resistance/genetics, Lung Neoplasms/genetics, Small Cell Lung Carcinoma/genetics, Wnt Signaling Pathway/genetics, Adenomatous Polyposis Coli Protein/genetics, Basic Helix-Loop-Helix Transcription Factors/genetics, Cadherins/genetics, Tumor Cell Line, Neoplasm Drug Resistance/drug effects, Neoplastic Gene Expression Regulation, Gene Knockdown Techniques, Humans, Loss of Heterozygosity, Lung Neoplasms/drug therapy, Lung Neoplasms/pathology, Mutation, Local Neoplasm Recurrence, Small Cell Lung Carcinoma/drug therapy, Small Cell Lung Carcinoma/pathology, Exome Sequencing, Wnt Signaling Pathway/drug effects
ABSTRACT
Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing.
Subject(s)
Adenosine Deaminase, Primates/genetics, RNA Editing/genetics, RNA-Binding Proteins, Adenosine Deaminase/genetics, Adenosine Deaminase/metabolism, Animals, Female, Genotype, HEK293 Cells, Humans, Male, Mice, Muscles/metabolism, Nuclear Proteins/metabolism, Organ Specificity/genetics, Proteolysis, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Spatio-Temporal Analysis, Species Specificity, Transcriptome/genetics
ABSTRACT
The genomic events responsible for the pathogenesis of relapsed adult B-lymphoblastic leukemia (B-ALL) are not yet clear. We performed integrative analysis of whole-genome, whole-exome, custom capture, whole-transcriptome (RNA-seq), and locus-specific genomic assays across nine time points from a patient with primary de novo B-ALL. Comprehensive genome and transcriptome characterization revealed a dramatic tumor evolution during progression, yielding a tumor with complex clonal architecture at second relapse. We observed and validated point mutations in EP300 and NF1, a highly expressed EP300-ZNF384 gene fusion, a microdeletion in IKZF1, a focal deletion affecting SETD2, and large deletions affecting RB1, PAX5, NF1, and ETV6. Although the genome analysis revealed events of potential biological relevance, no clinically actionable treatment options were evident at the time of the second relapse. However, transcriptome analysis identified aberrant overexpression of the targetable protein kinase encoded by the FLT3 gene. Although the patient had refractory disease after salvage therapy for the second relapse, treatment with the FLT3 inhibitor sunitinib rapidly induced a near complete molecular response, permitting the patient to proceed to a matched-unrelated donor stem cell transplantation. The patient remains in complete remission more than 4 years later. Analysis of this patient's relapse genome revealed an unexpected, actionable therapeutic target that led to a specific therapy associated with a rapid clinical response. For some patients with relapsed or refractory cancers, this approach may indicate a novel therapeutic intervention that could alter outcome.
Subject(s)
Genomics, Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics, Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/therapy, Transcriptional Activation, fms-Like Tyrosine Kinase 3/genetics, Adult, Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Biopsy, Bone Marrow/pathology, Bone Marrow Transplantation, Cyclophosphamide/therapeutic use, Cytogenetic Analysis, Dexamethasone/therapeutic use, Doxorubicin/therapeutic use, Flow Cytometry, Gene Expression Profiling, Genetic Variation, Genomics/methods, Graft vs Host Disease/drug therapy, Graft vs Host Disease/etiology, Humans, Male, Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/diagnosis, Recurrence, Homologous Transplantation, Vincristine/therapeutic use
ABSTRACT
PURPOSE: This trial was conducted to determine the maximum tolerated dose (MTD) and preliminary efficacy of buparlisib, an oral pan-class I PI3K inhibitor, plus fulvestrant in postmenopausal women with metastatic estrogen receptor-positive (ER(+)) breast cancer. EXPERIMENTAL DESIGN: Phase IA employed a 3+3 design to determine the MTD of buparlisib daily plus fulvestrant. Subsequent cohorts (phase IB and cohort C) evaluated intermittent (5/7-day) and continuous dosing of buparlisib (100 mg daily). No more than 3 prior systemic treatments in the metastatic setting were allowed in these subsequent cohorts. RESULTS: Thirty-one patients were enrolled. MTD was defined as buparlisib 100 mg daily plus fulvestrant. Common adverse events (AEs) included fatigue (38.7%), transaminase elevation (35.5%), rash (29%), and diarrhea (19.4%). C-peptide was significantly increased during treatment, consistent with an on-target effect of buparlisib. Compared with intermittent dosing, daily buparlisib was associated with more frequent early-onset AEs and higher buparlisib plasma concentrations. Among the 29 evaluable patients, the clinical benefit rate was 58.6% (95% CI, 40.7%-74.5%). Response was not associated with PIK3CA mutation or treatment cohort; however, loss of PTEN, progesterone receptor (PgR) expression, or mutation in TP53 was most common in resistant cases, and mutations in AKT1 and ESR1 did not exclude treatment response. CONCLUSIONS: Buparlisib plus fulvestrant is clinically active with manageable AEs in patients with metastatic ER(+) breast cancer. Weekend breaks in buparlisib dosing reduced toxicity. Patients with PgR-negative tumors and TP53 mutations did poorly, suggesting buparlisib plus fulvestrant may not be adequately effective against tumors with these poor prognostic molecular features.
Subject(s)
Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Breast Neoplasms/drug therapy, Breast Neoplasms/metabolism, Postmenopause, Estrogen Receptors/metabolism, Adult, Aged, Aminopyridines/administration & dosage, Aminopyridines/pharmacokinetics, Antineoplastic Combined Chemotherapy Protocols/adverse effects, Tumor Biomarkers, Breast Neoplasms/pathology, Estradiol/administration & dosage, Estradiol/analogs & derivatives, Estradiol/pharmacokinetics, Female, Fulvestrant, Humans, Middle Aged, Morpholines/administration & dosage, Morpholines/pharmacokinetics, Neoplasm Metastasis, PTEN Phosphohydrolase, Phosphoinositide-3 Kinase Inhibitors, Progesterone Receptors, Treatment Outcome
ABSTRACT
Tumors are typically sequenced to depths of 75-100× (exome) or 30-50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159).
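A short calculation illustrates why standard depths can fall short for impure or clonally heterogeneous tumors. The sketch below is not the study's analysis: it assumes a heterozygous variant in a diploid region, simple binomial read sampling with no sequencing error, and a hypothetical calling rule that requires at least `min_alt_reads` variant-supporting reads. All function names are illustrative.

```python
from math import comb


def expected_vaf(purity, clone_fraction):
    """Expected variant allele fraction of a heterozygous variant.

    Assumes a diploid locus: one of two alleles carries the variant in
    the fraction `clone_fraction` of tumor cells, and tumor cells make
    up the fraction `purity` of the sample.
    """
    return purity * clone_fraction / 2


def detection_probability(depth, vaf, min_alt_reads=3):
    """P(at least `min_alt_reads` variant reads) under binomial sampling."""
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_too_few


# Hypothetical scenario: 50% purity, variant present in 10% of tumor cells.
vaf = expected_vaf(purity=0.5, clone_fraction=0.1)
for depth in (30, 100, 300, 1000):
    print(f"depth {depth:>4}x: P(detect) = {detection_probability(depth, vaf):.3f}")
```

Under these assumptions a subclonal variant is usually missed at 30× and becomes reliably detectable only at several hundred-fold coverage, which is consistent with the motivation for ultra-deep sequencing stated above.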
ABSTRACT
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
Subject(s)
Chromosome Mapping/methods, Human Genome/genetics, Knowledge Bases, Genetic Models, DNA Sequence Analysis/methods, User-Computer Interface, Algorithms, Computer Simulation, Database Management Systems, Genetic Databases, Humans, Sequence Alignment/methods
ABSTRACT
Broad and deep tumour genome sequencing has shed new light on tumour heterogeneity and provided important insights into the evolution of metastases arising from different clones. There is an additional layer of complexity, in that tumour evolution may be influenced by selective pressure provided by therapy, in a similar fashion to that occurring in infectious diseases. Here we studied tumour genomic evolution in a patient (index patient) with metastatic breast cancer bearing an activating PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha, PI(3)Kα) mutation. The patient was treated with the PI(3)Kα inhibitor BYL719, which achieved a lasting clinical response, but the patient eventually became resistant to this drug (emergence of lung metastases) and died shortly thereafter. A rapid autopsy was performed and material from a total of 14 metastatic sites was collected and sequenced. All metastatic lesions, when compared to the pre-treatment tumour, had a copy loss of PTEN (phosphatase and tensin homolog) and those lesions that became refractory to BYL719 had additional and different PTEN genetic alterations, resulting in the loss of PTEN expression. To put these results in context, we examined six other patients also treated with BYL719. Acquired bi-allelic loss of PTEN was found in one of these patients, whereas in two others PIK3CA mutations present in the primary tumour were no longer detected at the time of progression. To characterize our findings functionally, we examined the effects of PTEN knockdown in several preclinical models (both in cell lines intrinsically sensitive to BYL719 and in PTEN-null xenografts derived from our index patient), which we found resulted in resistance to BYL719, whereas simultaneous PI(3)K p110β blockade reverted this resistance phenotype. We conclude that parallel genetic evolution of separate metastatic sites with different PTEN genomic alterations leads to a convergent PTEN-null phenotype resistant to PI(3)Kα inhibition.
Subject(s)
Breast Neoplasms/drug therapy, Breast Neoplasms/genetics, Neoplasm Drug Resistance/genetics, PTEN Phosphohydrolase/deficiency, PTEN Phosphohydrolase/genetics, Phosphoinositide-3 Kinase Inhibitors, Thiazoles/pharmacology, Alleles, Animals, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Class I Phosphatidylinositol 3-Kinases, Neoplasm Drug Resistance/drug effects, Female, Humans, Loss of Heterozygosity/drug effects, Loss of Heterozygosity/genetics, Mice, Nude Mice, PTEN Phosphohydrolase/metabolism, Thiazoles/therapeutic use, Xenograft Model Antitumor Assays
ABSTRACT
We present DeNovoGear software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis and fragment information to identify the parental origin of germ-line mutations. We used DeNovoGear on human whole-genome sequencing data to produce a set of predicted de novo insertion and/or deletion (indel) mutations with a 95% validation rate.
Subject(s)
Human Genome/genetics, INDEL Mutation, Genetic Models, Point Mutation, Software, Exome, Gene Deletion, Human Genome Project, Humans, Likelihood Functions, Insertional Mutagenesis
ABSTRACT
Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a disease with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. After assaying genome-wide SNPs and CNVs in 323 Caucasian men with idiopathic spermatogenic impairment and more than 1,100 controls, we estimate that each rare autosomal deletion detected in our study multiplicatively changes a man's risk of disease by 10% (OR 1.10 [1.04-1.16], p < 2 × 10^-3), rare X-linked CNVs by 29% (OR 1.29 [1.11-1.50], p < 1 × 10^-3), and rare Y-linked duplications by 88% (OR 1.88 [1.13-3.13], p < 0.03). By contrasting the properties of our case-specific CNVs with those of CNV callsets from cases of autism, schizophrenia, bipolar disorder, and intellectual disability, we propose that the CNV burden in spermatogenic impairment is distinct from the burden of large, dominant mutations described for neurodevelopmental disorders. We identified two patients with deletions of DMRT1, a gene on chromosome 9p24.3 orthologous to the putative sex determination locus of the avian ZW chromosome system. In an independent sample of Han Chinese men, we identified 3 more DMRT1 deletions in 979 cases of idiopathic azoospermia and none in 1,734 controls, and found none in an additional 4,519 controls from public databases. The combined results indicate that DMRT1 loss-of-function mutations are a risk factor and potential genetic cause of human spermatogenic failure (frequency of 0.38% in 1,306 cases and 0% in 7,754 controls, p = 6.2 × 10^-5). Our study identifies other recurrent CNVs as potential causes of idiopathic azoospermia and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes.
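The arithmetic behind two of the reported figures can be checked with a short sketch. It assumes that "multiplicatively" means each additional rare event multiplies the odds of disease by the same per-event odds ratio, independently of the others; the function names are hypothetical, and the inputs are the values quoted in the abstract.

```python
def combined_odds_ratio(per_event_or, n_events):
    """Odds ratio for carrying `n_events` events under a multiplicative model.

    Assumes each event multiplies the odds by `per_event_or`
    independently of the other events.
    """
    return per_event_or**n_events


def carrier_frequency_percent(carriers, total):
    """Percentage of individuals carrying the event."""
    return 100 * carriers / total


# Per-event odds ratio for rare autosomal deletions, as quoted above.
or_three_deletions = combined_odds_ratio(1.10, 3)

# DMRT1 deletions: 2 in the first cohort plus 3 in the Han Chinese cohort,
# out of the 1,306 cases reported in the combined analysis.
dmrt1_frequency = carrier_frequency_percent(2 + 3, 1306)

print(f"OR for three rare autosomal deletions: {or_three_deletions:.3f}")
print(f"DMRT1 deletion frequency in cases: {dmrt1_frequency:.2f}%")
```

Under this model a man carrying three rare autosomal deletions has odds of disease about 1.33 times those of a man carrying none, and five DMRT1 deletions among 1,306 cases reproduces the reported 0.38% frequency.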
Subject(s)
Human X Chromosomes, Human Y Chromosomes, Male Infertility/genetics, Transcription Factors/genetics, Asian People/genetics, Azoospermia/genetics, Azoospermia/physiopathology, DNA Copy Number Variations, Female, Fertilization in Vitro, Humans, Male Infertility/physiopathology, Male, Mutation, Pregnancy, Seminal Plasma Proteins, Sequence Deletion, Spermatogenesis/genetics
ABSTRACT
BACKGROUND: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. RESULTS: We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. CONCLUSIONS: Although this heuristic does not guarantee to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
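The notion of a frequent agreement subtree can be illustrated with a small check. This is not the seed-and-combine heuristic described above; it only verifies, for a given candidate set of taxa, how many input trees share one topology once restricted to those taxa. The sketch assumes rooted trees encoded as nested tuples with string leaf labels, and relies on the fact that two rooted trees on the same leaf set are identical exactly when they have the same clusters. All names are hypothetical.

```python
from collections import Counter


def restricted_clusters(tree, keep):
    """Clusters (leaf sets below internal nodes) of `tree` restricted to `keep`."""
    found = set()

    def visit(node):
        if isinstance(node, str):  # leaf
            return frozenset([node]) if node in keep else frozenset()
        leaves = frozenset().union(*(visit(child) for child in node))
        if len(leaves) >= 2:  # single leaves carry no topology information
            found.add(leaves)
        return leaves

    visit(tree)
    return frozenset(found)


def agreement_frequency(trees, keep):
    """Fraction of trees sharing the most common topology on the taxa in `keep`."""
    keep = frozenset(keep)
    topologies = Counter(restricted_clusters(tree, keep) for tree in trees)
    return topologies.most_common(1)[0][1] / len(trees)


# Three toy trees on taxa A-E; the third places B differently.
toy_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("E", "D")),
    ((("A", "C"), "B"), ("D", "E")),
]

print(agreement_frequency(toy_trees, {"A", "B", "D", "E"}))
print(agreement_frequency(toy_trees, {"A", "B", "C"}))
```

A candidate taxon set would count as a frequent agreement subtree when this fraction meets a chosen frequency threshold; finding the largest such set is the hard part that the heuristic addresses.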