ABSTRACT
Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.
Subjects
Databases, Genetic; RNA, Long Noncoding/chemistry; RNA, Long Noncoding/genetics; Transcriptome/genetics; Cells, Cultured; Conserved Sequence/genetics; Datasets as Topic; Enhancer Elements, Genetic/genetics; Epigenesis, Genetic; Gene Expression Profiling; Gene Expression Regulation; Genome, Human/genetics; Genome-Wide Association Study; Genomics; Humans; Internet; Molecular Sequence Annotation; Organ Specificity/genetics; Polymorphism, Single Nucleotide; Promoter Regions, Genetic/genetics; Quantitative Trait Loci/genetics; RNA Stability; RNA, Messenger/genetics
ABSTRACT
A precise genetic diagnosis is the single most important step for families with genetic disorders to enable personalized and preventative medicine. In addition to genetic variants in coding regions (exons) that can change a protein sequence, abnormal pre-mRNA splicing can be devastating for the encoded protein, inducing a frameshift or in-frame deletion/insertion of multiple residues. Non-coding variants that disrupt splicing are extremely challenging to identify. Stemming from an initial clinical discovery in two index Australian families, we define 25 families with genetic disorders caused by a class of pathogenic non-coding splice variants arising from intronic deletions. These pathogenic intronic deletions spare all consensus splice motifs, though they critically shorten the minimal distance between the 5' splice-site (5'SS) and branchpoint. Abnormal splicing arises from a biophysical constraint that precludes U1/U2 spliceosome assembly, which stalls in A-complexes (which bridge the 5'SS and branchpoint). Substitution of deleted nucleotides with non-specific sequences restores spliceosome assembly and normal splicing, arguing against loss of an intronic element as the primary causal basis. Incremental lengthening of 5'SS-branchpoint length in our index EMD case subject defines 45-47 nt as the critical elongation enabling (inefficient) spliceosome assembly for EMD intron 5. The 5'SS-branchpoint space constraint mechanism, not currently factored into genomic informatics pipelines, is relevant to diagnosis and precision medicine across the breadth of Mendelian disorders and cancer genomics.
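The space constraint described in this abstract can be expressed as a simple screening check. The Python sketch below is illustrative only and is not the authors' pipeline: the `Intron` record, the coordinate convention, and the 47 nt cutoff (the upper bound of the 45-47 nt range reported for EMD intron 5, which may not hold for other introns) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed cutoff: upper bound of the 45-47 nt range reported for EMD intron 5.
# Other introns may tolerate different lengths; treat this as a placeholder.
CRITICAL_LENGTH_NT = 47


@dataclass(frozen=True)
class Intron:
    """Hypothetical intron record in transcript-oriented, 1-based coordinates."""
    donor_pos: int        # position of the 5' splice site (5'SS)
    branchpoint_pos: int  # position of the branchpoint


def ss_bp_length(intron: Intron,
                 deletion: Optional[Tuple[int, int]] = None) -> int:
    """Return the 5'SS-to-branchpoint length, optionally after a deletion.

    `deletion` is an inclusive (start, end) interval. Only deletions lying
    strictly between the 5'SS and the branchpoint shorten the length.
    """
    length = intron.branchpoint_pos - intron.donor_pos
    if deletion is None:
        return length
    start, end = deletion
    if start > end:
        raise ValueError("deletion start must not exceed end")
    if (start <= intron.donor_pos <= end
            or start <= intron.branchpoint_pos <= end):
        raise ValueError("deletion removes the 5'SS or branchpoint itself")
    if intron.donor_pos < start and end < intron.branchpoint_pos:
        length -= end - start + 1
    return length


def is_space_constrained(intron: Intron,
                         deletion: Optional[Tuple[int, int]] = None,
                         cutoff: int = CRITICAL_LENGTH_NT) -> bool:
    """Flag introns whose 5'SS-branchpoint length falls below the cutoff."""
    return ss_bp_length(intron, deletion) < cutoff
```

For example, an intron with the 5'SS at position 100 and the branchpoint at 200 has a length of 100 nt; a 60 nt deletion at positions 120-179 shortens it to 40 nt and would be flagged. The sketch ignores the extended consensus motifs around each site, so a real check would also need to confirm that those are spared.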
Subjects
Introns; RNA Splicing; Spliceosomes; Adolescent; Adult; Biophysical Phenomena; Child; Female; Humans; Infant; Male; Middle Aged; Pedigree
ABSTRACT
Pyrenophora teres f. teres and P. teres f. maculata cause net form and spot form, respectively, of net blotch on barley (Hordeum vulgare). The two forms reproduce sexually, producing hybrids with genetic and pathogenic variability. Phenotypic identification of hybrids is challenging because lesions induced by hybrids on host plants resemble lesions induced by either P. teres f. teres or P. teres f. maculata. In this study, 12 sequence-specific polymerase chain reaction markers were developed based on expressed regions spread across the genome. The primers were validated using 210 P. teres isolates, 2 putative field hybrids (WAC10721 and SNB172), 50 laboratory-produced hybrids, and 7 isolates collected from barley grass (H. leporinum). The sequence-specific markers confirmed isolate WAC10721 as a hybrid. Only four P. teres f. teres markers amplified on DNA of barley grass isolates. Amplified fragment length polymorphism markers suggested that P. teres barley grass isolates are genetically different from P. teres barley isolates and that the second putative hybrid (SNB172) is a barley grass isolate. We developed a suite of markers which clearly distinguish the two forms of P. teres and enable unambiguous identification of hybrids.
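One way to read marker results of this kind is as a presence/absence call per form. The Python sketch below is a hypothetical scoring rule, not the authors' protocol: the marker names are placeholders, and the rule that amplification of markers from both forms indicates a hybrid is an assumption inferred from the abstract.

```python
from typing import Set


def classify_isolate(amplified: Set[str],
                     ptt_markers: Set[str],
                     ptm_markers: Set[str]) -> str:
    """Classify an isolate from the set of markers that amplified.

    Assumed rule: markers from both forms -> hybrid; markers from one
    form only -> that form; no form-specific marker -> unclassified.
    """
    ptt_hits = amplified & ptt_markers
    ptm_hits = amplified & ptm_markers
    if ptt_hits and ptm_hits:
        return "hybrid"
    if ptt_hits:
        return "P. teres f. teres"
    if ptm_hits:
        return "P. teres f. maculata"
    return "unclassified"


# Placeholder marker names, used only for illustration.
PTT = {"ptt_1", "ptt_2", "ptt_3"}
PTM = {"ptm_1", "ptm_2", "ptm_3"}

call = classify_isolate({"ptt_1", "ptm_2"}, PTT, PTM)
```

A rule this simple would label isolates with partial amplification patterns, such as the barley grass isolates on which only four P. teres f. teres markers amplified, as P. teres f. teres, so a practical version would also report how many markers of each form amplified.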
Subjects
Ascomycota/genetics; Plant Diseases/microbiology; Australia; Genetic Markers; Hordeum/microbiology; Hybridization, Genetic; Polymerase Chain Reaction; South Africa
ABSTRACT
BACKGROUND: The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. RESULTS: CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. CONCLUSIONS: We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available (https://sourceforge.net/projects/codingquarry/), and suitable for incorporation into genome annotation pipelines.
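The percentage of genes "predicted perfectly" is a gene-level exact-match measure. The Python sketch below shows one plausible way to compute it; it is not the benchmarking code used in the study, and the `GeneModel` representation (sequence ID, strand, and coding exon coordinates) is an assumption made for this example.

```python
from typing import Iterable, Tuple

Exon = Tuple[int, int]                          # (start, end), inclusive
GeneModel = Tuple[str, str, Tuple[Exon, ...]]   # (seqid, strand, coding exons)


def normalise(model: GeneModel) -> GeneModel:
    """Sort exons so that exon order does not affect comparison."""
    seqid, strand, exons = model
    return (seqid, strand, tuple(sorted(exons)))


def perfect_prediction_rate(reference: Iterable[GeneModel],
                            predicted: Iterable[GeneModel]) -> float:
    """Percentage of reference genes reproduced with every boundary exact."""
    ref = {normalise(m) for m in reference}
    if not ref:
        raise ValueError("reference annotation is empty")
    pred = {normalise(m) for m in predicted}
    return 100.0 * len(ref & pred) / len(ref)


reference = [
    ("chrI", "+", ((100, 200), (300, 450))),
    ("chrI", "-", ((1000, 1500),)),
]
predicted = [
    ("chrI", "+", ((300, 450), (100, 200))),  # same gene, exons listed in reverse
    ("chrI", "-", ((1000, 1497),)),           # one boundary off: not a perfect match
]
rate = perfect_prediction_rate(reference, predicted)
```

Under this definition a single misplaced exon boundary makes the whole gene count as incorrect, which is why exact-match rates are a stringent measure of predictor accuracy.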
Subjects
Gene Expression Profiling; Genome, Fungal; Molecular Sequence Annotation/methods; Sequence Analysis, RNA; Software; Genes, Fungal; Markov Chains; Models, Genetic; Saccharomyces cerevisiae/genetics; Schizosaccharomyces/genetics
ABSTRACT
Ovine congenital progressive muscular dystrophy (OCPMD) was first described in Merino sheep flocks in Queensland and Western Australia in the 1960s and 1970s. The most prominent feature of the disease is a distinctive gait with stiffness of the hind limbs that can be seen as early as 3 weeks after birth. The disease is progressive. Histopathological examination had revealed dystrophic changes specifically in type I (slow) myofibres, while electron microscopy had demonstrated abundant nemaline bodies. Therefore, it was never certain whether the disease was a dystrophy or a congenital myopathy with dystrophic features. In this study, we performed whole genome sequencing of OCPMD sheep and identified a single base deletion at the splice donor site (+1) of intron 13 in the type I myofibre-specific TNNT1 gene (KT218690 c.614+1delG). All affected sheep were homozygous for this variant. Examination of TNNT1 splicing by RT-PCR showed intron retention and premature termination, which disrupts the highly conserved 14 amino acid C-terminus. The variant did not reduce TNNT1 protein levels or affect its localization but impaired its ability to modulate muscle contraction in response to Ca2+ levels. Identification of the causative variant in TNNT1 finally clarifies that the OCPMD sheep is in fact a large animal model of TNNT1 congenital myopathy. This model could now be used for testing molecular or gene therapies.
Subjects
Myotonia Congenita/pathology; Myotonia Congenita/veterinary; Sheep Diseases/genetics; Sheep Diseases/pathology; Troponin T/genetics; Animals; Disease Models, Animal; Muscle, Skeletal/pathology; Sheep
ABSTRACT
We present a novel method to measure the local GC-content bias in genomes and a survey of published fungal species. The method, enacted as "OcculterCut" (https://sourceforge.net/projects/occultercut, last accessed April 30, 2016), identified species containing distinct AT-rich regions. In most fungal taxa, AT-rich regions are a signature of repeat-induced point mutation (RIP), which targets repetitive DNA and decreases GC-content through the conversion of cytosine to thymine bases. RIP has in turn been identified as a driver of fungal genome evolution, as RIP mutations can also occur in single-copy genes neighboring repeat-rich regions. Over time RIP perpetuates "two speeds" of gene evolution in the GC-equilibrated and AT-rich regions of fungal genomes. In this study, genomes showing evidence of this process are found to be common, particularly among the Pezizomycotina. Further analysis highlighted differences in amino acid composition and putative functions of genes from these regions, supporting the hypothesis that these regions play an important role in fungal evolution. OcculterCut can also be used to identify genes undergoing RIP-assisted diversifying selection, such as small, secreted effector proteins that mediate host-microbe disease interactions.
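Local GC-content can be illustrated with a basic sliding-window calculation. The Python sketch below is a generic windowed approach and should not be read as OcculterCut's algorithm, which the abstract does not describe; the function names, window size, step, and `max_gc` threshold are all assumptions chosen for the example.

```python
from typing import Iterator, List, Optional, Tuple


def gc_fraction(seq: str) -> Optional[float]:
    """GC fraction over unambiguous bases; None if the window has none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def windowed_gc(seq: str, window: int,
                step: int) -> Iterator[Tuple[int, Optional[float]]]:
    """Yield (0-based start, GC fraction) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def at_rich_starts(seq: str, window: int, step: int,
                   max_gc: float) -> List[int]:
    """Start positions of windows whose GC fraction is below max_gc."""
    return [start for start, gc in windowed_gc(seq, window, step)
            if gc is not None and gc < max_gc]


toy = "GCGCGCGC" + "ATATATAT"
profile = list(windowed_gc(toy, window=8, step=8))
at_rich = at_rich_starts(toy, window=8, step=8, max_gc=0.3)
```

On the toy sequence the first window is entirely GC and the second entirely AT, so only the second window is reported as AT-rich. Real genomes would need much larger windows and a data-driven threshold rather than a fixed one.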
Subjects
AT Rich Sequence/genetics; Ascomycota/genetics; Evolution, Molecular; Genome, Fungal; DNA Transposable Elements/genetics; DNA, Fungal/genetics; Mutation; Phylogeny
ABSTRACT
The interactions between fungi and plants encompass a spectrum of ecologies ranging from saprotrophy (growth on dead plant material) through pathogenesis (growth of the fungus accompanied by disease on the plant) to symbiosis (growth of the fungus with growth enhancement of the plant). We consider pathogenesis in this article and the key roles played by a range of pathogen-encoded molecules that have collectively become known as effectors.
Subjects
Fungi/metabolism; Fungi/pathogenicity; Host-Pathogen Interactions; Plant Diseases/microbiology; Virulence Factors/metabolism
ABSTRACT
Parastagonospora nodorum, the causal agent of Septoria nodorum blotch (SNB), is an economically important pathogen of wheat (Triticum spp.), and a model for the study of necrotrophic pathology and genome evolution. The reference P. nodorum strain SN15 was the first Dothideomycete with a published genome sequence, and has been used as the basis for comparison within and between species. Here we present an updated reference genome assembly with corrections of SNP and indel errors in the underlying genome assembly from deep resequencing data as well as extensive manual annotation of gene models using transcriptomic and proteomic sources of evidence (https://github.com/robsyme/Parastagonospora_nodorum_SN15). The updated assembly and annotation include 8,366 genes with modified protein sequence and 866 new genes. This study shows the benefits of using a wide variety of experimental methods allied to expert curation to generate a reliable set of gene models.