ABSTRACT
While long noncoding RNAs (ncRNAs) constitute a large portion of the mammalian transcriptome, their biological functions have remained elusive. The few long ncRNAs that have been studied in detail silence gene expression in processes such as X-inactivation and imprinting. We used a GENCODE annotation of the human genome to characterize over a thousand long ncRNAs that are expressed in multiple cell lines. Unexpectedly, we found an enhancer-like function for a set of these long ncRNAs in human cell lines. Depletion of a number of ncRNAs led to decreased expression of their neighboring protein-coding genes, including the master regulator of hematopoiesis, SCL (also called TAL1), Snai1 and Snai2. Using heterologous transcription assays we demonstrated a requirement for the ncRNAs in activation of gene expression. These results reveal an unanticipated role for a class of long ncRNAs in activation of critical regulators of development and differentiation.
Subjects
Genetic Enhancer Elements , Human Genome , Untranslated RNA/metabolism , Cell Line , Tumor Cell Line , Cultured Cells , Humans , Messenger RNA/genetics , Snail Family Transcription Factors , Transcription Factors/genetics , Transcriptional Activation
ABSTRACT
Histiocytic sarcoma (HS) is a rare but aggressive cancer in both humans and dogs. The spontaneous canine model, which has clinical, epidemiological, and histological similarities with human HS and specific breed predispositions, provides a unique opportunity to unravel the genetic basis of this cancer. In this study, we aimed to identify germline risk factors associated with the development of HS in predisposed canine breeds. We used a methodology that combined several genome-wide association studies in a multi-breed and multi-cancer approach with targeted next-generation sequencing and imputation. We combined several dog breeds (Bernese mountain dogs, Rottweilers, flat-coated retrievers, and golden retrievers) and three hematopoietic cancers (HS, lymphoma, and mast cell tumor). We not only refined the previously identified HS risk CDKN2A locus, but also identified new loci on canine chromosomes 2, 5, 14, and 20. Capture and targeted sequencing of specific loci suggested the existence of regulatory variants in non-coding regions and methylation mechanisms linked to risk haplotypes, which lead to strong cancer predisposition in specific dog breeds. We also showed that these canine cancer predisposing loci appeared to be due to the additive effect of several risk haplotypes that are also involved in other hematopoietic cancers such as lymphoma or mast cell tumors. This illustrates the pleiotropic nature of these canine cancer loci as observed in human oncology, thereby reinforcing the value of predisposed dog breeds for studying cancer initiation and progression.
Subjects
Cyclin-Dependent Kinase Inhibitor p16/genetics , Dog Diseases/genetics , Genetic Predisposition to Disease , Hematologic Neoplasms/genetics , Histiocytic Sarcoma/genetics , Animals , Chromosome Mapping , Dog Diseases/pathology , Dogs , Genome-Wide Association Study , Haplotypes/genetics , Hematologic Neoplasms/pathology , Hematologic Neoplasms/veterinary , High-Throughput Nucleotide Sequencing , Histiocytic Sarcoma/pathology , Humans
ABSTRACT
Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3' end of LINE-1_Cfs (i.e., LINE-1_Cf 3'-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
Subjects
Dogs/genetics , GC Rich Sequence , Genome , Interspersed Repetitive Sequences , Animals , Dogs/classification , Long Interspersed Nucleotide Elements , Short Interspersed Nucleotide Elements , Species Specificity
ABSTRACT
Animal genomes are pervasively transcribed into multiple RNA molecules, of which many will not be translated into proteins. One major component of this transcribed non-coding genome is the long non-coding RNAs (lncRNAs), which are defined as transcripts longer than 200 nucleotides with low coding-potential capabilities. Domestic animals constitute a unique resource for studying the genetic and epigenetic basis of phenotypic variations involving protein-coding and non-coding RNAs, such as lncRNAs. This review presents the current knowledge regarding transcriptome-based catalogues of lncRNAs in major domesticated animals (pets and livestock species), covering a broad phylogenetic scale (from dogs to chicken), and in comparison with human and mouse lncRNA catalogues. Furthermore, we describe different methods to extract known or discover novel lncRNAs and explore comparative genomics approaches to strengthen the annotation of lncRNAs. We then detail different strategies contributing to a better understanding of lncRNA functions, from genetic studies such as GWAS to molecular biology experiments and give some case examples in domestic animals. Finally, we discuss the limitations of current lncRNA annotations and suggest research directions to improve them and their functional characterisation.
Subjects
Long Noncoding RNA , Animals , Domestic Animals/genetics , Dogs , Genome , Livestock/genetics , Mice , Phylogeny , Long Noncoding RNA/genetics , Transcriptome
ABSTRACT
In humans, histiocytic sarcoma (HS) is an aggressive cancer involving histiocytes. Its rarity and heterogeneity explain why treatment remains a challenge. Canine HS shares strong clinical and histopathological similarities with human HS but is, by contrast, frequent in specific breeds, and thus constitutes a unique spontaneous model for human HS in which to decipher the genetic bases and explore therapeutic options. We identified sequence alterations in the MAPK pathway in at least 63.9% (71/111) of HS cases with mutually exclusive BRAF (0.9%; 1/111), KRAS (7.2%; 8/111) and PTPN11 (56.75%; 63/111) mutations concentrated at hotspots common to human cancers. Recurrent PTPN11 mutations are associated with the visceral disseminated HS subtype in dogs, the most aggressive clinical presentation. We then identified PTPN11 mutations in 3/19 (15.7%) human HS patients. Thus, we propose PTPN11 mutations as key events for a specific subset of human and canine HS: the visceral disseminated form. Finally, by testing drugs targeting the MAPK pathway in eight canine HS cell lines, we found that MEK inhibitors had better anti-proliferative activity than PTPN11 inhibitors in canine HS neoplastic cells. In combination, these results illustrate the relevance of naturally affected dogs in deciphering genetic mechanisms and selecting efficient targeted therapies for such rare and aggressive cancers in humans.
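The mutation frequencies quoted above are simple proportions of the reported counts. As a minimal sketch, the helper below (`percent` is a hypothetical name introduced here, not something from the study) recomputes them from the counts given in the abstract, rounding to one decimal place.

```python
def percent(count, total, digits=1):
    """Return count/total as a percentage rounded to `digits` decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, digits)


# Counts as reported in the abstract (111 canine HS cases, 19 human patients).
canine_cases = 111
canine_counts = {
    "BRAF": 1,
    "KRAS": 8,
    "PTPN11": 63,
    "any MAPK-pathway alteration": 71,
}

for label, n in canine_counts.items():
    print(f"{label}: {n}/{canine_cases} = {percent(n, canine_cases)}%")

print(f"human PTPN11: 3/19 = {percent(3, 19)}%")
```

Because this sketch rounds to the nearest tenth, some values can differ in the last digit from the figures printed in the abstract, which may have been truncated instead.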
Subjects
Dog Diseases/genetics , Histiocytes/pathology , Histiocytic Sarcoma/genetics , Protein Kinase Inhibitors/pharmacology , Non-Receptor Type 11 Protein Tyrosine Phosphatase/genetics , Adult , Aged , Aged 80 and over , Animals , Biopsy , Tumor Cell Line , Cell Proliferation/drug effects , Child , Preschool Child , DNA Mutational Analysis , Animal Disease Models , Dog Diseases/blood , Dog Diseases/pathology , Dogs , Antitumor Drug Screening Assays/methods , Female , Histiocytic Sarcoma/drug therapy , Histiocytic Sarcoma/pathology , Histiocytic Sarcoma/veterinary , Humans , Infant , MAP Kinase Signaling System/drug effects , MAP Kinase Signaling System/genetics , Male , Middle Aged , Mitogen-Activated Protein Kinases/antagonists & inhibitors , Mitogen-Activated Protein Kinases/metabolism , Mutation , Protein Kinase Inhibitors/therapeutic use , Non-Receptor Type 11 Protein Tyrosine Phosphatase/antagonists & inhibitors , Ribonucleases , Tumor Suppressor Proteins , Young Adult
ABSTRACT
BACKGROUND: Comparative genomics studies are central in identifying the coding and non-coding elements associated with complex traits, and the functional annotation of genomes is a critical step to decipher the genotype-to-phenotype relationships in livestock animals. As part of the Functional Annotation of Animal Genomes (FAANG) action, the FR-AgENCODE project aimed to create reference functional maps of domesticated animals by profiling the landscape of transcription (RNA-seq), chromatin accessibility (ATAC-seq) and conformation (Hi-C) in species representing ruminants (cattle, goat), monogastrics (pig) and birds (chicken), using three target samples related to metabolism (liver) and immunity (CD4+ and CD8+ T cells). RESULTS: RNA-seq assays considerably extended the available catalog of annotated transcripts and identified differentially expressed genes with unknown function, including new syntenic lncRNAs. ATAC-seq highlighted an enrichment for transcription factor binding sites in differentially accessible regions of the chromatin. Comparative analyses revealed a core set of conserved regulatory regions across species. Topologically associating domains (TADs) and epigenetic A/B compartments annotated from Hi-C data were consistent with RNA-seq and ATAC-seq data. Multi-species comparisons showed that conserved TAD boundaries had stronger insulation properties than species-specific ones and that the genomic distribution of orthologous genes in A/B compartments was significantly conserved across species. CONCLUSIONS: We report the first multi-species and multi-assay genome annotation results obtained by a FAANG project. Beyond the generation of reference annotations and the confirmation of previous findings on model animals, the integrative analysis of data from multiple assays and species sheds a new light on the multi-scale selective pressure shaping genome organization from birds to mammals. 
Overall, these results emphasize the value of FAANG for research on domesticated animals and reinforce the importance of future meta-analyses of the reference datasets being generated by this community on different species.
Subjects
Domestic Animals/genetics , Chromatin/genetics , Molecular Sequence Annotation , Transcriptome , Animals , Cattle , Chickens , Goats , Phylogeny , Sus scrofa
ABSTRACT
Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking versus five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc.
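To make the "multi k-mer frequencies" feature concrete, here is a minimal sketch of how such features could be computed for a single transcript sequence. It is an illustration only, not FEELnc's actual implementation: the function names and the choice of k values are assumptions, and the Random Forest training step is left out.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequency of every DNA k-mer in `seq`, using overlapping windows."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Ignore windows containing ambiguous bases such as N.
    valid = [w for w in windows if set(w) <= set("ACGT")]
    counts = Counter(valid)
    total = len(valid)
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def multi_kmer_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency vectors for several k into one feature list."""
    features = []
    for k in ks:
        freqs = kmer_frequencies(seq, k)
        # Sort keys so that every sequence yields features in the same order.
        features.extend(freqs[key] for key in sorted(freqs))
    return features


if __name__ == "__main__":
    vector = multi_kmer_features("ATGGCGTACGTTAGC")
    print(len(vector))  # 4 + 16 + 64 features for k = 1, 2, 3
```

A fixed-length vector of this kind, computed for known mRNAs and known lncRNAs, is the sort of input a classifier can be trained on without first aligning the transcripts to anything.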
Subjects
Genome , Molecular Sequence Annotation/methods , Long Noncoding RNA/genetics , Software , Transcriptome , Animals , Benchmarking , Decision Trees , Dogs , Gene Expression Regulation , Humans , Mice , Molecular Sequence Annotation/statistics & numerical data , Open Reading Frames , Long Noncoding RNA/classification , Long Noncoding RNA/metabolism , Messenger RNA/classification , Messenger RNA/genetics , Messenger RNA/metabolism , RNA Sequence Analysis
ABSTRACT
Human Hereditary Sensory Autonomic Neuropathies (HSANs) are characterized by insensitivity to pain, sometimes combined with self-mutilation. Strikingly, several sporting dog breeds are particularly affected by such neuropathies. Clinical signs appear in young puppies and consist of acral analgesia, with or without sudden intense licking, biting and severe self-mutilation of the feet, whereas proprioception, motor abilities and spinal reflexes remain intact. Through a Genome Wide Association Study (GWAS) with 24 affected and 30 unaffected sporting dogs using the Canine HD 170K SNP array (Illumina), we identified a 1.8 Mb homozygous locus on canine chromosome 4 (adj. p-val = 2.5 × 10^-6). Targeted high-throughput sequencing of this locus in 4 affected and 4 unaffected dogs identified 478 variants. Only one variant perfectly segregated with the expected recessive inheritance in 300 sporting dogs of known clinical status, while it was never present in 900 unaffected dogs from 130 other breeds. This variant, located 90 kb upstream of the GDNF gene, a highly relevant neurotrophic factor candidate gene, lies in a long intergenic non-coding RNA (lincRNA), GDNF-AS. Using human comparative genomic analysis, we observed that the canine variant maps onto an enhancer element. Quantitative RT-PCR of dorsal root ganglia RNAs of affected dogs showed a significant decrease of both GDNF mRNA and GDNF-AS expression levels (60% and 80%, respectively) compared to unaffected dogs. We then performed gel shift assays (EMSA), which revealed that the canine variant significantly alters the binding of regulatory elements. Altogether, these results allowed the identification in dogs of GDNF as a relevant candidate for human HSAN and insensitivity to pain, and also shed light on the regulation of GDNF transcription. Finally, these results support proposing these sporting dog breeds as natural models for clinical trials, with a double benefit for human and veterinary medicine.
Subjects
Glial Cell Line-Derived Neurotrophic Factor/genetics , Hereditary Sensory and Autonomic Neuropathies/genetics , Congenital Pain Insensitivity/genetics , Pain/genetics , Long Noncoding RNA/genetics , Animals , Chromosome Mapping , Dogs , Gene Expression Regulation , Genome-Wide Association Study , Hereditary Sensory and Autonomic Neuropathies/physiopathology , Humans , Pain/physiopathology , Congenital Pain Insensitivity/physiopathology , Point Mutation , Single Nucleotide Polymorphism
ABSTRACT
The self-assembly of nanoparticles in aqueous solutions promises wide applications but requires the careful balance of many parameters not present in organic solvents. While the presence of long-range electrostatic interactions in aqueous solutions may complicate such assemblies, they provide additional parameters through which to control self-assembly. Here, with DNA-capped gold nanoparticles and through the variation of the ionic strength in aqueous solutions, we explored the influence of electrostatic interactions on the adsorption of negatively charged nanoparticles on a positively charged surface. Specifically, we studied the kinetics of nanoparticle adsorption from solution using the quartz crystal microbalance with dissipation (QCM-D). We also characterized the structure of the adsorbed monolayers employing a combination of grazing incidence small-angle X-ray scattering (GISAXS) and scanning electron microscopy. We discovered that adsorption kinetics and monolayer structure were under the control of the DNA ligand length, solution ionic strength, and salt species. We also precisely fit the kinetics to a modified Langmuir model, which converged to the simple Langmuir model at high ionic strengths of magnesium chloride. We demonstrated that increasing the ionic strength and decreasing the DNA ligand lengths increased the surface coverage while decreasing the nanoparticle-nanoparticle spacing. The DNA-capped nanoparticle system reported here provides a readily applicable platform for controlling nanoparticle self-assembly in aqueous solution. Finally, we employ this tunability to create a system with a tunable plasmonic response. Our kinetics studies of the assembly process and further characterizations undertaken will facilitate the construction of nanoparticle arrays with precise structure, and such control will aid in the design of future plasmonic and optoelectronic devices.
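For reference, the simple Langmuir adsorption kinetics that the modified model is said to converge to can be written as below. The notation is assumed here and is not taken from the abstract: \theta is the fractional surface coverage, c the bulk nanoparticle concentration, and k_a and k_d the adsorption and desorption rate constants. The modified model used in the study is not reproduced.

```latex
\frac{d\theta}{dt} = k_a\, c\,(1-\theta) - k_d\,\theta ,
\qquad
\theta(t) = \theta_{\mathrm{eq}}\left(1 - e^{-(k_a c + k_d)\,t}\right),
\qquad
\theta_{\mathrm{eq}} = \frac{k_a c}{k_a c + k_d},
\quad \theta(0) = 0 .
```

In this form the coverage rises exponentially toward the equilibrium value \theta_{\mathrm{eq}} with rate k_a c + k_d, which is the behavior a QCM-D adsorption curve would be fitted against in the simple case.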
Subjects
Single-Stranded DNA/chemistry , Gold/chemistry , Metal Nanoparticles/chemistry , Adsorption , Kinetics , Osmolar Concentration , Small Angle Scattering , Static Electricity , X-Ray Diffraction
ABSTRACT
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
Subjects
DNA/genetics , Encyclopedias as Topic , Human Genome/genetics , Molecular Sequence Annotation , Nucleic Acid Regulatory Sequences/genetics , Genetic Transcription/genetics , Transcriptome/genetics , Alleles , Cell Line , Intergenic DNA/genetics , Genetic Enhancer Elements , Exons/genetics , Gene Expression Profiling , Genes/genetics , Genomics , Humans , Polyadenylation/genetics , Protein Isoforms/genetics , RNA/biosynthesis , RNA/genetics , RNA Editing/genetics , RNA Splicing/genetics , Nucleic Acid Repetitive Sequences/genetics , RNA Sequence Analysis
ABSTRACT
The genome of the filamentous brown alga Ectocarpus was the first to be completely sequenced from within the brown algal group and has served as a key reference genome both for this lineage and for the stramenopiles. We present a complete structural and functional reannotation of the Ectocarpus genome. The large-scale assembly of the Ectocarpus genome was significantly improved and genome-wide gene re-annotation using extensive RNA-seq data improved the structure of 11 108 existing protein-coding genes and added 2030 new loci. A genome-wide analysis of splicing isoforms identified an average of 1.6 transcripts per locus. A large number of previously undescribed noncoding genes were identified and annotated, including 717 loci that produce long noncoding RNAs. Conservation of lncRNAs between Ectocarpus and another brown alga, the kelp Saccharina japonica, suggests that at least a proportion of these loci serve a function. Finally, a large collection of single nucleotide polymorphism-based markers was developed for genetic analyses. These resources are available through an updated and improved genome database. This study significantly improves the utility of the Ectocarpus genome as a high-quality reference for the study of many important aspects of brown algal biology and as a reference for genomic analyses across the stramenopiles.
Subjects
Intergenic DNA/genetics , Genetic Loci , Genome , Biological Models , Molecular Sequence Annotation , Phaeophyceae/genetics , Algal Proteins/genetics , Algal Proteins/metabolism , Alternative Splicing/genetics , Plant Chromosomes/genetics , Conserved Sequence/genetics , Genetic Databases , Viral Genome , Long Noncoding RNA/genetics
ABSTRACT
BACKGROUND: Improving functional annotation of the chicken genome is a key challenge in bridging the gap between genotype and phenotype. Among all transcribed regions, long noncoding RNAs (lncRNAs) are a major component of the transcriptome and its regulation, and whole-transcriptome sequencing (RNA-Seq) has greatly improved their identification and characterization. We performed an extensive profiling of the lncRNA transcriptome in the chicken liver and adipose tissue by RNA-Seq. We focused on these two tissues because of their importance in various economical traits for which energy storage and mobilization play key roles and also because of their high cell homogeneity. To predict lncRNAs, we used a recently developed tool called FEELnc, which also classifies them with respect to their distance and strand orientation to the closest protein-coding genes. Moreover, to confidently identify the genes/transcripts expressed in each tissue (a complex task for weakly expressed molecules such as lncRNAs), we probed a particularly large number of biological replicates (16 per tissue) compared to common multi-tissue studies with a larger set of tissues but less sampling. RESULTS: We predicted 2193 lncRNA genes, among which 1670 were robustly expressed across replicates in the liver and/or adipose tissue and which were classified into 1493 intergenic and 177 intragenic lncRNAs located between and within protein-coding genes, respectively. We observed similar structural features between chickens and mammals, with strong synteny conservation but without sequence conservation. As previously reported, we confirm that lncRNAs have a lower and more tissue-specific expression than mRNAs. Finally, we showed that adjacent lncRNA-mRNA genes in divergent orientation have a higher co-expression level when separated by less than 1 kb compared to more distant divergent pairs. 
Among these, we highlighted for the first time a novel lncRNA candidate involved in lipid metabolism, lnc_DHCR24, which is highly correlated with the DHCR24 gene that encodes a key enzyme of cholesterol biosynthesis. CONCLUSIONS: We provide a comprehensive lncRNA repertoire in the chicken liver and adipose tissue, which shows interesting patterns of co-expression between mRNAs and lncRNAs. It contributes to improving the structural and functional annotation of the chicken genome and provides a basis for further studies on energy storage and mobilization traits in the chicken.
Subjects
Adipose Tissue/metabolism , Chickens/genetics , Liver/metabolism , Long Noncoding RNA/genetics , Transcriptome , Animals , Chickens/metabolism , Conserved Sequence , Molecular Evolution , Gene Expression Profiling , Gene Expression Regulation , Genome , Genotype , Humans , Lipid Metabolism/genetics , Open Reading Frames , Organ Specificity , Phenotype , Quantitative Trait Loci , Antisense RNA , Long Noncoding RNA/chemistry , Messenger RNA/genetics
ABSTRACT
Protein expression and selection is an essential process in the modification of biological products. Expressed proteins are selected based on desired traits (phenotypes) from diverse gene libraries (genotypes), whose size may be limited due to the difficulties inherent in diverse cell preparation. In addition, not all genes can be expressed in cells, and linking genotype with phenotype further presents a great challenge in protein engineering. We present a DNA gel-based platform that demonstrates the versatility of two DNA microgel formats to address fundamental challenges of protein engineering, including high protein yield, isolation of gene sets, and protein display. We utilize microgels to show successful protein production and capture of a model protein, green fluorescent protein (GFP), which is further used to demonstrate a successful gene enrichment through fluorescence-activated cell sorting (FACS) of a mixed population of microgels containing the GFP gene. Through psoralen cross-linking of the hydrogels, we have synthesized DNA microgels capable of surviving denaturing conditions while still possessing the ability to produce protein. Lastly, we demonstrate a method of producing extremely high local gene concentrations of up to 32 000 gene repeats in hydrogels 1 to 2 µm in diameter. These DNA gels can serve as a novel cell-free platform for integrated protein expression and display, which can be applied toward more powerful, scalable protein engineering and cell-free synthetic biology with no physiological boundaries and limitations.
Subjects
DNA/chemistry , Hydrogels/chemistry , Protein Engineering , Recombinant Proteins/genetics , Cross-Linking Reagents/chemistry , DNA/genetics , Dimethylpolysiloxanes/chemistry , Escherichia coli/genetics , Ficusin/chemistry , Green Fluorescent Proteins/genetics , Hydrogels/chemical synthesis , Plasmids , Protein Biosynthesis/genetics
ABSTRACT
Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically validated experimentally. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions were confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient at screening gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more so than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large-scale human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail to sample low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.
Subjects
Gene Expression Profiling/methods , Human Genome , Transcriptome , Computational Biology/methods , Exons , High-Throughput Nucleotide Sequencing , Humans , Introns , Molecular Sequence Annotation , Open Reading Frames , RNA Isoforms , Messenger RNA/chemistry , Messenger RNA/genetics , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Sensitivity and Specificity
ABSTRACT
The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.
Subjects
Genetic Databases , Long Noncoding RNA/genetics , Alternative Splicing , Animals , Cell Nucleus/genetics , Cell Nucleus/metabolism , Cluster Analysis , Molecular Evolution , Exons , Gene Expression Profiling , Gene Expression Regulation , Histones/metabolism , Humans , Molecular Sequence Annotation , Open Reading Frames , Organ Specificity/genetics , Primates/genetics , Post-Transcriptional RNA Processing , RNA Splice Sites , Messenger RNA/genetics , Genetic Selection , Genetic Transcription
ABSTRACT
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of annotated alternatively spliced transcripts has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Subjects
Databases, Genetic , Genome, Human , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , DNA, Complementary/chemistry , DNA, Complementary/genetics , Evolution, Molecular , Exons , Genetic Loci , Humans , Internet , Models, Molecular , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , RNA, Long Noncoding , Reproducibility of Results , Untranslated Regions
ABSTRACT
CONSPECTUS: In recent decades, DNA has taken on an assortment of diverse roles, not only as the central genetic molecule in biological systems but also as a generic material for nanoscale engineering. DNA possesses many exceptional properties, including its biological function, biocompatibility, molecular recognition ability, and nanoscale controllability. Taking advantage of these unique attributes, a variety of DNA materials have been created with properties derived both from the biological functions and from the structural characteristics of DNA molecules. These novel DNA materials provide a natural bridge between nanotechnology and biotechnology, leading to far-ranging real-world applications. In this Account, we describe our work on the design and construction of DNA materials. Based on the role of DNA in the construction, we categorize DNA materials into two classes: substrate and linker. As a substrate, DNA interfaces with enzymes in biochemical reactions, making use of molecular biology's "enzymatic toolkit". For example, employing DNA as a substrate, we utilized enzymatic ligation to prepare the first bulk hydrogel made entirely of DNA. Using this DNA hydrogel as a structural scaffold, we created a protein-producing DNA hydrogel via linking plasmid DNA onto the hydrogel matrix through enzymatic ligation. Furthermore, to fully make use of the advantages of both DNA materials and polymerase chain reaction (PCR), we prepared thermostable branched DNA that could remain intact even under denaturing conditions, allowing for their use as modular primers for PCR. Moreover, via enzymatic polymerization, we have recently constructed a physical DNA hydrogel with unique internal structure and mechanical properties. As a linker, we have used DNA to interface with other functional moieties, including gold nanoparticles, clay minerals, proteins, and lipids, allowing for hybrid materials with unique properties for desired applications. 
For example, we recently designed a DNA-protein conjugate as a universal adapter for protein detection. We further demonstrate a diverse assortment of applications for these DNA materials including diagnostics, protein production, controlled drug release systems, the exploration of life evolution, and plasmonics. Although DNA has shown great potential as both substrate and linker in the construction of DNA materials, it is still in the initial stages of becoming a well-established and widely used material. Important challenges include the ease of design and fabrication, scaling-up, and minimizing cost. We envision that DNA materials will continue to bridge the gap between nanotechnology and biotechnology and will ultimately be employed for many real-world applications.
Subjects
Biotechnology/methods , DNA/chemistry , Nanotechnology/methods , Aluminum Silicates , Clay , Drug Liberation , Enzymes/chemistry , Hydrogels/chemistry , Lipids/chemistry , Nanoparticles/chemistry , Nanostructures/chemistry , Polymerase Chain Reaction , Protein Engineering/methods , Proteins/chemistry
ABSTRACT
The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
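The scan for "blocks of homozygosity longer than one megabase" can be pictured with a minimal sketch that walks along the SNPs of one chromosome in one dog and reports maximal runs of consecutive homozygous genotypes spanning at least a given length. The 0/1/2 genotype coding (copies of the alternate allele), the function name `homozygous_blocks`, and the absence of any tolerance for genotyping errors or missing calls are simplifying assumptions, not the authors' method.

```python
from typing import List, Tuple


def homozygous_blocks(
    positions: List[int],
    genotypes: List[int],
    min_length_bp: int = 1_000_000,
) -> List[Tuple[int, int]]:
    """Return (start, end) of runs of homozygous SNPs >= min_length_bp.

    Assumptions (hypothetical setup): `positions` are sorted base-pair
    coordinates on one chromosome; `genotypes` are coded 0, 1, or 2,
    where 1 denotes a heterozygous call.
    """
    blocks: List[Tuple[int, int]] = []
    run_start = None  # position of the first SNP in the current run
    run_end = None    # position of the latest SNP in the current run
    for pos, gt in zip(positions, genotypes):
        if gt in (0, 2):  # homozygous call extends the current run
            if run_start is None:
                run_start = pos
            run_end = pos
        else:  # heterozygous call closes the current run
            if run_start is not None and run_end - run_start >= min_length_bp:
                blocks.append((run_start, run_end))
            run_start = None
    # Close a run that extends to the last SNP.
    if run_start is not None and run_end - run_start >= min_length_bp:
        blocks.append((run_start, run_end))
    return blocks
```

A real analysis would typically allow occasional heterozygous or missing calls within a block and would compare block lengths against expectations under drift, as the abstract's distinction between selection and genetic drift suggests.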
Subjects
Behavior, Animal , Breeding , Dogs/genetics , Genetic Variation/genetics , Selection, Genetic , Animals , Body Size/genetics , Dogs/anatomy & histology , Ear/anatomy & histology , Genome-Wide Association Study , Genotyping Techniques , Haplotypes , Heterozygote , Homozygote , Phenotype , Phylogeny , Polymorphism, Single Nucleotide
ABSTRACT
The multiparametric nature of nanoparticle self-assembly makes it challenging to circumvent the instabilities that lead to aggregation and achieve crystallization under extreme conditions. By using non-base-pairing DNA as a model ligand instead of the typical base-pairing design for programmability, long-range 2D DNA-gold nanoparticle crystals can be obtained at extremely high salt concentrations and in a divalent salt environment. The interparticle spacings in these 2D nanoparticle crystals can be engineered and further tuned based on an empirical model incorporating the parameters of ligand length and ionic strength.
Subjects
DNA/chemistry , Gold/chemistry , Metal Nanoparticles/chemistry , Salts/chemistry , Base Pairing , Crystallization , DNA/metabolism , Ligands , Magnesium Chloride/chemistry , Nucleic Acid Hybridization , Osmolar Concentration , Sodium Chloride/chemistry
ABSTRACT
Optical microcavities, specifically, whispering-gallery mode (WGM) microcavities, with their remarkable sensitivity to environmental changes, have been extensively employed as biosensors, enabling the detection of a wide range of biomolecules and nanoparticles. To push the limits of detection down to the most sensitive single-molecule level, plasmonic nanorods are strategically introduced to enhance the evanescent fields of WGM microcavities. This advancement of optoplasmonic WGM sensors allows for the detection of single molecules of a protein, conformational changes, and even atomic ions, marking significant contributions in single-molecule sensing. This Perspective discusses the exciting research prospects in optoplasmonic WGM sensing of single molecules, including the study of enzyme thermodynamics and kinetics, the emergence of thermo-optoplasmonic sensing, the ultrasensitive single-molecule sensing on WGM microlasers, and applications in synthetic biology.