ABSTRACT
Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly to distinguish those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking versus five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc.
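The two feature families named in the abstract (multi k-mer frequencies and relaxed open reading frames) can be illustrated with a minimal Python sketch. This is not FEELnc's implementation: the function names, the choice of k values, and the interpretation of a "relaxed" ORF as one that may run to the end of the transcript without a stop codon are all assumptions made for illustration. The resulting vector is the kind of input a Random Forest classifier could be trained on.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k):
    """Relative frequency of every possible k-mer over the ACGT alphabet."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def longest_orf_length(seq, require_stop=True):
    """Length in nucleotides of the longest ATG-initiated ORF in the
    three forward frames.

    With require_stop=False (an assumed 'relaxed' definition), an ORF
    that reaches the end of the transcript without a stop codon counts.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
        if start is not None and not require_stop:
            # Count only complete codons up to the transcript end.
            best = max(best, ((len(seq) - start) // 3) * 3)
    return best


def feature_vector(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequencies for each k with the ORF coverage."""
    features = []
    for k in ks:
        freqs = kmer_frequencies(seq, k)
        features.extend(freqs[m] for m in sorted(freqs))
    orf_len = longest_orf_length(seq, require_stop=False)
    features.append(orf_len / max(len(seq), 1))
    return features
```

For example, `feature_vector("ATGAAATAG")` yields 4 + 16 + 64 k-mer frequencies followed by one ORF coverage value.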
Subjects
Genome, Molecular Sequence Annotation/methods, Long Noncoding RNA/genetics, Software, Transcriptome, Animals, Benchmarking, Decision Trees, Dogs, Gene Expression Regulation, Humans, Mice, Molecular Sequence Annotation/statistics & numerical data, Open Reading Frames, Long Noncoding RNA/classification, Long Noncoding RNA/metabolism, Messenger RNA/classification, Messenger RNA/genetics, Messenger RNA/metabolism, RNA Sequence Analysis
ABSTRACT
Mucosal melanomas (MM) are rare aggressive cancers in humans, and one of the most common forms of oral cancers in dogs. Similar biological and histological features are shared between MM in both species, making dogs a powerful model for comparative oncology studies of melanomas. Although exome sequencing recently identified recurrent coding mutations in canine MM, little is known about changes in non-coding gene expression, and more particularly, in canine long non-coding RNAs (lncRNAs), which are commonly dysregulated in human cancers. Here, we sampled a large cohort (n = 52) of canine normal/tumor oral MM from three predisposed breeds (poodles, Labrador retrievers, and golden retrievers), and used deep transcriptome sequencing to identify more than 400 differentially expressed (DE) lncRNAs. We further prioritized candidate lncRNAs by comparative genomic analysis to pinpoint 26 dog-human conserved DE lncRNAs, including SOX21-AS, ZEB2-AS, and CASC15 lncRNAs. Using unsupervised co-expression network analysis with coding genes, we inferred the potential functions of the DE lncRNAs, suggesting associations with cancer-related genes, cell cycle, and carbohydrate metabolism Gene Ontology (GO) terms. Finally, we exploited our multi-breed design to identify DE lncRNAs within breeds. This study provides a unique transcriptomic resource for studying oral melanoma in dogs, and highlights lncRNAs that may potentially be diagnostic or therapeutic targets for human and veterinary medicine.
Subjects
Dog Diseases/genetics, Melanoma/genetics, Mouth Neoplasms/genetics, Long Noncoding RNA/genetics, Animals, Breeding, Dog Diseases/pathology, Dogs, Gene Expression Profiling, Genome/genetics, High-Throughput Nucleotide Sequencing, Humans, Melanoma/pathology, Mouth Neoplasms/pathology, Transcriptome/genetics
ABSTRACT
Long non-coding RNAs (lncRNAs) are a family of heterogeneous RNAs that play major roles in multiple biological processes. We recently identified an extended repertoire of more than 10,000 lncRNAs of the domestic dog; however, predicting their biological functionality remains challenging. In this study, we have characterised the expression profiles of 10,444 canine lncRNAs in 26 distinct tissue types, representing various anatomical systems. We showed that lncRNA expression profiles cluster mainly by tissue type, and we highlighted that 44% of canine lncRNAs are expressed in a tissue-specific manner. We further demonstrated that tissue-specificity correlates with specific families of canine transposable elements. In addition, we identified more than 900 conserved dog-human lncRNAs whose expression patterns we show, through comparative transcriptomics, to be broadly reproducible between dog and human. Finally, co-expression analyses of lncRNAs and neighbouring protein-coding genes identified more than 3,400 canine lncRNAs, suggesting that these lncRNAs function as regulatory elements. Altogether, this genomic and transcriptomic integrative study of lncRNAs constitutes a major resource to investigate genotype to phenotype relationships and biomedical research in the dog species.
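The abstract does not state which metric was used to call a lncRNA tissue-specific. As one common possibility, the sketch below computes the tau index, which is 0 for uniform expression across tissues and 1 for expression confined to a single tissue. The function name and the example expression values are hypothetical, and any threshold applied to tau would be a further assumption.

```python
def tau_index(expression):
    """Tau tissue-specificity index for one gene.

    expression: non-negative expression values, one per tissue.
    Returns 0.0 for uniform expression and 1.0 when the gene is
    expressed in exactly one tissue.
    """
    values = [max(float(x), 0.0) for x in expression]
    if len(values) < 2:
        raise ValueError("tau needs expression values from at least two tissues")
    x_max = max(values)
    if x_max == 0:
        # Gene not expressed anywhere: treat as non-specific.
        return 0.0
    return sum(1 - x / x_max for x in values) / (len(values) - 1)


# Hypothetical expression profiles across four tissues.
specific = tau_index([10, 0, 0, 0])
ubiquitous = tau_index([5, 5, 5, 5])
```

Here `specific` evaluates to 1.0 and `ubiquitous` to 0.0; intermediate profiles fall between these two values.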
Subjects
Nucleic Acid Databases, Gene Expression Regulation/physiology, Long Noncoding RNA/biosynthesis, Transcriptome, Animals, Dogs, Humans, Organ Specificity, Long Noncoding RNA/genetics
ABSTRACT
Genome-wide association studies (GWAS) are widely used to identify loci associated with phenotypic traits in the domestic dog, which has emerged as a model for Mendelian and complex traits. However, a disadvantage of GWAS is that it always requires subsequent fine-mapping or sequencing to pinpoint causal mutations. Here, we performed whole exome sequencing (WES) and canine high-density (cHD) SNP genotyping of 28 dogs from 3 breeds to compare the SNP and linkage disequilibrium characteristics together with the power and mapping precision of exome-guided GWAS (EG-GWAS) versus cHD-based GWAS. Using simulated phenotypes, we showed that EG-GWAS has higher power than cHD-based GWAS to detect associations within target regions and less power outside target regions, with power being influenced further by sample size and SNP density. We analyzed two real phenotypes (hair length and furnishing) that are fixed in certain breeds to characterize the mapping precision for the known causal mutations. EG-GWAS identified the associated exonic and 3'UTR variants within the FGF5 and RSPO2 genes, respectively, with only a few samples per breed. In conclusion, we demonstrated that EG-GWAS can identify loci associated with Mendelian phenotypes both within and across breeds.