1.
Bioinform Adv ; 4(1): vbad188, 2024.
Article in English | MEDLINE | ID: mdl-38213821

ABSTRACT

Motivation: Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with common diseases. These results include a mix of causal and non-causal variants related through strong linkage disequilibrium (LD, i.e. high correlation). Fine-mapping methods have been developed to distinguish causal from non-causal variants using GWAS results and LD information, assigning each variant a probability of being causal. In this field, the PAINTOR program has become a standard, one of its advantages being its ability to take functional annotations into account. This approach requires many pre- and post-processing steps. Here, we developed a Nextflow pipeline called PaintorPipe that wraps all these steps together with the fine-mapping itself. PaintorPipe uses three independent sources of information (GWAS summary statistics, LD information and functional annotations) to rank variants according to their likelihood of being involved in disease development. The PAINTOR framework is used to calculate the posterior probability of each variant (single nucleotide polymorphism) being causal (a.k.a. Bayesian fine-mapping). The resulting credible sets of variants are annotated with their biological functions and visualized using CANVIS. This pipeline requires minimal input from users (a GWAS summary statistics file and a set of functional annotation files) and is designed to be modular and customizable, allowing easy integration of diverse functional annotations. Availability and implementation: PaintorPipe is implemented in the Nextflow domain-specific language, can be run locally or on a Slurm cluster, and handles containerization using Singularity. PaintorPipe is freely available on GitHub (https://github.com/sdjebali/PaintorPipe).
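PaintorPipe reports a posterior probability of causality per variant and summarizes these as credible sets. A common way to build such a set, sketched below, is to rank variants by posterior probability and keep adding them until a chosen coverage (an illustrative 95% here) is reached. This assumes the posteriors of one locus sum to roughly 1, as under a single-causal-variant model; it is a generic sketch, not PaintorPipe's or PAINTOR's own code, and the SNP IDs are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Return (variants, mass): the smallest set of variants, taken in decreasing
    order of posterior probability, whose summed posterior reaches `coverage`.

    Assumes the posteriors of one locus sum to about 1; if they never reach
    `coverage`, all variants are returned.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected, mass = [], 0.0
    for variant, prob in ranked:
        selected.append(variant)
        mass += prob
        if mass >= coverage:
            break
    return selected, mass


# Hypothetical posterior probabilities for four SNPs in one locus.
locus = {"rs_a": 0.60, "rs_b": 0.30, "rs_c": 0.07, "rs_d": 0.03}
variants, mass = credible_set(locus)
print(variants, round(mass, 2))  # ['rs_a', 'rs_b', 'rs_c'] 0.97
```

Sorting first makes the set as small as possible for the requested coverage; loci with a diffuse posterior naturally yield larger sets.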

2.
NAR Genom Bioinform ; 5(4): lqad089, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37850035

ABSTRACT

Genome annotation plays a crucial role in providing a comprehensive catalog of genes and transcripts for a particular species. As research projects generate new transcriptome data worldwide, integrating this information into existing annotations becomes essential. However, most bioinformatics pipelines are limited in their ability to effectively and consistently update annotations using new RNA-seq data. Here we introduce TAGADA, an RNA-seq pipeline for Transcripts And Genes Assembly, Deconvolution, and Analysis. Given a genomic sequence, a reference annotation and RNA-seq reads, TAGADA enhances existing gene models by generating an improved annotation. It also computes expression values for both the reference and novel annotations, identifies long non-coding transcripts (lncRNAs), and provides a comprehensive quality control report. Developed using Nextflow DSL2, TAGADA offers user-friendly functionality and ensures reproducibility across different computing platforms through its containerized environment. In this study, we demonstrate the efficacy of TAGADA using RNA-seq data from the GENE-SWiTCH project alongside chicken and pig genome annotations as references. Results indicate that TAGADA can substantially increase the number of annotated transcripts by approximately [Formula: see text] in these species. Furthermore, we illustrate how TAGADA can integrate Illumina NovaSeq short reads with PacBio Iso-Seq long reads, showcasing its versatility. TAGADA is available at github.com/FAANG/analysis-TAGADA.

3.
Sci Data ; 10(1): 369, 2023 06 08.
Article in English | MEDLINE | ID: mdl-37291142

ABSTRACT

Inspired by the production of reference datasets in the Genome in a Bottle project, we sequenced one Charolais heifer with different technologies: Illumina paired-end, Oxford Nanopore, Pacific Biosciences (HiFi and CLR), 10X Genomics linked-reads, and Hi-C. In order to generate haplotypic assemblies, we also sequenced both parents with short reads. From these data, we built two high-quality haplotype-resolved (trio) reference genomes and a consensus assembly, using up-to-date software packages. The assemblies obtained using PacBio HiFi reach a size of 3.2 Gb, which is significantly larger than the 2.7 Gb ARS-UCD1.2 reference. The consensus assembly reaches a BUSCO completeness of 95.8% on highly conserved mammalian genes. We also identified 35,866 structural variants larger than 50 base pairs. This assembly is a contribution to the bovine pangenome for the "Charolais" breed. These datasets will prove to be useful resources, enabling the community to gain additional insight into sequencing technologies for applications such as SNP, indel or structural variant calling, and de novo assembly.


Subject(s)
Genomics, High-Throughput Nucleotide Sequencing, Animals, Cattle, Female, Benchmarking, Genome, Sequence Analysis, DNA

4.
Front Bioinform ; 3: 1092853, 2023.
Article in English | MEDLINE | ID: mdl-36909938

ABSTRACT

Differences in cell function arise from differential activity of regulatory elements, including enhancers. Enhancers are cis-regulatory elements that cooperate with promoters through transcription factors to activate the expression of one or several genes by coming into physical proximity with them in the 3D space of the nucleus. There is increasing evidence that genetic variants associated with common diseases are enriched in enhancers active in cell types relevant to these diseases. Identifying the enhancers associated with each gene and, conversely, the sets of genes activated by each enhancer (the so-called enhancer/gene or E/G relationships) across cell types can help in understanding the genetic mechanisms underlying human diseases. There are three broad approaches for the genome-wide identification of E/G relationships in a cell type: 1) genetic link methods or eQTL, 2) functional link methods based on 1D functional data such as open chromatin, histone marks or gene expression, and 3) spatial link methods based on 3D data such as Hi-C. Since 1) and 3) are costly, the current strategy is to develop functional link methods and to use data from 1) and 3) as references to evaluate them. However, there is still no consensus on the best functional link method to date, and method comparisons remain rare. Here, we compared the relative performance of three recent methods for the identification of enhancer-gene links, TargetFinder, Average-Rank, and the ABC model, using the three latest benchmarks in the field: a reference that combines 3D and eQTL data, called BENGI, and two genetic screening references, called CRiFF and CRISPRi. Overall, no single method performed best on all three references. The CRiFF and CRISPRi reference sets are likely more reliable, but CRiFF is not genome-wide, and both CRiFF and CRISPRi are mostly available for the K562 cancer cell line. The BENGI reference set is genome-wide but likely contains many false positives.
This study therefore calls for new reliable and genome-wide E/G reference data rather than new functional link E/G identification methods.
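Of the three methods compared, the ABC (Activity-by-Contact) model has a particularly compact scoring rule: for a given gene, each candidate element's activity multiplied by its 3D contact with the gene's promoter, divided by the same product summed over all candidate elements for that gene. The sketch below assumes activity is the geometric mean of DNase-seq and H3K27ac signals and takes contact values as given; the element IDs, signal values and the 0.02 cutoff are illustrative assumptions, not values from this study.

```python
import math


def abc_scores(elements):
    """Compute ABC scores for the candidate elements of a single gene.

    `elements` is a list of (element_id, dnase, h3k27ac, contact) tuples.
    Activity is the geometric mean of the DNase-seq and H3K27ac signals;
    each score is activity * contact normalized by the sum over all elements.
    """
    products = {
        eid: math.sqrt(dnase * h3k27ac) * contact
        for eid, dnase, h3k27ac, contact in elements
    }
    total = sum(products.values())
    if total == 0:
        return {eid: 0.0 for eid in products}
    return {eid: value / total for eid, value in products.items()}


def predicted_links(scores, threshold=0.02):
    """Keep elements whose ABC score reaches the (assumed) threshold."""
    return sorted(eid for eid, score in scores.items() if score >= threshold)


# Hypothetical signals: (id, DNase, H3K27ac, contact with the gene promoter).
candidates = [("enh1", 4.0, 9.0, 0.5), ("enh2", 1.0, 1.0, 1.0), ("enh3", 1.0, 1.0, 0.0)]
scores = abc_scores(candidates)
print(scores)  # {'enh1': 0.75, 'enh2': 0.25, 'enh3': 0.0}
print(predicted_links(scores))  # ['enh1', 'enh2']
```

Because scores are normalized per gene, an element with strong activity but no contact (enh3) gets a score of zero, which is what distinguishes this approach from activity-only rankings.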

5.
Front Genet ; 12: 655707, 2021.
Article in English | MEDLINE | ID: mdl-34262593

ABSTRACT

Beyond their common use for studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species, since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein function, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling from RNA-seq a challenging endeavor. We compared SNP calling results obtained with GATK-suggested filters on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. In expressed regions, we showed an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq), and we characterized the remaining 9% of SNPs. We then studied the genotypes (GT) obtained by RNA-seq and the impact of two factors (GT call rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for these factors leading to 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples from 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, including ∼550,000 SNPs per tissue and population with a reliable GT (call rate ≥ 50%), of which ∼340,000 had a MAF ≥ 10%.
We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT-deleterious missense and 590 stop-gained variants), (ii) study cis-regulation of gene expression on a large scale, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) amenable to ASE analysis, ∼29% of which were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations, and that these can be used for different analyses, as in GTEx studies.
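The quantities behind the 91% precision and the reliable-genotype filter can be sketched as follows. This is a simplified illustration, assuming genotypes are coded as alternative-allele counts (0, 1, 2) with None for a missing call, and SNPs are identified by hypothetical "chrom:pos" keys. The call-rate (≥ 50%) and MAF (≥ 10%) thresholds come from the abstract; the read-number-per-GT threshold is not stated here and is therefore omitted.

```python
def precision(rna_snps, dna_snps):
    """Fraction of RNA-seq SNPs also detected by DNA-seq in the same samples."""
    rna_snps = set(rna_snps)
    return len(rna_snps & set(dna_snps)) / len(rna_snps) if rna_snps else 0.0


def call_rate(genotypes):
    """Fraction of samples with a called genotype (None means missing)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from called genotypes coded as alternative-allele counts (0, 1, 2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def reliable(genotypes, min_call_rate=0.50, min_maf=0.10):
    """Apply the call-rate and MAF thresholds quoted in the abstract."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical SNP keys and genotypes across eight birds.
print(precision({"1:100", "1:250", "2:40"}, {"1:100", "2:40", "3:7"}))  # 0.6666666666666666
print(reliable([0, 1, 2, 1, None, None, None, None]))  # True
print(reliable([0, 0, 0, 0, 0, 0, 0, 1]))  # False (MAF = 1/16)
```

The MAF is computed only over called genotypes, so the call-rate filter is what guards against frequencies estimated from very few birds.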

6.
Genes (Basel) ; 12(4)2021 04 09.
Article in English | MEDLINE | ID: mdl-33918852

ABSTRACT

Steroid metabolism is a fundamental process in the porcine testis, producing testosterone but also estrogens and androstenone, which are essential for the physiology of the boar. This study concerns boars at an early stage of puberty. Using an RT-qPCR approach, we showed that the transcriptional activities of several genes encoding key enzymes involved in this metabolism (such as CYP11A1) are correlated. Surprisingly, HSD17B3, a key gene for testosterone production, was absent from this group. An additional weighted gene co-expression network analysis was performed on two large mRNA-seq datasets to identify co-expression modules. Of these modules, the two containing either CYP11A1 or HSD17B3 were further analyzed. This comprehensive correlation meta-analysis identified a group of 85 genes with CYP11A1 as the hub gene, but did not allow the characterization of a robust correlation network around HSD17B3. As the CYP11A1 group includes most of the genes involved in steroid synthesis pathways (including LHCGR, encoding the LH receptor), it may control the synthesis of most testicular steroids. The independent expression of HSD17B3 probably allows part of the testosterone production to escape this control. The CYP11A1 group also contained the INSL3 and AGT genes, encoding a peptide hormone and an angiotensin peptide precursor, respectively.


Subject(s)
17-Hydroxysteroid Dehydrogenases/metabolism, Cholesterol Side-Chain Cleavage Enzyme/metabolism, Gene Regulatory Networks, Signal Transduction, Testis/metabolism, Testosterone/metabolism, 17-Hydroxysteroid Dehydrogenases/genetics, Animals, Cholesterol Side-Chain Cleavage Enzyme/genetics, Male, Swine

8.
Sci Rep ; 10(1): 20457, 2020 11 24.
Article in English | MEDLINE | ID: mdl-33235280

ABSTRACT

Long non-coding RNAs (LNC) regulate numerous biological processes. In contrast to human, the identification of LNC in farm species, such as chicken, remains incomplete. We propose a catalogue of 52,075 chicken genes enriched in LNC ( http://www.fragencode.org/ ), built from the Ensembl reference extended with novel LNC modelled here from 364 RNA-seq datasets and with LNC from four public databases. The number of LNC grew from 4,643 in the Ensembl reference to 30,084, of which 59% and 41% had expression ≥ 0.5 and ≥ 1 TPM, respectively. Characterization of these LNC relative to the closest protein-coding genes (PCG) revealed that 79% of LNC are in intergenic regions, as in other species. Expression analysis across 25 tissues revealed an enrichment of co-expressed LNC:PCG pairs, suggesting co-regulation and/or co-function. As expected, LNC were more tissue-specific than PCG (25% vs. 10%). As in human, 16% of chicken LNC hosted one or more miRNAs. We highlighted a new chicken LNC, hosting miR155, conserved in human, highly expressed in immune tissues like miR155, and correlated with immunity-related PCG in both species. Among LNC:PCG pairs that are tissue-specific in the same tissue, we revealed an enrichment of divergent pairs in which the PCG encodes a transcription factor, for example LHX5, HXD3 and TBX4, in both human and chicken.
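The ≥ 0.5 and ≥ 1 TPM cutoffs refer to transcripts per million. A minimal sketch of the TPM computation, and of the fraction of genes passing a cutoff, is shown below. It assumes raw read counts and plain (not effective) gene lengths, and counts a gene as expressed if its maximum TPM across tissues reaches the cutoff, which may differ from the criterion used in the paper; gene names and values are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and gene lengths (in bp)."""
    rates = {g: counts[g] / (lengths_bp[g] / 1_000) for g in counts}  # reads per kb
    scale = sum(rates.values())
    return {g: rate / scale * 1_000_000 for g, rate in rates.items()}


def fraction_expressed(tpm_by_tissue, cutoff):
    """Share of genes whose maximum TPM across tissues reaches `cutoff`."""
    passing = [g for g, values in tpm_by_tissue.items() if max(values) >= cutoff]
    return len(passing) / len(tpm_by_tissue)


# Hypothetical counts and lengths for three genes in one sample.
sample = tpm({"lnc1": 10, "lnc2": 200, "pcg1": 5000},
             {"lnc1": 1_000, "lnc2": 2_000, "pcg1": 2_500})
print(round(sum(sample.values())))  # 1000000

# Hypothetical per-tissue TPM values for three lncRNA genes.
per_tissue = {"lnc1": [0.2, 0.7], "lnc2": [1.5, 0.1], "lnc3": [0.1, 0.3]}
print(fraction_expressed(per_tissue, 0.5), fraction_expressed(per_tissue, 1.0))
```

Normalizing by length before scaling to one million makes TPM values sum to the same total in every sample, which is why fixed cutoffs such as 0.5 and 1 TPM can be compared across samples.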


Subject(s)
Chickens/genetics, Computational Biology/methods, Molecular Sequence Annotation/methods, RNA, Long Noncoding/genetics, Animals, Atlases as Topic, Avian Proteins/genetics, Gene Expression Profiling, Gene Expression Regulation, Gene Regulatory Networks, MicroRNAs/genetics, Organ Specificity, Sequence Analysis, RNA, Tissue Distribution

9.
Genome Res ; 30(7): 1047-1059, 2020 07.
Article in English | MEDLINE | ID: mdl-32759341

ABSTRACT

We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.


Subject(s)
Transcription, Genetic, Cell Line, Endothelial Cells/metabolism, Epithelial Cells/metabolism, Female, Gene Expression Profiling, Gynecomastia/genetics, Gynecomastia/metabolism, Humans, Male, Mesoderm/cytology, Mesoderm/metabolism, Neoplasms/genetics, Organ Specificity, Sequence Analysis, RNA

10.
BMC Biol ; 17(1): 108, 2019 12 30.
Article in English | MEDLINE | ID: mdl-31884969

ABSTRACT

BACKGROUND: Comparative genomics studies are central to identifying the coding and non-coding elements associated with complex traits, and the functional annotation of genomes is a critical step to decipher the genotype-to-phenotype relationships in livestock animals. As part of the Functional Annotation of Animal Genomes (FAANG) action, the FR-AgENCODE project aimed to create reference functional maps of domesticated animals by profiling the landscape of transcription (RNA-seq), chromatin accessibility (ATAC-seq) and conformation (Hi-C) in species representing ruminants (cattle, goat), monogastrics (pig) and birds (chicken), using three target samples related to metabolism (liver) and immunity (CD4+ and CD8+ T cells). RESULTS: RNA-seq assays considerably extended the available catalog of annotated transcripts and identified differentially expressed genes with unknown function, including new syntenic lncRNAs. ATAC-seq highlighted an enrichment for transcription factor binding sites in differentially accessible regions of the chromatin. Comparative analyses revealed a core set of conserved regulatory regions across species. Topologically associating domains (TADs) and epigenetic A/B compartments annotated from Hi-C data were consistent with RNA-seq and ATAC-seq data. Multi-species comparisons showed that conserved TAD boundaries had stronger insulation properties than species-specific ones and that the genomic distribution of orthologous genes in A/B compartments was significantly conserved across species. CONCLUSIONS: We report the first multi-species and multi-assay genome annotation results obtained by a FAANG project. Beyond the generation of reference annotations and the confirmation of previous findings on model animals, the integrative analysis of data from multiple assays and species sheds new light on the multi-scale selective pressure shaping genome organization from birds to mammals.
Overall, these results emphasize the value of FAANG for research on domesticated animals and reinforce the importance of future meta-analyses of the reference datasets being generated by this community on different species.


Subject(s)
Animals, Domestic/genetics, Chromatin/genetics, Molecular Sequence Annotation, Transcriptome, Animals, Cattle, Chickens, Goats, Phylogeny, Sus scrofa

11.
RNA Biol ; 16(9): 1190-1204, 2019 09.
Article in English | MEDLINE | ID: mdl-31120323

ABSTRACT

To investigate the dynamics of circRNA expression in pig testes, we designed specific strategies to individually study circRNA production from intron lariats and circRNAs originating from back-splicing of two exons. By applying these methods to seven Total-RNA-seq datasets sampled during testicular puberty, we detected 126 introns in 114 genes able to produce circRNAs and 5,236 exonic circRNAs produced by 2,516 genes. Comparing our RNA-seq datasets to datasets from the literature (embryonic cortex and postnatal muscle stages) revealed highly abundant intronic and exonic circRNAs in one sample each in pubertal testis and embryonic cortex, respectively. This abundance was due to higher production of circRNA by the same genes in comparison to other testis samples, rather than to the recruitment of new genes. No global relationship between circRNA and mRNA production was found. We propose ExoCirc-9244 (SMARCA5) as a marker of a particular stage in testis, which is characterized by a very low plasma estradiol level and a high abundance of circRNA in testis. We hypothesize that the abundance of testicular circRNA is associated with an abrupt switch of the cellular process to overcome a particular challenge that may have arisen in the early stages of steroid production. We also hypothesize that, in certain circumstances, isoforms and circular transcripts from different genes share functions and that a global regulation of circRNA production is established. Our data indicate that this massive production of circRNAs is much more related to the structure of the genes generating circRNAs than to their function.
Abbreviations: PE: paired ends; CR: chimeric read; SR: split read; circRNA: circular RNA; NC: non-conventional; ExoCirc-RNA: exonic circular RNA; IntroLCirc-: name of a porcine intronic lariat circRNA; ExoCirc-: name of a porcine exonic circRNA; IntronCircle-: name of a porcine intron circle; sisRNA: stable intronic sequence RNA; P: porcine breed Pietrain; LW: porcine breed Large White; RT: reverse transcription/reverse transcriptase; Total-RNA-seq: RNA-seq obtained from total RNA after ribosomal depletion; mRNA-seq: RNA-seq of poly(A) transcripts; TPM: transcripts per million; CR-PM: chimeric reads per million; RBP: RNA binding protein; miRNA: microRNA; E2: estradiol; DHT: dihydrotestosterone.


Subject(s)
Gene Expression Regulation, RNA, Circular/genetics, Swine/genetics, Transcriptome/genetics, Animals, Embryo, Mammalian/metabolism, Exons/genetics, Introns/genetics, Male, Muscles/metabolism, RNA, Circular/metabolism, Reproducibility of Results, Swine/embryology, Testis/metabolism

12.
Nature ; 543(7644): 199-204, 2017 03 09.
Article in English | MEDLINE | ID: mdl-28241135

ABSTRACT

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.


Subject(s)
Databases, Genetic, RNA, Long Noncoding/chemistry, RNA, Long Noncoding/genetics, Transcriptome/genetics, Cells, Cultured, Conserved Sequence/genetics, Datasets as Topic, Enhancer Elements, Genetic/genetics, Epigenesis, Genetic, Gene Expression Profiling, Gene Expression Regulation, Genome, Human/genetics, Genome-Wide Association Study, Genomics, Humans, Internet, Molecular Sequence Annotation, Organ Specificity/genetics, Polymorphism, Single Nucleotide, Promoter Regions, Genetic/genetics, Quantitative Trait Loci/genetics, RNA Stability, RNA, Messenger/genetics

13.
BMC Genomics ; 18(1): 7, 2017 01 03.
Article in English | MEDLINE | ID: mdl-28049418

ABSTRACT

BACKGROUND: Chimeric transcripts are commonly defined as transcripts linking two or more different genes in the genome, and can be explained by various biological mechanisms such as genomic rearrangement, read-through or trans-splicing, but also by technical or biological artefacts. Several studies have shown their importance in cancer, cell pluripotency and motility. Many programs have recently been developed to identify chimeras from Illumina RNA-seq data (mostly fusion genes in cancer). However, outputs of different programs on the same dataset can be widely inconsistent and tend to include many false positives. Other issues relate to simulated datasets restricted to fusion genes, real datasets with limited numbers of validated cases, inconsistent results between simulated and real datasets, and assessment at the gene rather than the junction level. RESULTS: Here we present ChimPipe, a modular and easy-to-use method to reliably identify fusion genes and transcription-induced chimeras from paired-end Illumina RNA-seq data. We have also produced realistic simulated datasets for three different read lengths, and enhanced two gold-standard cancer datasets by associating exact junction points with validated gene fusions. Benchmarking ChimPipe together with four other state-of-the-art tools on these data showed ChimPipe to be the top program at identifying exact junction coordinates for both kinds of datasets, and the one showing the best trade-off between sensitivity and precision. Applied to 106 ENCODE human RNA-seq datasets, ChimPipe identified 137 high-confidence chimeras connecting the protein-coding sequences of their parent genes. In subsequent experiments, three out of four predicted chimeras, two of which are recurrently expressed in a large majority of the samples, could be validated. Cloning and sequencing of the three cases revealed several new chimeric transcript structures, three of which have the potential to encode a chimeric protein for which we hypothesized a new role.
Applying ChimPipe to human and mouse ENCODE RNA-seq data led to the identification of 131 recurrent chimeras common to both species, and therefore potentially conserved. CONCLUSIONS: ChimPipe combines discordant paired-end reads and split reads to detect any kind of chimera, including those originating from polymerase read-through, and shows an excellent trade-off between sensitivity and precision. The chimeras found by ChimPipe can be validated in vitro with high accuracy.
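The abstract stresses evaluation at the junction rather than the gene level. Assuming each chimeric junction is represented by exact donor and acceptor coordinates (chromosome, position and strand for each side), junction-level sensitivity and precision reduce to set operations, as sketched below. The coordinates are hypothetical, and ChimPipe's actual benchmark may allow coordinate tolerances or other matching rules.

```python
def junction_metrics(predicted, truth):
    """Sensitivity, precision and F1 for exact junction matches.

    Each junction is a hashable tuple such as
    (donor_chrom, donor_pos, donor_strand, acceptor_chrom, acceptor_pos, acceptor_strand).
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    sensitivity = tp / len(truth) if truth else 0.0
    precision = tp / len(predicted) if predicted else 0.0
    f1 = (2 * sensitivity * precision / (sensitivity + precision)
          if sensitivity + precision else 0.0)
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


# Hypothetical junctions: two of three predictions match the four true junctions.
true_junctions = {
    ("chr1", 1000, "+", "chr1", 5000, "+"),
    ("chr2", 300, "-", "chr7", 900, "+"),
    ("chr3", 10, "+", "chr3", 80, "+"),
    ("chr5", 42, "+", "chr9", 77, "-"),
}
calls = {
    ("chr1", 1000, "+", "chr1", 5000, "+"),
    ("chr2", 300, "-", "chr7", 900, "+"),
    ("chr4", 1, "+", "chr4", 2, "+"),
}
print(junction_metrics(calls, true_junctions))
```

With exact matching, a call that links the right genes but misplaces the breakpoint counts as a false positive, which is stricter than a gene-level assessment.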


Subject(s)
Oncogene Proteins, Fusion, Recombination, Genetic, Software, Transcription, Genetic, Animals, Computational Biology/methods, Computer Simulation, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans, Mice, Reproducibility of Results, Sequence Analysis, RNA

14.
Genet Sel Evol ; 49(1): 6, 2017 01 10.
Article in English | MEDLINE | ID: mdl-28073357

ABSTRACT

BACKGROUND: Improving functional annotation of the chicken genome is a key challenge in bridging the gap between genotype and phenotype. Among all transcribed regions, long noncoding RNAs (lncRNAs) are a major component of the transcriptome and its regulation, and whole-transcriptome sequencing (RNA-Seq) has greatly improved their identification and characterization. We performed an extensive profiling of the lncRNA transcriptome in the chicken liver and adipose tissue by RNA-Seq. We focused on these two tissues because of their importance in various economic traits for which energy storage and mobilization play key roles, and also because of their high cell homogeneity. To predict lncRNAs, we used a recently developed tool called FEELnc, which also classifies them with respect to their distance and strand orientation relative to the closest protein-coding genes. Moreover, to confidently identify the genes/transcripts expressed in each tissue (a complex task for weakly expressed molecules such as lncRNAs), we probed a particularly large number of biological replicates (16 per tissue) compared to common multi-tissue studies, which cover a larger set of tissues but with fewer samples per tissue. RESULTS: We predicted 2193 lncRNA genes, among which 1670 were robustly expressed across replicates in the liver and/or adipose tissue and were classified into 1493 intergenic and 177 intragenic lncRNAs located between and within protein-coding genes, respectively. We observed similar structural features between chickens and mammals, with strong synteny conservation but without sequence conservation. As previously reported, we confirm that lncRNAs have a lower and more tissue-specific expression than mRNAs. Finally, we showed that adjacent lncRNA-mRNA genes in divergent orientation have a higher co-expression level when separated by less than 1 kb compared to more distant divergent pairs.
Among these, we highlighted for the first time a novel lncRNA candidate involved in lipid metabolism, lnc_DHCR24, which is highly correlated with the DHCR24 gene that encodes a key enzyme of cholesterol biosynthesis. CONCLUSIONS: We provide a comprehensive lncRNA repertoire in the chicken liver and adipose tissue, which shows interesting patterns of co-expression between mRNAs and lncRNAs. It contributes to improving the structural and functional annotation of the chicken genome and provides a basis for further studies on energy storage and mobilization traits in the chicken.
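Divergent (head-to-head) lncRNA:mRNA pairs can be identified from gene coordinates alone. The sketch below assumes half-open genomic intervals and treats two genes as divergent when they lie on opposite strands of the same chromosome with the minus-strand gene ending before the plus-strand gene starts, so that both are transcribed away from each other; the gap separating their 5' ends is then compared with the 1 kb cutoff from the abstract. It is a simplified illustration, not FEELnc's classification code, and the `Gene` type and coordinates are hypothetical.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def divergent_gap(a: Gene, b: Gene) -> Optional[int]:
    """Gap in bp between the 5' ends of two divergent (head-to-head) genes.

    Returns None unless the genes share a chromosome, lie on opposite strands,
    and the minus-strand gene ends before the plus-strand gene starts.
    """
    if a.chrom != b.chrom or {a.strand, b.strand} != {"+", "-"}:
        return None
    minus, plus = (a, b) if a.strand == "-" else (b, a)
    gap = plus.start - minus.end
    return gap if gap >= 0 else None


def is_close_divergent(a: Gene, b: Gene, max_gap: int = 1_000) -> bool:
    """True for divergent pairs separated by less than `max_gap` bp (1 kb here)."""
    gap = divergent_gap(a, b)
    return gap is not None and gap < max_gap


# Hypothetical lncRNA/mRNA pairs.
lnc = Gene("1", 100, 500, "-")
mrna = Gene("1", 800, 5_000, "+")
print(divergent_gap(lnc, mrna), is_close_divergent(lnc, mrna))  # 300 True
print(divergent_gap(Gene("1", 100, 500, "+"), Gene("1", 800, 5_000, "-")))  # None (convergent)
```

Orientation is decided by which strand sits upstream: swapping the strands of the same two intervals turns a divergent pair into a convergent (tail-to-tail) one, which the function rejects.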


Subject(s)
Adipose Tissue/metabolism, Chickens/genetics, Liver/metabolism, RNA, Long Noncoding/genetics, Transcriptome, Animals, Chickens/metabolism, Conserved Sequence, Evolution, Molecular, Gene Expression Profiling, Gene Expression Regulation, Genome, Genotype, Humans, Lipid Metabolism/genetics, Open Reading Frames, Organ Specificity, Phenotype, Quantitative Trait Loci, RNA, Antisense, RNA, Long Noncoding/chemistry, RNA, Messenger/genetics

16.
Methods Mol Biol ; 1468: 201-19, 2017.
Article in English | MEDLINE | ID: mdl-27662878

ABSTRACT

The development of High Throughput Sequencing (HTS) for RNA profiling (RNA-seq) has shed light on the diversity of transcriptomes. While RNA-seq is becoming a de facto standard for monitoring the population of expressed transcripts in a given condition at a specific time, processing the huge amount of data it generates requires dedicated bioinformatics programs. Here, we describe a standard bioinformatics protocol using state-of-the-art tools, the STAR mapper to align reads onto a reference genome, Cufflinks to reconstruct the transcriptome, and RSEM to quantify expression levels of genes and transcripts. We present the workflow using human transcriptome sequencing data from two biological replicates of the K562 cell line produced as part of the ENCODE3 project.


Subject(s)
Computational Biology/methods, Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing, Humans, K562 Cells, Sequence Analysis, RNA, Workflow

18.
Genome Biol ; 17(1): 151, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27391956

ABSTRACT

BACKGROUND: A comparison of transcriptional profiles derived from different tissues in a given species or among different species assumes that commonalities reflect evolutionarily conserved programs and that differences reflect species or tissue responses to environmental conditions or developmental program staging. Apparently conflicting results have been published regarding whether organ-specific transcriptional patterns dominate over species-specific patterns, or vice versa, making it unclear to what extent the biology of a given organism can be extrapolated to another. These studies have in common that they treat the transcriptomes monolithically, implicitly ignoring that each gene is likely to have a specific pattern of transcriptional variation across organs and species. RESULTS: We use linear models to quantify this pattern. We find a continuum in the spectrum of expression variation: the expression of some genes varies considerably across species and little across organs, and simply reflects evolutionary distance. At the other extreme are genes whose expression varies considerably across organs and little across species; these genes are much more likely to be associated with diseases than are genes whose expression varies predominantly across species. CONCLUSIONS: Whether transcriptomes, when considered globally, cluster preferentially according to one component or the other may not be a property of the transcriptomes, but rather a consequence of the dominant behavior of a subset of genes. Therefore, the values of the components of the variance of expression for each gene could become a useful resource when planning, interpreting, and extrapolating experimental data from mouse to humans.
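The idea of quantifying, for each gene, how much expression varies across organs versus across species can be illustrated with a two-way additive decomposition. The sketch below assumes a complete, balanced organ × species grid with one (e.g. log-scale) expression value per cell, and splits the total sum of squares into organ, species and residual parts. It is a simplified stand-in for the linear models used in the study, and the organs, species and values are hypothetical.

```python
def variance_components(expr):
    """Split one gene's expression variation into organ, species and residual parts.

    `expr[organ][species]` holds one expression value per cell of a complete
    organ x species grid. Returns fractions of the total sum of squares.
    """
    organs = list(expr)
    species = list(next(iter(expr.values())))
    n_o, n_s = len(organs), len(species)
    values = [expr[o][s] for o in organs for s in species]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return {"organ": 0.0, "species": 0.0, "residual": 0.0}
    organ_means = [sum(expr[o][s] for s in species) / n_s for o in organs]
    species_means = [sum(expr[o][s] for o in organs) / n_o for s in species]
    ss_organ = n_s * sum((m - grand) ** 2 for m in organ_means)
    ss_species = n_o * sum((m - grand) ** 2 for m in species_means)
    ss_resid = ss_total - ss_organ - ss_species
    return {"organ": ss_organ / ss_total,
            "species": ss_species / ss_total,
            "residual": ss_resid / ss_total}


# Hypothetical genes: one varies only across organs, the other only across species.
organ_driven = {"liver": {"human": 10.0, "mouse": 10.0}, "brain": {"human": 2.0, "mouse": 2.0}}
species_driven = {"liver": {"human": 8.0, "mouse": 3.0}, "brain": {"human": 8.0, "mouse": 3.0}}
print(variance_components(organ_driven))
print(variance_components(species_driven))
```

Applied gene by gene, such fractions place each gene on the organ-to-species continuum described in the abstract, rather than treating the transcriptome as a single block.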


Subject(s)
Evolution, Molecular, Gene Expression Regulation, Developmental/genetics, Organ Specificity/genetics, Transcriptome/genetics, Animals, Gene Expression Profiling, Humans, Mice, Oligonucleotide Array Sequence Analysis, Sequence Analysis, RNA, Species Specificity

20.
Genome Biol ; 17: 74, 2016 Apr 23.
Article in English | MEDLINE | ID: mdl-27107712

ABSTRACT

Obtaining RNA-seq measurements involves a complex data analytical process with a large number of competing algorithms as options. There is much debate about which of these methods provides the best approach. Unfortunately, it is currently difficult to evaluate their performance, due in part to a lack of sensitive assessment metrics. We present a series of statistical summaries and plots to evaluate performance in terms of specificity and sensitivity, available as an R/Bioconductor package ( http://bioconductor.org/packages/rnaseqcomp ). Using two independent datasets, we assessed seven competing pipelines. Performance was generally poor, with two methods clearly underperforming and RSEM slightly outperforming the rest.


Subject(s)
Algorithms, Sequence Analysis, RNA/methods, Animals, Humans, Reference Values, Sensitivity and Specificity, Sequence Analysis, RNA/standards