Results 1 - 20 of 114
1.
Genomics ; 112(3): 2418-2425, 2020 May.
Article in English | MEDLINE | ID: mdl-31981701

ABSTRACT

Alternative splicing contributes to the diversity of gene products by producing multiple transcript variants from one gene. Previous studies have revealed highly variable splicing patterns in single cells, but there is still a controversy in the understanding of the simultaneous expression of multiple transcript variants. Here we show that the dominance of a single transcript variant is a common phenomenon in single cells. We analyzed several single-cell RNA sequencing datasets and observed consistent results. Our results demonstrate that single cells tend to express one major transcript variant of a gene, and the diversity of transcript variants in cell populations mainly results from the heterogeneity of splicing pattern in single cells.

3.
Article in English | MEDLINE | ID: mdl-31647443

ABSTRACT

Machine learning (ML) and Natural Language Processing (NLP) have achieved remarkable success in many fields and have brought new opportunities and high expectations in the analysis of medical data, of which the most common type is the massive free-text electronic medical records (EMR). However, free-text EMRs lack consistent standards, are rich in private information, and are limited in availability. Also, it is often hard to have a balanced number of samples for the types of diseases under study. These problems hinder the development of ML and NLP methods for EMR data analysis. To tackle these problems, we developed a model called Medical Text Generative Adversarial Network, or mtGAN, to generate synthetic EMR text. It is based on the GAN framework and is trained by the REINFORCE algorithm. It takes disease features as inputs and generates synthetic texts as EMRs for the corresponding diseases. We evaluate the model at the micro-level, macro-level and application-level on a Chinese EMR text dataset. The results show that the method has a good capacity to fit real data and can generate realistic and diverse EMR samples. This provides a novel way to avoid potential leakage of patient privacy while still supplying sufficient well-controlled cohort data for developing downstream ML and NLP methods.

4.
Oncologist ; 24(9): 1159-1165, 2019 Sep.
Article in English | MEDLINE | ID: mdl-30996009

ABSTRACT

BACKGROUND: Computed tomography (CT) is essential for pulmonary nodule detection in diagnosing lung cancer. As deep learning algorithms have recently been regarded as a promising technique in medical fields, we attempt to integrate a well-trained deep learning algorithm to detect and classify pulmonary nodules derived from clinical CT images. MATERIALS AND METHODS: Open-source data sets and multicenter data sets have been used in this study. A three-dimensional convolutional neural network (CNN) was designed to detect pulmonary nodules and classify them into malignant or benign diseases based on pathologically and laboratory proven results. RESULTS: The sensitivity and specificity of this well-trained model were found to be 84.4% (95% confidence interval [CI], 80.5%-88.3%) and 83.0% (95% CI, 79.5%-86.5%), respectively. Subgroup analysis of smaller nodules (<10 mm) demonstrated remarkable sensitivity and specificity, similar to those of larger nodules (10-30 mm). Additional model validation was implemented by comparing manual assessments done by different ranks of doctors with those performed by the three-dimensional CNN. The results show that the performance of the CNN model was superior to manual assessment. CONCLUSION: Under the companion diagnostics, the three-dimensional CNN with a deep learning algorithm may assist radiologists in the future by providing accurate and timely information for diagnosing pulmonary nodules in regular clinical practice. IMPLICATIONS FOR PRACTICE: The three-dimensional convolutional neural network described in this article demonstrated both high sensitivity and high specificity in classifying pulmonary nodules regardless of diameter, as well as superiority compared with manual assessment. Although it still warrants further improvement and validation in larger screening cohorts, its clinical application could definitely facilitate and assist doctors in clinical practice.
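
The sensitivity and specificity figures reported in the abstract above are proportions with 95% confidence intervals. As a rough illustration of how such figures can be derived from a confusion matrix, here is a minimal Python sketch. The abstract does not say which interval method the authors used, so the normal (Wald) approximation below is an assumption, and the counts are hypothetical, not taken from the study.

```python
import math


def rate_with_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the normal (Wald) approximation.

    z=1.96 corresponds to a two-sided 95% interval. The choice of the Wald
    interval is an assumption; the study's actual method is not stated.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return {
        "sensitivity": rate_with_ci(tp, tp + fn),
        "specificity": rate_with_ci(tn, tn + fp),
    }


# Hypothetical counts for illustration only (not from the study).
result = sensitivity_specificity(tp=84, fn=16, tn=83, fp=17)
for name, (p, lo, hi) in result.items():
    print(f"{name}: {p:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

For small samples or proportions near 0 or 1, an exact or Wilson interval is usually preferred over the Wald approximation.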

5.
BMC Genomics ; 20(Suppl 2): 183, 2019 Apr 04.
Article in English | MEDLINE | ID: mdl-30967110

ABSTRACT

BACKGROUND: Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on humans and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage. RESULTS: As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (KRI) to fill in the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses. CONCLUSIONS: We posed the question of what the total genome length of all different species in a microbial community is and introduced a method to answer it.


Subjects
Metagenome; Metagenomics/methods; Microbiota/genetics; Software; Algorithms; Datasets as Topic; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
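
The record above (item 5) builds its estimate on the coverage distribution of observed k-mers. The following Python sketch shows only that starting point: counting distinct k-mers in a set of reads and tabulating how many k-mers were seen once, twice, and so on. It is a simplified illustration, not the authors' estimator. It does not extrapolate to unobserved k-mers, does not compute the KRI, and ignores reverse complements; the function names and toy reads are assumptions.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_kmers(reads, k):
    """Count occurrences of every k-mer made only of A, C, G, T."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID_BASES:  # skip k-mers containing N etc.
                counts[kmer] += 1
    return counts


def coverage_histogram(counts):
    """Map each coverage value c to the number of distinct k-mers seen c times."""
    return Counter(counts.values())


# Toy reads for illustration only.
reads = ["ACGTAC", "CGTAAG", "TTTTNT"]
counts = count_kmers(reads, k=3)
histogram = coverage_histogram(counts)

print(len(counts))            # number of distinct observed 3-mers: 7
print(sorted(histogram.items()))  # [(1, 4), (2, 3)]
```

The histogram is the kind of input from which the number of not-yet-observed k-mers could be extrapolated; how the paper does this is not described in the abstract.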
6.
Bioinformatics ; 35(22): 4596-4606, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-30993316

ABSTRACT

MOTIVATION: Detecting sequences containing repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection tasks. An efficient generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic D2R that can efficiently discriminate sequences with or without repetitive regions. RESULTS: Using the statistic, we developed an algorithm of linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate clustered regularly interspaced short palindromic repeats regions from bacterial genomic or metagenomic sequences. Simulation and real data experiments show that the method works well on both assembled sequences and unassembled short reads. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/XuegongLab/D2R_codes under the GPL 3.0 license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
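
For background on the D2 family mentioned in the abstract above, the classic D2 statistic between two sequences is the inner product of their k-mer count vectors. The Python sketch below computes that classic statistic only. It is not the D2R statistic, whose definition is not given in the abstract, and the function names are illustrative.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Classic D2: sum over all k-mers w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Iterate over the smaller table; a missing key in a Counter counts as 0.
    if len(counts_b) < len(counts_a):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[w] for w, n in counts_a.items())


print(d2("ACGT", "ACGT", 2))  # 3: AC, CG and GT each match once
print(d2("AAAA", "AAAA", 2))  # 9: AA occurs 3 times in each sequence
print(d2("ACGT", "TTTT", 2))  # 0: no shared 2-mers
```

The second example hints at why such counts react to repeats: a repetitive sequence compared with itself yields a score that grows quadratically with the repeated k-mer's count.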

7.
Sci Rep ; 9(1): 2877, 2019 Feb 27.
Article in English | MEDLINE | ID: mdl-30814546

ABSTRACT

Super-enhancers (SEs) are clusters of transcriptional enhancers which control the expression of cell identity and disease-associated genes. Recent studies have demonstrated the role of multiple factors in SE formation; however, a systematic analysis to assess the relative predictive importance of chromatin and sequence features of SEs and their constituents is lacking. In addition, a predictive model that integrates various types of data to predict SEs has not been established. Here, we integrated diverse types of genomic and epigenomic datasets to identify key signatures of SEs and investigated their predictive importance. Through integrative modeling, we found Cdk8, Cdk9, and Smad3 as new features of SEs, which can define known and new SEs in mouse embryonic stem cells and pro-B cells. We compared six state-of-the-art machine learning models to predict SEs and showed that non-parametric ensemble models performed better than parametric ones. We validated these models using cross-validation and also independent datasets in four human cell-types. Taken together, our systematic analysis and ranking of features can be used as a platform to define and understand the biology of SEs in other cell-types.

8.
Protein Cell ; 10(7): 496-509, 2019 07.
Article in English | MEDLINE | ID: mdl-30478535

ABSTRACT

The development of gastritis is associated with an increased risk of gastric cancer. Current invasive gastritis diagnostic methods are not suitable for monitoring progress. In this work based on 78 gastritis patients and 50 healthy individuals, we observed that the variation of tongue-coating microbiota was associated with the occurrence and development of gastritis. Twenty-one microbial species were identified for differentiating tongue-coating microbiomes of gastritis and healthy individuals. Pathways such as microbial metabolism in diverse environments, biosynthesis of antibiotics and bacterial chemotaxis were up-regulated in gastritis patients. The abundance of Campylobacter concisus was found associated with the gastric precancerous cascade. Furthermore, Campylobacter concisus could be detected in tongue coating and gastric fluid in a validation cohort containing 38 gastritis patients. These observations provided biological evidence of tongue diagnosis in traditional Chinese medicine, and indicated that tongue-coating microbiome could be a potential non-invasive biomarker, which might be suitable for long-term monitoring of gastritis.

9.
BMC Genomics ; 19(Suppl 6): 564, 2018 Aug 13.
Article in English | MEDLINE | ID: mdl-30367578

ABSTRACT

BACKGROUND: Alternative splicing is a ubiquitous post-transcriptional regulation mechanism in most eukaryotic genes. Aberrant splicing isoforms and abnormal isoform ratios can contribute to cancer development. Kinase genes are key regulators of multiple cellular processes. Many kinases are found to be oncogenic and have been intensively investigated in the study of cancer and drugs. RNA-Seq provides a powerful technology for genome-wide study of alternative splicing in cancer besides the conventional gene expression profiling. But this potential has not been fully demonstrated yet. METHODS: We characterized the transcriptome profile of prostate cancer using RNA-Seq data from the viewpoints of both differential expression and differential splicing, with an emphasis on kinase genes and their splicing variations. We built a pipeline to conduct differential expression and differential splicing analysis, followed by functional enrichment analysis. We performed kinase domain analysis to identify the functionally important candidate kinase gene in prostate cancer, and calculated the expression levels of isoforms to explore the function of isoform switching of kinase genes in prostate cancer. RESULTS: We identified distinct gene groups from differential expression and splicing analyses, which suggested that alternative splicing adds another level to gene expression regulation. Enriched GO terms of differentially expressed and spliced kinase genes were found to play different roles in regulation of cellular metabolism. Function analysis on differentially spliced kinase genes showed that differentially spliced exons of these genes are significantly enriched in protein kinase domains. Among them, we found that gene CDK5 has isoform switching between prostate cancer and benign tissues, which may affect cancer development by changing androgen receptor (AR) phosphorylation. The observation was validated in another RNA-Seq dataset of prostate cancer cell lines. CONCLUSIONS: Our work characterized the expression and splicing profiles of kinase genes in prostate cancer and proposed a hypothetical model on isoform switching of CDK5 and AR phosphorylation in prostate cancer. These findings bring new understanding to the role of alternatively spliced kinases in prostate cancer and also demonstrate the use of RNA-Seq data in studying alternative splicing in cancer.


Subjects
Alternative Splicing; Prostatic Neoplasms/genetics; Protein Kinases/genetics; Catalytic Domain; Cyclin-Dependent Kinase 5/metabolism; Exons; Gene Expression Profiling; Humans; Isoenzymes/genetics; Isoenzymes/metabolism; Male; Prostatic Neoplasms/enzymology; Prostatic Neoplasms/metabolism; Protein Kinases/chemistry; Protein Kinases/metabolism; Receptors, Androgen/metabolism; Sequence Analysis, RNA
10.
Epigenetics ; 13(9): 910-922, 2018.
Article in English | MEDLINE | ID: mdl-30169995

ABSTRACT

Super-enhancers and stretch enhancers represent classes of transcriptional enhancers that have been shown to control the expression of cell identity genes and carry disease- and trait-associated variants. Specifically, super-enhancers are clusters of enhancers defined based on the binding occupancy of master transcription factors, chromatin regulators, or chromatin marks, while stretch enhancers are large chromatin-defined regulatory regions of at least 3,000 base pairs. Several studies have characterized these regulatory regions in numerous cell types and tissues to decipher their functional importance. However, the differences and similarities between these regulatory regions have not been fully assessed. We integrated genomic, epigenomic, and transcriptomic data from ten human cell types to perform a comparative analysis of super-enhancers and stretch enhancers with respect to their chromatin profiles, cell type-specificity, and ability to control gene expression. We found that stretch enhancers are more abundant, more distal to transcription start sites, cover twice as much of the genome, and are significantly less conserved than super-enhancers. In contrast, super-enhancers are significantly more enriched for active chromatin marks and cohesin complex, and more transcriptionally active than stretch enhancers. Importantly, a vast majority of super-enhancers (85%) overlap with only a small subset of stretch enhancers (13%), which are enriched for cell type-specific biological functions, and control cell identity genes. These results suggest that super-enhancers are transcriptionally more active and cell type-specific than stretch enhancers, and importantly, most of the stretch enhancers that are distinct from super-enhancers do not show an association with cell identity genes, are less active, and are more likely to be poised enhancers.


Subjects
Enhancer Elements, Genetic; Transcriptional Activation; Chromatin/chemistry; Chromatin/metabolism; Conserved Sequence; Hep G2 Cells; Histone Code; Human Umbilical Vein Endothelial Cells/metabolism; Humans; Organ Specificity; Transcription Initiation Site
11.
Emerg Microbes Infect ; 7(1): 149, 2018 Aug 17.
Article in English | MEDLINE | ID: mdl-30120231

ABSTRACT

The Lon protease selectively degrades abnormal proteins or certain normal proteins in response to environmental and cellular conditions in many prokaryotic and eukaryotic organisms. However, the mechanism(s) behind the substrate selection of normal proteins remains largely unknown. In this study, we identified 10 new substrates of F. tularensis Lon from a total of 21 candidate substrates identified in our previous work, the largest number of novel Lon substrates from a single study. Cross-species degradation of these and other known Lon substrates revealed that human Lon is unable to degrade many bacterial Lon substrates, suggestive of an "organism-adapted" substrate selection mechanism for the natural Lon variants. However, individually replacing the N, A, and P domains of human Lon with the counterparts of bacterial Lon did not enable the human protease to degrade the same bacterial Lon substrates. This result showed that the "organism-adapted" substrate selection depends on multiple domains of the Lon proteases. Further in vitro proteolysis and mass spectrometry analysis revealed a similar substrate cleavage pattern between the bacterial and human Lon variants, which was exemplified by predominant representation of leucine, alanine, and other hydrophobic amino acids at the P(-1) site within the substrates. These observations suggest that the Lon proteases select their substrates at least in part by fine structural matching with the proteins in the same organisms.


Subjects
ATP-Dependent Proteases/chemistry; Bacterial Proteins/chemistry; Francisella tularensis/enzymology; Mitochondrial Proteins/chemistry; Protease La/chemistry; ATP-Dependent Proteases/genetics; ATP-Dependent Proteases/metabolism; Amino Acid Sequence; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Francisella tularensis/chemistry; Francisella tularensis/genetics; Humans; Mitochondrial Proteins/genetics; Mitochondrial Proteins/metabolism; Molecular Sequence Data; Protease La/genetics; Protease La/metabolism; Protein Domains; Sequence Alignment; Substrate Specificity
12.
Genome Biol ; 19(1): 93, 2018 07 18.
Article in English | MEDLINE | ID: mdl-30016986

ABSTRACT

GIVE is a framework and library for creating portable and personalized genome browsers. It makes visualizing genomic data as easy as building a laboratory homepage.


Subjects
Algorithms; Genome, Human; Information Dissemination; Mobile Applications; User-Computer Interface; Computational Biology; Computer Graphics; Databases, Genetic; Gene Library; Humans; Internet
13.
Nat Commun ; 9(1): 2189, 2018 06 06.
Article in English | MEDLINE | ID: mdl-29875359

ABSTRACT

Alternative splicing (AS) is one crucial step of gene expression that must be tightly regulated during neurodevelopment. However, the precise timing of developmental splicing switches and the underlying regulatory mechanisms are poorly understood. Here we systematically analyze the temporal regulation of AS in a large number of transcriptome profiles of developing mouse cortices, in vivo purified neuronal subtypes, and neurons differentiated in vitro. Our analysis reveals early-switch and late-switch exons in genes with distinct functions, and these switches accurately define neuronal maturation stages. Integrative modeling suggests that these switches are under direct and combinatorial regulation by distinct sets of neuronal RNA-binding proteins including Nova, Rbfox, Mbnl, and Ptbp. Surprisingly, various neuronal subtypes in the sensory systems lack Nova and/or Rbfox expression. These neurons retain the "immature" splicing program in early-switch exons, affecting numerous synaptic genes. These results provide new insights into the organization and regulation of the neurodevelopmental transcriptome.


Subjects
Alternative Splicing; Central Nervous System/metabolism; Gene Expression Profiling; Gene Expression Regulation, Developmental; Neurogenesis/genetics; Animals; Cell Differentiation/genetics; Central Nervous System/embryology; Central Nervous System/growth & development; Mice, Knockout; Mice, Transgenic; Models, Genetic; Models, Neurological; Neurons/cytology; Neurons/metabolism; RNA-Binding Proteins/genetics; Time Factors
14.
Oncol Lett ; 15(5): 7864-7870, 2018 May.
Article in English | MEDLINE | ID: mdl-29731905

ABSTRACT

WD repeat domain 5 (WDR5) serves an important role in various biological functions through the epigenetic regulation of gene transcription. Aberrant expression of WDR5 has been observed in various types of human cancer, including prostate cancer, breast cancer and leukemia. However, the role of WDR5 expression and its clinical implications in hepatocellular carcinoma (HCC) remain largely unknown. The present study investigated the WDR5 expression pattern in HCC. It was demonstrated that the mRNA and protein levels of WDR5 were upregulated in HCC cancer tissues compared with normal adjacent tissues using reverse transcription-quantitative polymerase chain reaction and western blotting. Furthermore, the elevated WDR5 protein level was significantly associated with the histological grade (P=0.038), tumor size (P=0.023), tumor-node-metastasis stage (P=0.035) and reduced long-term survival time. Additionally, it was demonstrated through the shRNA-mediated knockdown of WDR5 in HCC cells in vitro that WDR5 expression promotes cell proliferation using an MTT assay. Taken together, the results suggested that WDR5 overexpression may have an oncogenic effect in HCC, and may be a promising biomarker for the diagnosis and prognosis of HCC.

15.
Bioinformatics ; 34(18): 3223-3224, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29688277

ABSTRACT

Summary: The excessive number of zeros in single-cell RNA-seq (scRNA-seq) data includes 'real' zeros due to the on-off nature of gene transcription in single cells and 'dropout' zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect three types of DE genes in scRNA-seq data with higher accuracy. Availability and implementation: The R package DEsingle is freely available at Bioconductor (https://bioconductor.org/packages/DEsingle). Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Software; Humans; Models, Statistical
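
The record above (item 15) rests on a Zero-Inflated Negative Binomial (ZINB) model. As a general illustration of that distribution, the Python sketch below evaluates a ZINB probability mass function: with probability `theta` a count is a structural zero, otherwise it is drawn from a negative binomial with size `r` and success probability `p`. The parameter names and this particular parameterization are assumptions made for the example. This is not the DEsingle API (DEsingle is an R package), and parameter estimation is not shown.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf: Gamma(k+r) / (k! Gamma(r)) * p**r * (1-p)**k.

    Requires r > 0 and 0 < p < 1. Computed in log space for stability.
    """
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k, theta, r, p):
    """ZINB pmf: extra point mass theta at zero, NB scaled by (1 - theta)."""
    nb_part = (1 - theta) * nb_pmf(k, r, p)
    return theta + nb_part if k == 0 else nb_part


# Illustrative parameters only.
theta, r, p = 0.3, 2.0, 0.4
p_zero = zinb_pmf(0, theta, r, p)
share_structural = theta / p_zero  # fraction of zeros from the point mass

print(round(p_zero, 3))            # 0.412
print(round(share_structural, 3))  # 0.728
```

Under this mixture, zeros have two sources: the point mass and the negative binomial component. How those two sources map onto the abstract's 'real' and 'dropout' zeros is defined in the paper, not here.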
16.
Nucleic Acids Res ; 46(8): e45, 2018 05 04.
Article in English | MEDLINE | ID: mdl-29546410

ABSTRACT

Alternative first exons diversify the transcriptomes of eukaryotes by producing variants of the 5' Untranslated Regions (5'UTRs) and N-terminal coding sequences. Accurate transcriptome-wide detection of alternative first exons typically requires specialized experimental approaches that are designed to identify the 5' ends of transcripts. We developed a computational pipeline SEASTAR that identifies first exons from RNA-seq data alone then quantifies and compares alternative first exon usage across multiple biological conditions. The exons inferred by SEASTAR coincide with transcription start sites identified directly by CAGE experiments and bear epigenetic hallmarks of active promoters. To determine if differential usage of alternative first exons can yield insights into the mechanism controlling gene expression, we applied SEASTAR to an RNA-seq dataset that tracked the reprogramming of mouse fibroblasts into induced pluripotent stem cells. We observed dynamic temporal changes in the usage of alternative first exons, along with correlated changes in transcription factor expression. Using a combined sequence motif and gene set enrichment analysis we identified N-Myc as a regulator of alternative first exon usage in the pluripotent state. Our results demonstrate that SEASTAR can leverage the available RNA-seq data to gain insights into the control of gene expression and alternative transcript variation in eukaryotic transcriptomes.


Subjects
Alternative Splicing; RNA, Messenger/genetics; RNA, Messenger/metabolism; Software; Transcription Initiation Site; 5' Untranslated Regions; Animals; Cell Line; Cellular Reprogramming/genetics; Exons; Humans; Induced Pluripotent Stem Cells/cytology; Induced Pluripotent Stem Cells/metabolism; K562 Cells; Logistic Models; Mice; Promoter Regions, Genetic; RNA Polymerase II/genetics; Sequence Analysis, RNA/statistics & numerical data; Transcriptome
17.
Nucleic Acids Res ; 45(19): e166, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-28977434

ABSTRACT

Single cell RNA-seq (scRNA-seq) techniques can reveal valuable insights into cell-to-cell heterogeneities. Projection of high-dimensional data into a low-dimensional subspace is a powerful strategy in general for mining such big data. However, scRNA-seq suffers from higher noise and lower coverage than traditional bulk RNA-seq, hence bringing in new computational difficulties. One major challenge is how to deal with the frequent drop-out events. These events, usually caused by the stochastic burst effect in gene transcription and the technical failure of RNA transcript capture, often make traditional dimension reduction methods inefficient. To overcome this problem, we have developed a novel Single Cell Representation Learning (SCRL) method based on network embedding. This method can efficiently implement data-driven non-linear projection and incorporate prior biological knowledge (such as pathway information) to learn more meaningful low-dimensional representations for both cells and genes. Benchmark results show that SCRL outperforms other dimensional reduction methods on several recent scRNA-seq datasets.


Subjects
Algorithms; Computational Biology/methods; Gene Regulatory Networks/genetics; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Female; Gene Expression Profiling/methods; Germ Cells/metabolism; Humans; Male; Reproducibility of Results
18.
Lung Cancer ; 109: 21-27, 2017 07.
Article in English | MEDLINE | ID: mdl-28577945

ABSTRACT

OBJECTIVES: Lung adenocarcinoma (LUAD) is a common subtype of non-small cell lung cancer prevalent in Asia. There is a dearth of understanding regarding the transcriptome landscape of LUAD without primary known driver mutations. In this study, LUAD samples without well-known driver mutations occurring in EGFR, KRAS, ALK, ROS1 or RET (quintuple-negative) were used for transcriptome study with a focus on long noncoding RNAs (lncRNAs), alternative splicing and gene fusions. MATERIALS AND METHODS: 24 pairs of LUAD and adjacent normal samples and 13 tumor-only samples derived from 37 quintuple-negative patients were used. Differentially expressed lncRNA transcripts were detected by paired t-test and were validated by qPCR. Functions of lncRNAs were predicted by co-expressed mRNAs. Aberrant splicing events in LUAD were identified using MISO. In addition, gene fusions were screened by SOAPfuse. RESULTS AND CONCLUSION: In total, 90 up-regulated and 153 down-regulated lncRNA transcripts were detected in LUAD samples in comparison with the adjacent normal samples. The most significantly differentially expressed lncRNA transcript was ENST00000598996.1 (FENDRR), which was down-regulated in LUAD. By lncRNA-mRNA co-expression analysis, functions of 14 lncRNAs were predicted. The predicted functions included vasculature development, immune response, cell cycle and respiratory gaseous exchange. Furthermore, six co-expressed pairs of lncRNAs and their nearby protein coding genes were identified as associated with lung development. This study also identified two highly recurrent (22 in 24) differential exon skipping events occurring in MYH14 and ESYT2, with exon-including isoforms of both genes up-regulated in isoform percentage in LUAD samples. On the other hand, two out of 24 LUAD samples possessed the driver mutation exon 14 skipping of MET. The transcriptional alterations of LUAD samples without well-known driver mutations identified in the study can be used as references for future research. The translational values of these transcriptional changes are also worthy of further investigation.


Subjects
Adenocarcinoma/genetics; Lung Neoplasms/genetics; Lung/physiology; RNA, Long Noncoding/genetics; Alternative Splicing; Asia; Cell Cycle/genetics; Exons/genetics; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Immunity/genetics; Mutation/genetics; Myosin Heavy Chains/genetics; Myosin Type II/genetics; Neovascularization, Physiologic/genetics; Sequence Analysis, RNA; Synaptotagmins/genetics; Transcriptome
19.
PLoS One ; 12(5): e0178320, 2017.
Article in English | MEDLINE | ID: mdl-28542625

ABSTRACT

Alternative splicing is a ubiquitous phenomenon in most human genes and has important functions. A switch-like exon is an exon that has a high level of usage in some tissues but a low level of usage in other tissues. Such exons usually undergo strong tissue-specific regulation. There is still no systematic method to identify switch-like exons from multiple RNA-seq samples. We proposed a novel method called iterative Tertile Absolute Deviation around the mode (iTAD) to profile the distribution of exon relative usages among multiple samples and to identify switch-like exons and other types of exons using a robust statistical estimator. We validated the method with simulation data, applied it to RNA-seq data of 16 human body tissues, and detected 3,100 switch-like exons. We found that switch-like exons tend to be more associated with Alu elements in their flanking intron regions than other types of exons.


Subjects
Exons/physiology; RNA Splice Sites/physiology; Alu Elements/genetics; Exons/genetics; Humans; Models, Genetic; RNA Splice Sites/genetics
20.
BMC Genomics ; 18(Suppl 1): 963, 2017 01 25.
Article in English | MEDLINE | ID: mdl-28198669

ABSTRACT

BACKGROUND: Alternative splicing plays important roles in many regulatory processes and diseases in humans. Many genetic variants contribute to phenotypic differences in gene expression and splicing that determine variations in human traits. Detecting genetic variants that affect splicing phenotypes is essential for understanding the functional impact of genetic variations on alternative splicing. For many situations, the key phenotype is the relative splicing ratios of alternative isoforms rather than the expression values of individual isoforms. Splicing quantitative trait loci (sQTL) analysis methods have been proposed for detecting associations of genetic variants with the vectors of isoform splicing ratios of genes. We call this task composite sQTL analysis. Existing methods are computationally intensive and cannot scale up for whole-genome analysis. RESULTS: We developed an ultra-fast method named ulfasQTL for this task based on a previous method, sQTLseekeR. It transforms tests of splicing ratios of multiple genes to a matrix form for efficient computation, and therefore can be applied for sQTL analysis at whole-genome scale at a speed thousands of times faster than the existing method. We tested ulfasQTL on the data from the GEUVADIS project and compared it with an existing method. CONCLUSIONS: ulfasQTL is a very efficient tool for composite splicing QTL analysis and can be applied to whole-genome analysis with acceptable running time.


Subjects
Alternative Splicing; Computational Biology/methods; Quantitative Trait Loci; Software; Algorithms; Cell Line, Tumor; Gene Expression Profiling; Genetic Variation; Humans; Transcriptome
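
The record above (item 20) treats the vector of isoform splicing ratios of a gene as the phenotype. The Python sketch below shows one plain way to form such a vector for a single sample: divide each isoform's expression by the gene's total. It illustrates the phenotype only. The association test used by sQTLseekeR or ulfasQTL is not reproduced, and the function name, isoform labels and expression values are hypothetical.

```python
def splicing_ratios(isoform_expression):
    """Convert per-isoform expression of one gene into relative ratios.

    Returns None when the gene is not expressed, since ratios are then
    undefined.
    """
    total = sum(isoform_expression.values())
    if total <= 0:
        return None
    return {iso: value / total for iso, value in isoform_expression.items()}


# Hypothetical expression values for one gene in one sample.
sample = {"isoform_1": 30.0, "isoform_2": 10.0}
ratios = splicing_ratios(sample)
print(ratios)  # {'isoform_1': 0.75, 'isoform_2': 0.25}
```

Stacking these per-sample vectors for many genes is what makes a matrix formulation, as mentioned in the abstract, possible.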