ABSTRACT
Single-cell clustering is a critical step in downstream biological analysis. Clustering performance can be effectively improved by extracting cell-type-specific genes. State-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and the heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED to various scRNA-seq datasets, including large datasets, followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source code for FEED is freely available at https://github.com/genemine/FEED.
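The permutation-based thresholding step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a gene's importance is the absolute Pearson correlation with a reference profile; the abstract does not specify FEED's actual score, so `abs_pearson`, `permutation_threshold`, and the toy data are hypothetical and are not taken from the FEED source code.

```python
import random
from statistics import mean


def abs_pearson(x, y):
    """Absolute Pearson correlation (an illustrative stand-in importance score)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(x, y, score=abs_pearson, n_perm=500, quantile=0.95, seed=0):
    """Build a null distribution by shuffling y, then return the chosen quantile."""
    rng = random.Random(seed)
    shuffled = list(y)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_scores.append(score(x, shuffled))
    null_scores.sort()
    idx = min(int(quantile * n_perm), n_perm - 1)
    return null_scores[idx]


# Toy, made-up expression profiles: y is a perfect linear function of x.
x = [float(i) for i in range(30)]
y = [2.0 * v + 1.0 for v in x]

observed = abs_pearson(x, y)
threshold = permutation_threshold(x, y)
is_informative = observed > threshold
```

A gene would be kept only when its observed score exceeds the quantile of the null scores; the quantile and the number of permutations are free parameters in this sketch.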
Subject(s)
Gene Expression Profiling; Single-Cell Analysis; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Algorithms; Cluster Analysis; Gene Expression
ABSTRACT
Alternative splicing (AS) is a key transcriptional regulation pathway. Recent studies have shown that AS events are associated with the occurrence of complex diseases. Various computational approaches have been developed for the detection of disease-associated AS events. In this review, we first describe the metrics used for quantitative characterization of AS events. Second, we review and discuss the three types of methods for detecting disease-associated splicing events: differential splicing analysis, aberrant splicing detection and splicing-related network analysis. Third, to further explore the genetic mechanism of disease-associated AS events, we describe the methods for detecting genetic variants that potentially regulate splicing. For each type of method, we conducted an experimental comparison to illustrate performance. Finally, we discuss the limitations of these methods and point out potential ways to address them. We anticipate that this review provides a systematic understanding of computational approaches for the analysis of disease-associated splicing.
Subject(s)
Alternative Splicing; Computational Biology
ABSTRACT
MOTIVATION: Identifying cancer genes remains a significant challenge in cancer genomics research. Annotated gene sets encode functional associations among multiple genes, and cancer genes have been shown to cluster in hallmark signaling pathways and biological processes. The knowledge of annotated gene sets is critical for discovering cancer genes but remains to be fully exploited. RESULTS: Here, we present the DIsease-Specific Hypergraph neural network (DISHyper), a hypergraph-based computational method that integrates the knowledge from multiple types of annotated gene sets to predict cancer genes. First, our benchmark results demonstrate that DISHyper outperforms the existing state-of-the-art methods and highlight the advantages of employing hypergraphs for representing annotated gene sets. Second, we validate the accuracy of DISHyper-predicted cancer genes using functional validation results and multiple independent functional genomics data. Third, our model predicts 44 novel cancer genes, and subsequent analysis shows their significant associations with multiple types of cancers. Overall, our study provides a new perspective for discovering cancer genes and reveals previously undiscovered cancer genes. AVAILABILITY AND IMPLEMENTATION: DISHyper is freely available for download at https://github.com/genemine/DISHyper.
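To make the hypergraph representation concrete, the following minimal sketch builds a gene-by-gene-set incidence matrix, in which each annotated gene set is one hyperedge. The set names and memberships are toy values chosen for illustration, and `build_incidence` is a hypothetical helper, not part of DISHyper.

```python
def build_incidence(gene_sets):
    """Return (genes, set_names, H) where H[i][j] = 1 if gene i belongs to set j."""
    genes = sorted({g for members in gene_sets.values() for g in members})
    set_names = sorted(gene_sets)
    gene_index = {g: i for i, g in enumerate(genes)}
    incidence = [[0] * len(set_names) for _ in genes]
    for j, name in enumerate(set_names):
        for g in gene_sets[name]:
            incidence[gene_index[g]][j] = 1
    return genes, set_names, incidence


# Toy annotated gene sets (illustrative membership only).
toy_sets = {
    "pathway_A": {"TP53", "MDM2", "CDKN1A"},
    "pathway_B": {"TP53", "BRCA1"},
}

genes, set_names, H = build_incidence(toy_sets)

# Node degree: number of gene sets containing each gene.
node_degree = [sum(row) for row in H]
# Hyperedge degree: number of genes in each gene set.
edge_degree = [sum(H[i][j] for i in range(len(genes))) for j in range(len(set_names))]
```

Unlike a pairwise graph, a single hyperedge here connects all members of a gene set at once, which is the property the abstract highlights.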
Subject(s)
Neoplasms; Neural Networks, Computer; Humans; Neoplasms/genetics; Computational Biology/methods; Genomics/methods; Genes, Neoplasm; Molecular Sequence Annotation/methods; Databases, Genetic
ABSTRACT
Alzheimer's disease (AD) has a strong genetic predisposition. However, its risk genes remain incompletely identified. We developed an Alzheimer's brain gene network-based approach to predict AD-associated genes by leveraging the functional pattern of known AD-associated genes. Our constructed network outperformed existing networks in predicting AD genes. We then systematically validated the predictions using independent genetic, transcriptomic, proteomic, neuropathological and clinical data. First, top-ranked genes were enriched in AD-associated pathways. Second, using external gene expression data from the Mount Sinai Brain Bank study, we found that the top-ranked genes were significantly associated with neuropathological and clinical traits, including the Consortium to Establish a Registry for Alzheimer's Disease score, Braak stage score and clinical dementia rating. The analysis of Alzheimer's brain single-cell RNA-seq data revealed cell-type-specific association of predicted genes with early pathology of AD. Third, by interrogating proteomic data in the Religious Orders Study and Memory and Aging Project and the Baltimore Longitudinal Study of Aging, we observed a significant association of protein expression level with cognitive function and AD clinical severity. The network, method and predictions could become a valuable resource to advance the identification of risk genes for AD.
Subject(s)
Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Brain/metabolism; Gene Regulatory Networks; Genetic Predisposition to Disease; Aging/genetics; Gene Expression Profiling; Humans; Longitudinal Studies; Memory; Proteomics; RNA-Seq; Transcriptome
ABSTRACT
MOTIVATION: A single gene may yield several isoforms with different functions through alternative splicing. Continuous efforts are devoted to developing machine-learning methods to predict isoform functions. However, existing methods do not consider the relevance of each feature to specific functions and ignore the noise caused by the irrelevant features. In this case, we hypothesize that constructing a feature selection framework to extract the function-relevant features might help improve the model accuracy in isoform function prediction. RESULTS: In this article, we present a feature selection-based approach named IsoFrog to predict isoform functions. First, IsoFrog adopts a reversible jump Markov Chain Monte Carlo (RJMCMC)-based feature selection framework to assess the feature importance to gene functions. Second, a sequential feature selection procedure is applied to select a subset of function-relevant features. This strategy screens the relevant features for the specific function while eliminating irrelevant ones, improving the effectiveness of the input features. Then, the selected features are input into our proposed method modified domain-invariant partial least squares, which prioritizes the most likely positive isoform for each positive MIG and utilizes diPLS for isoform function prediction. Tested on three datasets, our method achieves superior performance over six state-of-the-art methods, and the RJMCMC-based feature selection framework outperforms three classic feature selection methods. We expect this proposed methodology will promote the identification of isoform functions and further inspire the development of new methods. AVAILABILITY AND IMPLEMENTATION: IsoFrog is freely available at https://github.com/genemine/IsoFrog.
Subject(s)
Alternative Splicing; Machine Learning; Markov Chains; Protein Isoforms; Monte Carlo Method
ABSTRACT
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single cell clustering. RESULTS: We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it proposes a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy. AVAILABILITY AND IMPLEMENTATION: All source codes of CellBRF are freely available at https://github.com/xuyp-csu/CellBRF.
Subject(s)
Benchmarking; Random Forest; Cell Differentiation; Cluster Analysis
ABSTRACT
OBJECTIVES: Idiopathic inflammatory myopathies (IIMs) are a group of heterogeneous autoimmune diseases. Intron retention (IR) serves as an important post-transcriptional and translational regulatory mechanism. This study aims to identify changes in IR profiles in IIM subtypes, investigating their influence on proteins and their correlations with clinical features. METHODS: RNA sequencing and liquid chromatography-tandem mass spectrometry were performed on muscle tissues obtained from 174 patients with IIM and 19 controls, following QC procedures. GTFtools and iREAD software were used for IR identification. An analysis of differentially expressed IRs (DEIs), exons and proteins was carried out using edgeR or DEP. Functional analysis was performed with clusterProfiler, and SPIRON was used to assess splicing factors. RESULTS: A total of 6783 IRs located in 3111 unique genes were identified in all IIM subtypes compared with controls. IIM subtype-specific DEIs were associated with the pathogenesis of respective IIM subtypes. Splicing factors YBX1 and HSPA2 exhibited the most changes in dermatomyositis and immune-mediated necrotising myopathy. Increased IR was associated with reduced protein expression. Some of the IIM-specific DEIs were correlated with clinical parameters (skin rash, MMT-8 scores and muscle enzymes) and muscle histopathological features (myofiber necrosis, regeneration and inflammation). IRs in IFIH1 and TRIM21 were strongly correlated with anti-MDA5+ antibody, while IRs in SRP14 were associated with anti-SRP+ antibody. CONCLUSION: This study revealed distinct IRs and specific splicing factors associated with IIM subtypes, which might be contributing to the pathogenesis of IIM. We also emphasised the potential impact of IR on protein expression in IIM muscles.
Subject(s)
Introns; Muscle, Skeletal; Myositis; Humans; Myositis/genetics; Myositis/immunology; Myositis/pathology; Male; Female; Muscle, Skeletal/pathology; Muscle, Skeletal/metabolism; Middle Aged; Introns/genetics; Adult; Dermatomyositis/genetics; Dermatomyositis/pathology; Dermatomyositis/metabolism; Dermatomyositis/immunology; Case-Control Studies; Aged; Sequence Analysis, RNA
ABSTRACT
Mapping gene interactions within tissues/cell types plays a crucial role in understanding the genetic basis of human physiology and disease. Tissue functional gene networks (FGNs) are essential models for mapping complex gene interactions. We present TissueNexus, a database of 49 human tissue/cell line FGNs constructed by integrating heterogeneous genomic data. We adopted an advanced machine learning approach for data integration because Bayesian classifiers, which are the main approach used for constructing existing tissue gene networks, cannot capture the interaction and nonlinearity of genomic features well. A total of 1,341 RNA-seq datasets containing 52,087 samples were integrated for all of these networks. Because the tissue label for RNA-seq data may be annotated with different names or be missing, we performed intensive hand-curation to improve quality. We further developed a user-friendly database for network search, visualization, and functional analysis. We illustrate the application of TissueNexus in prioritizing disease genes. The database is publicly available at https://www.diseaselinks.com/TissueNexus/.
Subject(s)
Databases, Genetic; Gene Regulatory Networks/genetics; Organ Specificity/genetics; RNA-Seq; Data Curation; Data Management; Genome, Human/genetics; Humans; Software
ABSTRACT
Advances in sequencing technologies facilitate personalized disease-risk profiling and clinical diagnosis. In recent years, great progress has been made in noninvasive diagnoses based on cell-free DNAs (cfDNAs). This approach exploits the fact that dead cells release DNA fragments into the circulation, and some DNA fragments carry information that indicates their tissues-of-origin (TOOs). Based on the signals used for identifying the TOOs of cfDNAs, the existing methods can be classified into three categories: cfDNA mutation-based methods, methylation pattern-based methods and cfDNA fragmentation pattern-based methods. In cfDNA mutation-based methods, the SNP information or the detected mutations in driver genes of certain diseases are employed to identify the TOOs of cfDNAs. Methylation pattern-based methods identify the TOOs of cfDNAs based on tissue-specific methylation patterns. In cfDNA fragmentation pattern-based methods, cfDNA fragmentation patterns, such as nucleosome positioning or preferred end coordinates of cfDNAs, are used to predict the TOOs of cfDNAs. In this paper, the strategies and challenges in each category are reviewed. Furthermore, the representative applications based on the TOOs of cfDNAs, including noninvasive prenatal testing, noninvasive cancer screening, transplantation rejection monitoring and parasitic infection detection, are also reviewed. Moreover, the challenges and future work in identifying the TOOs of cfDNAs are discussed. This review provides a comprehensive picture of the development and challenges in identifying the TOOs of cfDNAs, which may help bioinformatics researchers develop new methods to improve the identification of the TOOs of cfDNAs.
Subject(s)
Cell-Free Nucleic Acids/genetics; Neoplasms/diagnosis; Biomarkers, Tumor/genetics; Cell-Free Nucleic Acids/blood; DNA Methylation; High-Throughput Nucleotide Sequencing/methods; Humans; Mutation; Neoplasms/genetics
ABSTRACT
In single-cell RNA-seq (scRNA-seq) data analysis, a fundamental problem is to determine the number of cell clusters based on the gene expression profiles. However, the performance of current methods is still far from satisfactory, presumably due to their limitations in capturing the expression variability among cell clusters. Batch effects are the undesired variability between data measured in different batches, and they occur when data are obtained from different labs or protocols. Motivated by the practice of batch effect removal, we considered cell clusters as batches. We hypothesized that the number of cell clusters (i.e. batches) could be correctly determined if the variances among clusters (i.e. batch effects) were removed. We developed a new method, namely, removal of batch effect and testing (REBET), for determining the number of cell clusters. First, cells are partitioned into k clusters. Second, the batch effects among these k clusters are removed. Third, the quality of batch effect removal is evaluated with the average range of normalized mutual information (ARNMI), which measures how uniformly the cells are mixed after batch effect removal. By testing a range of k values, the k value that corresponds to the lowest ARNMI is determined to be the optimal number of clusters. We compared REBET with state-of-the-art methods on 32 simulated datasets and 14 published scRNA-seq datasets. The results show that REBET can accurately and robustly estimate the number of cell clusters and outperforms existing methods. Contact: H.D.L. (hongdong@csu.edu.cn) or Q.S.X. (qsxu@csu.edu.cn).
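As a rough illustration of the ingredients described above, the sketch below computes normalized mutual information (NMI) between two labelings and then picks the k with the lowest score. The abstract does not define how ARNMI is derived from NMI, so only plain NMI is shown (normalized by the geometric mean of the entropies, one of several common conventions). The per-k scores in `scores_by_k` are made-up placeholders, and all function names are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(a, b):
    """NMI of two labelings, normalized by sqrt(H(a) * H(b))."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log((c * n) / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 0.0
    return mi / sqrt(ha * hb)


def pick_k(score_by_k):
    """Return the candidate k with the lowest score."""
    return min(score_by_k, key=score_by_k.get)


# Identical partitions (up to renaming) versus perfectly mixed partitions.
nmi_same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
nmi_mixed = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])

# Placeholder scores per candidate k (invented numbers, not results).
scores_by_k = {2: 0.40, 3: 0.15, 4: 0.22}
best_k = pick_k(scores_by_k)
```

A low NMI between cluster labels and a second labeling indicates well-mixed cells, which matches the abstract's reading that the lowest ARNMI marks the best k.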
Subject(s)
Cluster Analysis; RNA-Seq/methods; Single-Cell Analysis/methods; Algorithms; Databases, Genetic; Reproducibility of Results
ABSTRACT
MOTIVATION: Gene-centric bioinformatics studies frequently involve the calculation or extraction of various features of genes such as splice sites, promoters, independent introns and untranslated regions (UTRs) through manipulation of gene models. Gene models are often annotated in gene transfer format (GTF) files. The features are essential for subsequent analysis such as intron retention detection, DNA-binding site identification and computing splicing strength of splice sites. Some features such as independent introns and splice sites are not provided in existing resources including the commonly used BioMart database. A package that implements and integrates functions to analyze various features of genes will greatly ease routine analysis for related bioinformatics studies. However, to the best of our knowledge, such a package is not available yet. RESULTS: We introduce GTFtools, a stand-alone command-line software that provides a set of functions to calculate various gene features, including splice sites, independent introns, transcription start sites (TSS)-flanking regions, UTRs, isoform coordination and length, different types of gene lengths, etc. It takes ENSEMBL or GENCODE GTF files as input and can be applied to both human and non-human gene models, such as those of the laboratory mouse. We compare the utilities of GTFtools with those of two related tools: Bedtools and BioMart. GTFtools is implemented in Python and does not depend on any third-party software, making it very easy to install and use. AVAILABILITY AND IMPLEMENTATION: GTFtools is freely available at www.genemine.org/gtftools.php as well as PyPI and Bioconda.
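One of the gene-model manipulations mentioned above, deriving intron coordinates from exon coordinates, can be sketched as follows. This is an illustrative simplification and not GTFtools' implementation: it assumes 1-based inclusive coordinates (as in GTF), a single gene on one strand, and made-up exon positions. Computing independent introns would additionally require filtering against exons of other genes, which is not shown.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def introns_from_exons(exons):
    """Return the gaps between merged exons as intron intervals."""
    merged = merge_intervals(exons)
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(merged, merged[1:])
    ]


# Made-up exon coordinates pooled from two overlapping isoforms of one gene.
exons = [(100, 200), (150, 250), (400, 500), (700, 800)]
merged_exons = merge_intervals(exons)
introns = introns_from_exons(exons)
```

Merging exons first means that a region is reported as intronic only when no isoform of the gene has an exon there.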
Subject(s)
Computational Biology; Software; DNA; Introns; Untranslated Regions
ABSTRACT
MOTIVATION: Alzheimer's disease (AD) is a complex brain disorder whose risk genes are incompletely identified. Candidate genes are predominantly obtained by computational approaches. To obtain biological insights into candidate genes or to screen genes for experimental testing, it is essential to assess their relevance to AD. A platform that integrates different types of omics data and approaches would facilitate the analysis of candidate genes and is greatly needed. RESULTS: We report AlzCode, a platform for multiview analysis of genes related to AD. First, this platform integrates a rich collection of functional genomic data, including expression data of AD samples (gene expression, single-cell RNA-seq data and protein expression), AD-specific biological networks (co-expression networks and functional gene networks), neuropathological and clinical traits (CERAD score, Braak staging score, Clinical Dementia Rating, cognitive function and clinical severity) and general data such as protein-protein interaction, regulatory networks, sequence similarity and miRNA-target interactions. These data provide a basis for analyzing genes from different views. Second, the platform integrates multiple approaches designed for the various types of data. We implement functions to analyze both individual genes and gene sets. We also compare AlzCode with two existing platforms for AD analysis, Agora and AD Atlas. We pinpoint the features of each platform and highlight their differences. This platform would be valuable for understanding AD genetics and pathological mechanisms. AVAILABILITY AND IMPLEMENTATION: AlzCode is freely available at: http://www.alzcode.xyz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Alzheimer Disease; Humans; Alzheimer Disease/genetics; Gene Regulatory Networks; Genomics
ABSTRACT
Alternative splicing (AS) leads to transcriptome diversity in eukaryotic cells and is one of the key regulators driving cellular differentiation. Although AS is of crucial importance for normal hematopoiesis and hematopoietic malignancies, its role in early hematopoietic development is still largely unknown. Here, by using high-throughput transcriptomic analyses, we show that pervasive and dynamic AS takes place during hematopoietic development of human pluripotent stem cells (hPSCs). We identify a splicing factor switch that occurs during the differentiation of mesodermal cells to endothelial progenitor cells (EPCs). Perturbation of this switch selectively impairs the emergence of EPCs and hemogenic endothelial progenitor cells (HEPs). Mechanistically, an EPC-induced alternative spliced isoform of NUMB dictates EPC specification by controlling NOTCH signaling. Furthermore, we demonstrate that the splicing factor SRSF2 regulates splicing of the EPC-induced NUMB isoform, and the SRSF2-NUMB-NOTCH splicing axis regulates EPC generation. The identification of this splicing factor switch provides a new molecular mechanism to control cell fate and lineage specification.
Subject(s)
Cell Lineage; Pluripotent Stem Cells; Serine-Arginine Splicing Factors/genetics; Cell Differentiation; Cell Lineage/genetics; Hematopoiesis/genetics; Hematopoietic Stem Cells; Humans; Membrane Proteins; Nerve Tissue Proteins
ABSTRACT
The main purpose of this study was to reveal the nutritional value and antioxidant activity of 34 edible flowers grown in Yunnan Province, China, through a comprehensive assessment of their nutritional composition and antioxidant indices. The results showed that sample A3 of Asteraceae flowers had the highest total flavonoid content, with a value of 8.53%, and the maximum contents of vitamin C and reducing sugars were from Rosaceae sample R1 and Gentianaceae sample G3, with values of 143.80 mg/100 g and 7.82%, respectively. Samples R2 and R3 of Rosaceae were the top two flowers in terms of comprehensive nutritional quality. In addition, the antioxidant capacity of Rosaceae samples was evidently better than that of the other three families: sample R1 had the maximum values in hydroxyl radical (·OH) scavenging and superoxide anion radical (·O2-) scavenging rates, and samples R2 and R3 showed a high total antioxidant capacity and 2,2-diphenyl-1-picrylhydrazyl (DPPH) scavenging rate, respectively. Taken together, there were significant differences in the nutrient contents and antioxidant properties of these 34 flowers, and the comprehensive quality of Rosaceae samples was generally better than that of the other three families. This study provides references for the use of these 34 edible flowers as dietary supplements and important sources of natural antioxidants.
Subject(s)
Antioxidants; Phenols; Humans; Antioxidants/chemistry; Phenols/chemistry; China; Flowers/chemistry; Flavonoids/chemistry; Plant Extracts/chemistry
ABSTRACT
MOTIVATION: High resolution annotation of gene functions is a central goal in functional genomics. A single gene may produce multiple isoforms with different functions through alternative splicing. Conventional approaches, however, consider a gene as a single entity without differentiating these functionally different isoforms. Towards understanding gene functions at higher resolution, recent efforts have focused on predicting the functions of isoforms. However, the performance of existing methods is far from satisfactory mainly because of the lack of isoform-level functional annotation. RESULTS: We present IsoResolve, a novel approach for isoform function prediction, which leverages the information from gene function prediction models with domain adaptation (DA). IsoResolve treats gene-level and isoform-level features as source and target domains, respectively. It uses DA to project the two domains into a latent variable space in such a way that the latent variables from the two domains have similar distribution, which enables the gene domain information to be leveraged for isoform function prediction. We systematically evaluated the performance of IsoResolve in predicting functions. Compared with five state-of-the-art methods, IsoResolve achieved significantly better performance. IsoResolve was further validated by case studies of genes with isoform-level functional annotation. AVAILABILITY AND IMPLEMENTATION: IsoResolve is freely available at https://github.com/genemine/IsoResolve. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Adaptation, Physiological; Alternative Splicing; Computational Biology; Mutation; Protein Isoforms/genetics; Protein Isoforms/metabolism
ABSTRACT
Intron retention (IR) has been implicated in the pathogenesis of complex diseases such as cancers; its association with Alzheimer's disease (AD) remains unexplored. We performed genome-wide analysis of IR through integrating genetic, transcriptomic, and proteomic data of AD subjects and mouse models from the Accelerating Medicines Partnership-Alzheimer's Disease project. We identified 4535 and 4086 IR events in 2173 human and 1736 mouse genes, respectively. Quantitation of IR enabled the identification of differentially expressed genes that conventional exon-level approaches did not reveal. There were significant correlations of intron expression within innate immune genes, like HMBOX1, with AD in humans. Peptides with a high probability of translation from intron-retained mRNAs were identified using mass spectrometry. Further, we established AD-specific intron expression Quantitative Trait Loci, and identified splicing-related genes that may regulate IR. Our analysis provides a novel resource for the search for new AD biomarkers and pathological mechanisms.
Subject(s)
Alzheimer Disease; Autopsy; Brain/pathology; Disease Models, Animal; Genomics; Introns/genetics; Alzheimer Disease/genetics; Alzheimer Disease/pathology; Animals; Homeodomain Proteins/genetics; Humans; Mice; Proteomics; Quantitative Trait Loci; Transcriptome
ABSTRACT
Congenital cardiac malformations are among the most common birth defects in humans. Here we show that Trim33, a member of the Tif1 subfamily of tripartite domain containing transcriptional cofactors, is required for appropriate differentiation of the pre-cardiogenic mesoderm during a narrow time window in late gastrulation. While mesoderm-specific Trim33 mutants did not display noticeable phenotypes, epiblast-specific Trim33 mutant embryos developed ventricular septal defects, showed sparse trabeculation and abnormally thin compact myocardium, and died as a result of cardiac failure during late gestation. Differentiating embryoid bodies deficient in Trim33 showed an enrichment of gene sets associated with cardiac differentiation and contractility, while the total number of cardiac precursor cells was reduced. Concordantly, cardiac progenitor cell proliferation was reduced in Trim33-deficient embryos. ChIP-Seq performed using antibodies against Trim33 in differentiating embryoid bodies revealed more than 4000 peaks, which were significantly enriched close to genes implicated in stem cell maintenance and mesoderm development. Nearly half of the Trim33 peaks overlapped with binding sites of the Ctcf insulator protein. Our results suggest that Trim33 is required for appropriate differentiation of precardiogenic mesoderm during late gastrulation and that it will likely mediate some of its functions via multi-protein complexes, many of which include the chromatin architectural and insulator protein Ctcf.
Subject(s)
Embryo, Mammalian/embryology; Gastrulation; Mesoderm/embryology; Myocardium/metabolism; Stem Cells/metabolism; Transcription Factors/metabolism; Animals; Embryo, Mammalian/cytology; Embryoid Bodies/cytology; Embryoid Bodies/metabolism; Mesoderm/cytology; Mice; Mice, Transgenic; Stem Cells/cytology; Transcription Factors/genetics
ABSTRACT
BACKGROUND: Intron retention (IR) has traditionally been overlooked as 'noise' and has received negligible attention in the field of gene expression analysis. In recent years, IR has become an emerging field for interrogating transcriptomes because it has been recognized to carry out important biological functions such as gene expression regulation and has been found to be associated with complex diseases such as cancers. However, methods for detecting IR today are limited. Thus, there is a need to develop novel methods to improve IR detection. RESULTS: Here we present iREAD (intron REtention Analysis and Detector), a tool to detect IR events genome-wide from high-throughput RNA-seq data. The command line interface for iREAD is implemented in Python. iREAD takes as input a BAM file, representing the transcriptome, and a text file containing the intron coordinates of a genome. It then 1) counts all reads that overlap intron regions, 2) detects IR events by analyzing the features of reads such as depth and distribution patterns, and 3) outputs a list of retained introns into a tab-delimited text file. iREAD provides significant added value in detecting IR compared with output from IRFinder, with a higher AUC on all datasets tested. Both methods showed low false positive rates and high false negative rates in different regimes, indicating that using them together is generally beneficial. The output from iREAD can be directly used for further exploratory analysis such as differential intron expression and functional enrichment. The software is freely available at https://github.com/genemine/iread. CONCLUSION: Being complementary to existing tools, iREAD provides a new and generic tool to interrogate poly-A enriched transcriptomic data of intron regions. Intron retention analysis provides a complementary approach for understanding the transcriptome.
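The first step listed above, counting reads that overlap intron regions, can be sketched on plain coordinate tuples. This is a simplified illustration rather than iREAD's code: it assumes all reads and introns lie on one chromosome, uses closed intervals, and takes made-up coordinates instead of parsing a BAM file. The function name `count_reads_per_intron` is hypothetical.

```python
def count_reads_per_intron(introns, reads):
    """Count reads whose closed interval overlaps each intron's closed interval."""
    counts = {}
    for name, (i_start, i_end) in introns.items():
        counts[name] = sum(
            1 for r_start, r_end in reads if r_start <= i_end and r_end >= i_start
        )
    return counts


# Made-up intron and read coordinates on a single chromosome.
introns = {"intron_1": (251, 399), "intron_2": (501, 699)}
reads = [(240, 260), (300, 350), (390, 410), (450, 480), (690, 720)]

counts = count_reads_per_intron(introns, reads)
```

This naive version compares every read with every intron; for genome-scale data a sorted sweep or an interval index would be the usual choice, and the later steps (read depth and distribution features) are not covered here.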
Subject(s)
Introns; RNA-Seq; Software; Algorithms; Animals; Humans; Mice
ABSTRACT
MOTIVATION: Functional gene networks, representing how likely two genes are to work in the same biological process, are important models for studying gene interactions in complex tissues. However, the current network-building scheme does not leverage evidence from multiple model organisms and lacks expert curation and quality control of the input genomic data. RESULTS: Here, we present BaiHui, a brain-specific functional gene network built by probabilistically integrating expertly hand-curated (by reading original publications) heterogeneous and multi-species genomic data in human, mouse and rat brains. To facilitate the use of this network, we deployed a web server through which users can query their genes of interest, visualize the network, gain functional insight from enrichment analysis and download network data. We also illustrated how this network could be used to generate testable hypotheses on disease gene prioritization of brain disorders. AVAILABILITY AND IMPLEMENTATION: BaiHui is freely available at: http://guanlab.ccmb.med.umich.edu/BaiHui/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Gene Regulatory Networks; Genomics; Animals; Brain; Humans; Mice; Software
ABSTRACT
BACKGROUND: Underdevelopment of the nose and chin is quite common in East Asians. Rhinoplasty and mentoplasty are effective procedures for correcting these defects and can achieve remarkable cosmetic effects. An autologous costal cartilage graft has become an ideal material for rhinoplasty, especially for revision surgery. However, many problems in the clinical application of costal cartilage remain unresolved. This study investigates application strategies of autologous costal cartilage grafts in rhino- and mentoplasty. METHODS: The methods involved are as follows: application of an integrated cartilage scaffold; comprehensive application of diced cartilage; and chin augmentation with an autologous costal cartilage graft. RESULTS: In this study, satisfactory facial contour appearance was immediately achieved in 28 patients after surgery; 21 patients had satisfactory appearance of the nose and chin during the 6- to 18-month follow-up. Cartilage resorption was not observed. Two patients had nasal tip skin redness and were cured after treatment. CONCLUSION: This procedure can effectively address three problems: curvature of the costal cartilage segment itself; warping of the carved costal cartilage; and effective use of the costal cartilage segment. The procedure has achieved satisfactory outcomes, and its application is worth extending to clinical practice.