Results 1 - 7 of 7
1.
Plant Phenomics ; 2020: 1969142, 2020.
Article in English | MEDLINE | ID: mdl-33313545

ABSTRACT

The local environment of the geographical origin of plants shaped their genetic variations through environmental adaptation. While the characteristics of the local environment correlate with the genotypes and other genomic features of the plants, they can also be indicative of genotype-phenotype associations providing additional information relevant to environmental dependence. In this study, we investigate how the geoclimatic features from the geographical origin of the Arabidopsis thaliana accessions can be integrated with genomic features for phenotype prediction and association analysis using advanced canonical correlation analysis (CCA). In particular, we propose a novel method called hierarchical canonical correlation analysis (HCCA) to combine mutations, gene expressions, and DNA methylations with geoclimatic features for informative coprojections of the features. HCCA uses a condition number of the cross-covariance between pairs of datasets to infer a hierarchical structure for applying CCA to combine the data. In the experiments on Arabidopsis thaliana data from 1001 Genomes and 1001 Epigenomes projects and climatic, atmospheric, and soil environmental variables combined by CLIMtools, HCCA provided a joint representation of the genomic data and geoclimate data for better prediction of the special flowering time at 10°C (FT10) of Arabidopsis thaliana. We also extended HCCA with information from a protein-protein interaction (PPI) network to guide the feature learning by imposing network modules onto the genomic features, which are shown to be useful for identifying genes with more coherent functions correlated with the geoclimatic features. The findings in this study suggest that environmental data comprise an important component in plant phenotype analysis. 
HCCA is a useful data integration technique for phenotype prediction and for better understanding the interactions between gene functions and the environment, because coprojections of multiple genomic datasets introduce more useful functional information.
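
As a rough illustration of the quantity this abstract mentions, the following Python sketch computes the 2-norm condition number of the sample cross-covariance for every pair of datasets. The function names and the dictionary of named views are hypothetical. The abstract does not say how HCCA turns these numbers into a hierarchy, so the sketch stops at the pairwise table and is not the authors' implementation.

```python
from itertools import combinations

import numpy as np


def cross_cov_condition_number(X, Y):
    """2-norm condition number of the sample cross-covariance of X and Y.

    X is (n_samples, p) and Y is (n_samples, q); rows must refer to the
    same samples (for example, the same accessions) in both arrays.
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (X.shape[0] - 1)         # p x q cross-covariance
    s = np.linalg.svd(C, compute_uv=False)   # singular values, descending
    if s[-1] == 0.0:
        return float("inf")                  # rank-deficient cross-covariance
    return float(s[0] / s[-1])


def pairwise_condition_numbers(views):
    """Condition number for every unordered pair of named datasets.

    `views` is a hypothetical dict mapping a dataset name to its array.
    """
    return {
        (a, b): cross_cov_condition_number(views[a], views[b])
        for a, b in combinations(views, 2)
    }
```

A call such as `pairwise_condition_numbers({"expression": E, "methylation": M, "geoclimate": G})` would return one value per pair; the array names here are placeholders.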

2.
BMC Genomics ; 21(1): 272, 2020 Mar 30.
Article in English | MEDLINE | ID: mdl-32228441

ABSTRACT

BACKGROUND: Most eukaryotic genes produce different transcripts of multiple isoforms by inclusion or exclusion of particular exons. The isoforms of a gene often play diverse functional roles, and thus it is necessary to accurately measure isoform expressions as well as gene expressions. While previous studies have demonstrated the strong agreement between mRNA sequencing (RNA-seq) and array-based gene and/or isoform quantification platforms (Microarray gene expression and Exon-array), the more recently developed NanoString platform has not been systematically evaluated and compared, especially in large-scale studies across different cancer domains. RESULTS: In this paper, we present a large-scale comparative study among RNA-seq, NanoString, array-based, and RT-qPCR platforms using 46 cancer cell lines across different cancer types. The goal is to understand and evaluate the calibers of the platforms for measuring gene and isoform expressions in cancer studies. We first performed NanoString experiments on 59 cancer cell lines with 404 custom-designed probes for measuring the expressions of 478 isoforms in 155 genes, and additional RT-qPCR experiments for a subset of the measured isoforms in 13 cell lines. We then combined the data with the matched RNA-seq, Exon-array, and Microarray data of 46 of the 59 cell lines for the comparative analysis. 
CONCLUSION: In the comparisons of the platforms for measuring the expressions at both isoform and gene levels, we found that (1) the agreement on isoform expressions is lower than the agreement on gene expressions across the four platforms; (2) NanoString and Exon-array are not consistent on isoform quantification even though both techniques are based on hybridization reactions; (3) RT-qPCR experiments are more consistent with RNA-seq and Exon-array than NanoString in isoform quantification; (4) different RNA-seq isoform quantification methods show varying estimation results, and among the methods, Net-RSTQ and eXpress are more consistent across the platforms; and (5) RNA-seq has the best overall consistency with the other platforms on gene expression quantification.


Subject(s)
Gene Expression Profiling/methods; Algorithms; Exons/genetics; Exons/physiology; Humans; Oligonucleotide Array Sequence Analysis/methods; Protein Isoforms/genetics; Protein Isoforms/metabolism; Sequence Analysis, RNA/methods; Software
3.
Brief Bioinform ; 21(4): 1209-1223, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31243426

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technologies have enabled the large-scale whole-transcriptome profiling of each individual single cell in a cell population. A core analysis of the scRNA-seq transcriptome profiles is to cluster the single cells to reveal cell subtypes and infer cell lineages based on the relations among the cells. This article reviews the machine learning and statistical methods for clustering scRNA-seq transcriptomes developed in the past few years. The review focuses on how conventional clustering techniques such as hierarchical clustering, graph-based clustering, mixture models, k-means, ensemble learning, neural networks and density-based clustering are modified or customized to tackle the unique challenges in scRNA-seq data analysis, such as the dropout of low-expression genes, low and uneven read coverage of transcripts, highly variable total mRNAs from single cells and ambiguous cell markers in the presence of technical biases and irrelevant confounding biological variations. We review how cell-specific normalization, the imputation of dropouts and dimension reduction methods can be applied with new statistical or optimization strategies to improve the clustering of single cells. We also introduce more advanced approaches to cluster scRNA-seq transcriptomes in time series data and multiple cell populations and to detect rare cell types. Several software packages developed to support the cluster analysis of scRNA-seq data are also reviewed and experimentally compared to evaluate their performance and efficiency. Finally, we conclude with useful observations and possible future directions in scRNA-seq data analytics. AVAILABILITY: All the source code and data are available at https://github.com/kuanglab/single-cell-review.
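
To make the generic steps named in this abstract concrete, here is a minimal Python sketch of a baseline pipeline: cell-specific normalization, dimension reduction by PCA, and plain k-means. It is not taken from the review or from any package it compares. All function names, the scale factor of 10,000 and the default number of components are assumptions chosen for illustration, and dropout imputation is left out.

```python
import numpy as np


def normalize_cells(counts, scale=1e4):
    """Cell-specific normalization: rescale each cell (row) to the same
    total count, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0          # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * scale)


def pca_scores(X, n_components):
    """Project cells onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distance from every cell to every center
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:       # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def cluster_cells(counts, k, n_components=10):
    """counts: (n_cells, n_genes) raw count matrix; returns (labels, centers)."""
    X = normalize_cells(counts)
    n_components = min(n_components, min(X.shape))
    Z = pca_scores(X, n_components)
    return kmeans(Z, k)
```

The methods surveyed in the review modify steps like these to cope with dropouts and technical biases; this sketch only shows the unmodified starting point.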


Subject(s)
Machine Learning; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Cluster Analysis; Gene Expression Profiling/methods
4.
Proteins ; 87(6): 478-491, 2019 06.
Article in English | MEDLINE | ID: mdl-30714638

ABSTRACT

The global connectivities in very large protein similarity networks contain traces of evolution among the proteins that can be used to detect remote evolutionary relations or structural similarities. A key limitation in investigating how well a protein network captures this evolutionary information is the intensive computation of pairwise sequence similarities needed to construct very large protein networks. In this article, we introduce label propagation on low-rank kernel approximation (LP-LOKA) for searching massively large protein networks. LP-LOKA propagates initial protein similarities in a low-rank graph obtained by Nyström approximation, without computing all pairwise similarities. With scalable parallel implementations based on distributed memory using the Message Passing Interface (MPI) and on Apache Hadoop/Spark in the cloud, LP-LOKA can search protein networks with one million proteins or more. In the experiments on Swiss-Prot/ADDA/CASP data, LP-LOKA significantly improved protein ranking over the widely used HMM-HMM or profile-sequence alignment methods utilizing large protein networks. It was observed that the larger the protein similarity network, the better the performance, especially on relatively small protein superfamilies and folds. The results suggest that computing massively large protein networks is necessary to meet the growing need to annotate proteins from newly sequenced species, and that LP-LOKA is both scalable and accurate for searching massively large protein networks.
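
The general idea described in this abstract, propagating scores over a graph whose similarity matrix is only available in low-rank Nyström form, can be sketched in a few lines of Python. This is a single-machine toy under stated assumptions, not the LP-LOKA implementation: an RBF kernel on feature vectors stands in for protein sequence similarity, landmarks are passed in by the caller, and the update rule f <- alpha * S f + (1 - alpha) * y is the standard normalized label propagation iteration. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF similarities between rows of A and rows of B (a stand-in for
    sequence similarity, assumed here for illustration)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_factors(X, landmarks, gamma=1.0):
    """Return C (n x m) and pinv(M) (m x m) so that W is approximated by
    C @ pinv(M) @ C.T using only similarities to the m landmark rows."""
    C = rbf_kernel(X, X[landmarks], gamma)
    M_pinv = np.linalg.pinv(C[landmarks])
    return C, M_pinv


def propagate(C, M_pinv, y, alpha=0.5, n_iter=50):
    """Iterate f <- alpha * S f + (1 - alpha) * y, where
    S = D^{-1/2} W D^{-1/2} and W is never formed explicitly."""
    y = np.asarray(y, dtype=float)
    n = C.shape[0]
    # approximate degrees d = W 1, computed through the low-rank factors
    d = C @ (M_pinv @ (C.T @ np.ones(n)))
    inv_sqrt_d = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    f = y.copy()
    for _ in range(n_iter):
        Sf = inv_sqrt_d * (C @ (M_pinv @ (C.T @ (inv_sqrt_d * f))))
        f = alpha * Sf + (1.0 - alpha) * y
    return f
```

Each iteration costs on the order of n times m operations instead of n squared, which is the point of the low-rank form. When every point is used as a landmark the factorization reproduces the full kernel, so the result matches dense label propagation.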


Subject(s)
Proteins/chemistry; Algorithms; Computational Biology; Humans; Sequence Analysis, Protein; Software
5.
J Biol Chem ; 293(35): 13464-13476, 2018 08 31.
Article in English | MEDLINE | ID: mdl-30012885

ABSTRACT

In obesity-linked insulin resistance, oxidative stress in adipocytes leads to lipid peroxidation and subsequent carbonylation of proteins by diffusible lipid electrophiles. Reduction in oxidative stress attenuates protein carbonylation and insulin resistance, suggesting that lipid modification of proteins may play a role in metabolic disease, but the mechanisms remain incompletely understood. Herein, we show that in vivo, diet-induced obesity in mice surprisingly results in preferential carbonylation of nuclear proteins by 4-hydroxy-trans-2,3-nonenal (4-HNE) or 4-hydroxy-trans-2,3-hexenal (4-HHE). Proteomic and structural analyses revealed that residues in or around the sites of zinc coordination of zinc finger proteins, such as those containing the C2H2 or MATRIN, RING, C3H1, or N4-type DNA-binding domains, are particularly susceptible to carbonylation by lipid aldehydes. These observations strongly suggest that carbonylation functionally disrupts protein secondary structure supported by metal coordination. Analysis of one such target, the nuclear protein estrogen-related receptor γ (ERR-γ), showed that ERR-γ is modified by 4-HHE in the obese state. In vitro carbonylation decreased the DNA-binding capacity of ERR-γ and correlated with the obesity-linked down-regulation of many key genes promoting mitochondrial bioenergetics. Taken together, these findings reveal a novel mechanistic connection between oxidative stress and metabolic dysfunction arising from carbonylation of nuclear zinc finger proteins, such as the transcriptional regulator ERR-γ.


Subject(s)
Adipose Tissue/metabolism; DNA-Binding Proteins/metabolism; Nuclear Proteins/metabolism; Obesity/metabolism; Protein Carbonylation; Zinc Fingers; Aldehydes/metabolism; Amino Acid Sequence; Animals; Cell Nucleus/metabolism; DNA-Binding Proteins/chemistry; Mice; Nuclear Proteins/chemistry; Oxidative Stress
6.
PLoS Comput Biol ; 14(4): e1006053, 2018 04.
Article in English | MEDLINE | ID: mdl-29630593

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC.


Subject(s)
Epidermolysis Bullosa Dystrophica/genetics; Algorithms; Animals; Case-Control Studies; Cluster Analysis; Collagen Type VII/genetics; Computational Biology; Computer Simulation; Embryonic Stem Cells/cytology; Embryonic Stem Cells/metabolism; Gene Expression Profiling/methods; Genetic Markers; High-Throughput Nucleotide Sequencing/methods; Humans; Leukocytes, Mononuclear/cytology; Leukocytes, Mononuclear/metabolism; Lung/cytology; Lung/metabolism; Machine Learning; Mice; Models, Genetic; RNA/genetics; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods
7.
Bioinformatics ; 33(4): 529-536, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27797759

ABSTRACT

Motivation: To better predict and analyze gene associations with the collection of phenotypes organized in a phenotype ontology, it is crucial to effectively model the hierarchical structure among the phenotypes in the ontology and leverage the sparse known associations with additional training information. In this paper, we first introduce Dual Label Propagation (DLP) to impose consistent associations with the entire phenotype paths in predicting phenotype-gene associations in the Human Phenotype Ontology (HPO). DLP is then used as the base model in a transfer learning framework (tlDLP) to incorporate functional annotations in the Gene Ontology (GO). By simultaneously reconstructing GO term-gene associations and HPO phenotype-gene associations for all the genes in a protein-protein interaction network, tlDLP benefits from the enriched training associations indirectly through relation with GO terms. Results: In the experiments to predict the associations between human genes and phenotypes in HPO based on the human protein-protein interaction network, both DLP and tlDLP improved the prediction of gene associations with phenotype paths in HPO in cross-validation and the prediction of the most recent associations added after the snapshot of the training data. Moreover, the transfer learning through GO term-gene associations significantly improved association predictions for the phenotypes with no more specific known associations by a large margin. Examples are also shown to demonstrate how phenotype paths in phenotype ontology and transfer learning with gene ontology can improve the predictions. Availability and Implementation: Source code is available at http://compbio.cs.umn.edu/ontophenome. Contact: kuang@cs.umn.com. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Computational Biology/methods; Genome; Models, Genetic; Phenotype; Gene Ontology; Humans; Protein Interaction Maps