Results 1 - 20 of 43
1.
Cell ; 173(7): 1562-1565, 2018 06 14.
Article in English | MEDLINE | ID: mdl-29906441

ABSTRACT

A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology.


Subject(s)
Computational Biology/methods, Machine Learning, Algorithms, Biomedical Research
2.
Immunity ; 54(6): 1304-1319.e9, 2021 06 08.
Article in English | MEDLINE | ID: mdl-34048708

ABSTRACT

Despite mounting evidence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) engagement with immune cells, most express little, if any, of the canonical receptor of SARS-CoV-2, angiotensin-converting enzyme 2 (ACE2). Here, using a myeloid cell receptor-focused ectopic expression screen, we identified several C-type lectins (DC-SIGN, L-SIGN, LSECtin, ASGR1, and CLEC10A) and Tweety family member 2 (TTYH2) as glycan-dependent binding partners of the SARS-CoV-2 spike. Except for TTYH2, these molecules primarily interacted with spike via regions outside of the receptor-binding domain. Single-cell RNA sequencing analysis of pulmonary cells from individuals with coronavirus disease 2019 (COVID-19) indicated predominant expression of these molecules on myeloid cells. Although these receptors do not support active replication of SARS-CoV-2, their engagement with the virus induced robust proinflammatory responses in myeloid cells that correlated with COVID-19 severity. We also generated a bispecific anti-spike nanobody that not only blocked ACE2-mediated infection but also the myeloid receptor-mediated proinflammatory responses. Our findings suggest that SARS-CoV-2-myeloid receptor interactions promote immune hyperactivation, which represents potential targets for COVID-19 therapy.


Subject(s)
COVID-19/metabolism, COVID-19/virology, Host-Pathogen Interactions, C-Type Lectins/metabolism, Membrane Proteins/metabolism, Myeloid Cells/immunology, Myeloid Cells/metabolism, Neoplasm Proteins/metabolism, SARS-CoV-2/physiology, Angiotensin-Converting Enzyme 2/metabolism, Binding Sites, COVID-19/genetics, Cell Line, Cytokines, Gene Expression Regulation, Host-Pathogen Interactions/genetics, Host-Pathogen Interactions/immunology, Humans, Inflammation Mediators/metabolism, C-Type Lectins/chemistry, Membrane Proteins/chemistry, Molecular Models, Neoplasm Proteins/chemistry, Protein Binding, Protein Conformation, Single-Domain Antibodies/immunology, Coronavirus Spike Glycoprotein/chemistry, Coronavirus Spike Glycoprotein/immunology, Coronavirus Spike Glycoprotein/metabolism, Structure-Activity Relationship
3.
Nat Methods ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844628

ABSTRACT

Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the 'languages' of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named 'xTrimoscFoundationα', with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the size of trainable parameters, dimensionality of genes and volume of training data. Its asymmetric transformer-like architecture and pretraining task design empower effectively capturing complex context relations among genes in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performances in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

4.
Nature ; 600(7889): 536-542, 2021 12.
Article in English | MEDLINE | ID: mdl-34819669

ABSTRACT

The cell is a multi-scale structure with modular organization across at least four orders of magnitude [1]. Two central approaches for mapping this structure, protein fluorescent imaging and protein biophysical association, each generate extensive datasets, but of distinct qualities and resolutions that are typically treated separately [2,3]. Here we integrate immunofluorescence images in the Human Protein Atlas [4] with affinity purifications in BioPlex [5] to create a unified hierarchical map of human cell architecture. Integration is achieved by configuring each approach as a general measure of protein distance, then calibrating the two measures using machine learning. The map, known as the multi-scale integrated cell (MuSIC 1.0), resolves 69 subcellular systems, of which approximately half are to our knowledge undocumented. Accordingly, we perform 134 additional affinity purifications and validate subunit associations for the majority of systems. The map reveals a pre-ribosomal RNA processing assembly and accessory factors, which we show govern rRNA maturation, and functional roles for SRRM1 and FAM120C in chromatin and RPS3A in splicing. By integration across scales, MuSIC increases the resolution of imaging while giving protein interactions a spatial dimension, paving the way to incorporate diverse types of data in proteome-wide cell maps.


Subject(s)
Chromosomes, Proteome, Nuclear Antigens/genetics, Nuclear Antigens/metabolism, Chromatin/genetics, Chromosomes/metabolism, Humans, Nuclear Matrix-Associated Proteins/metabolism, Proteome/metabolism, Ribosomal RNA, RNA-Binding Proteins/genetics
5.
Proc Natl Acad Sci U S A ; 119(11): e2122954119, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35238654

ABSTRACT

Significance: SARS-CoV-2 continues to evolve through emerging variants, more frequently observed with higher transmissibility. Despite the wide application of vaccines and antibodies, the selection pressure on the Spike protein may lead to further evolution of variants that include mutations that can evade immune response. To catch up with the virus's evolution, we introduced a deep learning approach to redesign the complementarity-determining regions (CDRs) to target multiple virus variants and obtained an antibody that broadly neutralizes SARS-CoV-2 variants.


Subject(s)
Broadly Neutralizing Antibodies/immunology, COVID-19/immunology, SARS-CoV-2/immunology, Neutralizing Antibodies/immunology, Antiviral Antibodies/immunology, Broadly Neutralizing Antibodies/pharmacology, COVID-19 Vaccines/immunology, Complementarity Determining Regions, Deep Learning, Epitopes/immunology, Humans, Immunotherapy/methods, Neutralization Tests/methods, Protein Domains, SARS-CoV-2/pathogenicity, Coronavirus Spike Glycoprotein/genetics, Coronavirus Spike Glycoprotein/immunology
6.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36070863

ABSTRACT

Computational recovery of gene regulatory network (GRN) has recently undergone a great shift from bulk-cell towards designing algorithms targeting single-cell data. In this work, we investigate whether the widely available bulk-cell data could be leveraged to assist the GRN predictions for single cells. We infer cell-type-specific GRNs from both the single-cell RNA sequencing data and the generic GRN derived from the bulk cells by constructing a weakly supervised learning framework based on the axial transformer. We verify our assumption that the bulk-cell transcriptomic data are a valuable resource, which could improve the prediction of single-cell GRN by conducting extensive experiments. Our GRN-transformer achieves the state-of-the-art prediction accuracy in comparison to existing supervised and unsupervised approaches. In addition, we show that our method can identify important transcription factors and potential regulations for Alzheimer's disease risk genes by using the predicted GRN. Availability: The implementation of GRN-transformer is available at https://github.com/HantaoShu/GRN-Transformer.


Subject(s)
Computational Biology, Gene Regulatory Networks, Algorithms, Computational Biology/methods, Transcription Factors/genetics, Transcriptome
7.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36975610

ABSTRACT

MOTIVATION: We have entered the multi-omics era and can measure cells from different aspects. Hence, we can get a more comprehensive view by integrating or matching data from different spaces corresponding to the same object. However, it is particularly challenging in the single-cell multi-omics scenario because such data are very sparse with extremely high dimensions. Though some techniques can be used to measure scATAC-seq and scRNA-seq simultaneously, the data are usually highly noisy due to the limitations of the experimental environment. RESULTS: To promote single-cell multi-omics research, we overcome the above challenges, proposing a novel framework, contrastive cycle adversarial autoencoders, which can align and integrate single-cell RNA-seq data and single-cell ATAC-seq data. Con-AAE can efficiently map the above data with high sparsity and noise from different spaces to a coordinated subspace, where alignment and integration tasks can be easier. We demonstrate its advantages on several datasets. AVAILABILITY AND IMPLEMENTATION: Zenodo link: https://zenodo.org/badge/latestdoi/368779433. github: https://github.com/kakarotcq/Con-AAE.


Subject(s)
Multiomics, Single-Cell Analysis, Single-Cell Analysis/methods, Exome Sequencing, RNA Sequence Analysis
8.
Bioinformatics ; 38(6): 1607-1614, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34999749

ABSTRACT

MOTIVATION: Rapidly generated scRNA-seq datasets enable us to understand cellular differences and the function of each individual cell at single-cell resolution. Cell-type classification, which aims at characterizing and labeling groups of cells according to their gene expression, is one of the most important steps for single-cell analysis. To facilitate the manual curation process, supervised learning methods have been used to automatically classify cells. Most of the existing supervised learning approaches only utilize annotated cells in the training step while ignoring the more abundant unannotated cells. In this article, we proposed scPretrain, a multi-task self-supervised learning approach that jointly considers annotated and unannotated cells for cell-type classification. scPretrain consists of a pre-training step and a fine-tuning step. In the pre-training step, scPretrain uses a multi-task learning framework to train a feature extraction encoder based on each dataset's pseudo-labels, where only unannotated cells are used. In the fine-tuning step, scPretrain fine-tunes this feature extraction encoder using the limited annotated cells in a new dataset. RESULTS: We evaluated scPretrain on 60 diverse datasets from different technologies, species and organs, and obtained a significant improvement on both cell-type classification and cell clustering. Moreover, the representations obtained by scPretrain in the pre-training step also enhanced the performance of conventional classifiers, such as random forest, logistic regression and support-vector machines. scPretrain is able to effectively utilize the massive amount of unlabeled data and be applied to annotating increasingly generated scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION: The data and code underlying this article are available in scPretrain: Multi-task self-supervised learning for cell type classification, at https://github.com/ruiyi-zhang/scPretrain and https://zenodo.org/record/5802306. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Random Forest, Single-Cell Analysis, Single-Cell Analysis/methods, Cluster Analysis, Support Vector Machine
9.
Bioinformatics ; 37(Suppl_1): i254-i261, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252932

ABSTRACT

MOTIVATION: The prediction of the binding between peptides and major histocompatibility complex (MHC) molecules plays an important role in neoantigen identification. Although a large number of computational methods have been developed to address this problem, they produce high false-positive rates in practical applications, since in most cases, a single residue mutation may largely alter the binding affinity of a peptide binding to MHC which cannot be identified by conventional deep learning methods. RESULTS: We developed a differential boundary tree-based model, named DBTpred, to address this problem. We demonstrated that DBTpred can accurately predict MHC class I binding affinity compared to the state-of-the-art deep learning methods. We also presented a parallel training algorithm to accelerate the training and inference process which enables DBTpred to be applied to large datasets. By investigating the statistical properties of differential boundary trees and the prediction paths to test samples, we revealed that DBTpred can provide an intuitive interpretation and possible hints in detecting important residue mutations that can largely influence binding affinity. AVAILABILITY AND IMPLEMENTATION: The DBTpred package is implemented in Python and freely available at: https://github.com/fpy94/DBT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Histocompatibility Antigens Class I, Peptides, Algorithms, Histocompatibility Antigens Class I/genetics, Histocompatibility Antigens Class I/metabolism, Humans, Major Histocompatibility Complex, Peptides/metabolism, Protein Binding
10.
Bioinformatics ; 37(Suppl_1): i410-i417, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252957

ABSTRACT

MOTIVATION: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among disease and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions of this problem significantly increase the risk of overfitting and decrease the generalizability of the models. RESULTS: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model could still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. To evaluate our method, we compared our method with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset to prioritize candidate disease genes. Extensive evaluations show that our model could improve the prediction accuracy when all the features are available compared to other methods. More importantly, our model could make very accurate predictions when >90% of the features are missing at the test stage. AVAILABILITY AND IMPLEMENTATION: Our method is realized with Python 3.7 and Pytorch 1.5.0 and method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.


Subject(s)
Genetic Databases, Computer Neural Networks, Algorithms, Humans, Machine Learning
11.
Proc Natl Acad Sci U S A ; 116(28): 14011-14018, 2019 07 09.
Article in English | MEDLINE | ID: mdl-31235599

ABSTRACT

Three-dimensional genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe scHiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real single-cell Hi-C data as benchmarks, scHiCluster significantly improves clustering accuracy when applied to low coverage datasets compared with existing methods. After imputation by scHiCluster, topologically associating domain (TAD)-like structures (TLSs) can be identified within single cells, and their consensus boundaries were enriched at the TAD boundaries observed in bulk cell Hi-C samples. In summary, scHiCluster facilitates visualization and comparison of single-cell 3D genomes.
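
The convolution-based imputation mentioned in this abstract can be pictured as smoothing a sparse contact matrix so that each bin borrows signal from its neighbours. The sketch below is only an illustration under stated assumptions: a uniform averaging kernel of half-width `w`, borders handled by clipping the window, and a hypothetical function name `smooth_contacts`. It is not the scHiCluster implementation, and the random-walk step is omitted.

```python
def smooth_contacts(matrix, w=1):
    """Replace each entry by the mean of its (2w+1) x (2w+1) neighbourhood.

    Assumption: uniform kernel, window clipped at the matrix borders.
    """
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rows = range(max(0, i - w), min(n, i + w + 1))
            cols = range(max(0, j - w), min(n, j + w + 1))
            total = sum(matrix[r][c] for r in rows for c in cols)
            out[i][j] = total / (len(rows) * len(cols))
    return out


# Toy 4-bin contact matrix with a single observed contact between bins 1 and 2.
contacts = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
smoothed = smooth_contacts(contacts, w=1)
for row in smoothed:
    print(["%.3f" % v for v in row])
```

After smoothing, bins adjacent to the observed contact carry non-zero values, which is the property that makes very sparse single-cell matrices easier to compare.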


Subject(s)
Chromatin/ultrastructure, Chromosome Structures/ultrastructure, Computational Biology, Single-Cell Analysis, Algorithms, Cluster Analysis, Genome/genetics, Humans, Molecular Conformation
12.
Sensors (Basel) ; 22(11)2022 May 27.
Article in English | MEDLINE | ID: mdl-35684708

ABSTRACT

It is hard to directly deploy deep learning models on today's smartphones due to the substantial computational costs introduced by millions of parameters. To compress the model, we develop an ℓ0-based sparse group lasso model called MobilePrune which can generate extremely compact neural network models for both desktop and mobile platforms. We adopt group lasso penalty to enforce sparsity at the group level to benefit General Matrix Multiply (GEMM) and develop the very first algorithm that can optimize the ℓ0 norm in an exact manner and achieve the global convergence guarantee in the deep learning context. MobilePrune also allows complicated group structures to be applied on the group penalty (i.e., trees and overlapping groups) to suit DNN models with more complex architectures. Empirically, we observe the substantial reduction of compression ratio and computational costs for various popular deep learning models on multiple benchmark datasets compared to the state-of-the-art methods. More importantly, the compression models are deployed on the android system to confirm that our approach is able to achieve less response delay and battery consumption on mobile phones.
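
To make the group-level sparsity in this abstract concrete, the sketch below shows the textbook proximal step for a plain group lasso penalty (block soft-thresholding), which zeroes out a whole group of weights when its norm falls below the threshold. It is a hedged illustration only: the function name `group_soft_threshold`, the threshold `lam`, and the toy weights are assumptions, the ℓ0 term is left out, and nothing here reproduces the MobilePrune algorithm itself.

```python
import math


def group_soft_threshold(weights, groups, lam):
    """Block soft-thresholding: shrink each group towards zero by lam in L2 norm.

    weights: flat list of floats
    groups:  list of index lists, one per group (assumed non-overlapping here)
    lam:     non-negative threshold (hypothetical tuning parameter)
    """
    out = list(weights)
    for idx in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        # A group whose norm is at most lam is removed entirely.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * weights[i]
    return out


# Two groups: a strong one that survives and a weak one that is pruned.
w = [3.0, 4.0, 0.1, 0.2]
pruned = group_soft_threshold(w, groups=[[0, 1], [2, 3]], lam=1.0)
print(pruned)
```

Pruning whole groups (for example, all weights of one filter or neuron) is what lets the remaining computation stay dense, which is the property the abstract ties to GEMM.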


Subject(s)
Data Compression, Computer Neural Networks, Algorithms, Physical Phenomena
13.
Nat Methods ; 15(4): 290-298, 2018 04.
Article in English | MEDLINE | ID: mdl-29505029

ABSTRACT

Although artificial neural networks are powerful classifiers, their internal structures are hard to interpret. In the life sciences, extensive knowledge of cell biology provides an opportunity to design visible neural networks (VNNs) that couple the model's inner workings to those of real systems. Here we develop DCell, a VNN embedded in the hierarchical structure of 2,526 subsystems comprising a eukaryotic cell (http://d-cell.ucsd.edu/). Trained on several million genotypes, DCell simulates cellular growth nearly as accurately as laboratory observations. During simulation, genotypes induce patterns of subsystem activities, enabling in silico investigations of the molecular mechanisms underlying genotype-phenotype associations. These mechanisms can be validated, and many are unexpected; some are governed by Boolean logic. Cumulatively, 80% of the importance for growth prediction is captured by 484 subsystems (21%), reflecting the emergence of a complex phenotype. DCell provides a foundation for decoding the genetics of disease, drug resistance and synthetic life.


Subject(s)
Cell Physiological Phenomena, Deep Learning, Computer Neural Networks, Computer Simulation, Gene Expression Regulation, Genotype, Humans
14.
Bioinformatics ; 36(Suppl_1): i542-i550, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657383

ABSTRACT

MOTIVATION: Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) couples the measurement of surface marker proteins with simultaneous sequencing of mRNA at the single-cell level, which brings accurate cell surface phenotyping to single-cell transcriptomics. Unfortunately, multiplets in CITE-seq datasets create artificial cell types (ACT) and complicate the automation of cell surface phenotyping. RESULTS: We propose CITE-sort, an artificial-cell-type aware surface marker clustering method for CITE-seq. CITE-sort is aware of and is robust to multiplet-induced ACT. We benchmarked CITE-sort with real and simulated CITE-seq datasets and compared CITE-sort against canonical clustering methods. We show that CITE-sort produces the best clustering performance across the board. CITE-sort not only accurately identifies real biological cell types (BCT) but also consistently and reliably separates multiplet-induced artificial-cell-type droplet clusters from real BCT droplet clusters. In addition, CITE-sort organizes its clustering process with a binary tree, which facilitates easy interpretation and verification of its clustering result and simplifies cell-type annotation with domain knowledge in CITE-seq. AVAILABILITY AND IMPLEMENTATION: http://github.com/QiuyuLian/CITE-sort. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling, Single-Cell Analysis, Cluster Analysis, Epitopes, RNA Sequence Analysis, Software
15.
Sensors (Basel) ; 21(1)2020 Dec 26.
Article in English | MEDLINE | ID: mdl-33375324

ABSTRACT

In this paper, we propose AirSign, a novel user authentication technology to provide users with more convenient, intuitive, and secure ways of interacting with smartphones in daily settings. AirSign leverages both acoustic and motion sensors for user authentication by signing signatures in the air through smartphones without requiring any special hardware. This technology actively transmits inaudible acoustic signals from the earpiece speaker, receives echoes back through both built-in microphones to "illuminate" signature and hand geometry, and authenticates users according to the unique features extracted from echoes and motion sensors. To evaluate our system, we collected registered, genuine, and forged signatures from 30 participants, and by applying AirSign on the above dataset, we were able to successfully distinguish between genuine and forged signatures with a 97.1% F-score while requesting only seven signatures during the registration phase.

16.
Bioinformatics ; 34(13): i484-i493, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949979

ABSTRACT

Motivation: Network propagation has been widely used to aggregate and amplify the effects of tumor mutations using knowledge of molecular interaction networks. However, propagating mutations through interactions irrelevant to cancer leads to erosion of pathway signals and complicates the identification of cancer subtypes. Results: To address this problem we introduce a propagation algorithm, Network-Based Supervised Stratification (NBS2), which learns the mutated subnetworks underlying tumor subtypes using a supervised approach. Given an annotated molecular network and reference tumor mutation profiles for which subtypes have been predefined, NBS2 is trained by adjusting the weights on interaction features such that network propagation best recovers the provided subtypes. After training, weights are fixed such that mutation profiles of new tumors can be accurately classified. We evaluate NBS2 on breast and glioblastoma tumors, demonstrating that it outperforms the best network-based approaches in classifying tumors to known subtypes for these diseases. By interpreting the interaction weights, we highlight characteristic molecular pathways driving selected subtypes. Availability and implementation: The NBS2 package is freely available at: https://github.com/wzhang1984/NBSS. Supplementary information: Supplementary data are available at Bioinformatics online.
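
Network propagation, as referred to in this abstract, is commonly written as the iteration F ← α·W·F + (1 − α)·F0, where F0 holds the initial mutation signal and W is a normalized adjacency matrix. The sketch below is a minimal illustration of that generic update, assuming fixed, unweighted edges and column normalization; the function name `propagate`, the values of `alpha` and `n_iter`, and the toy path graph are all hypothetical. NBS2 learns the edge weights from interaction features, which this sketch does not attempt.

```python
def propagate(adjacency, f0, alpha=0.5, n_iter=50):
    """Iterate F <- alpha * W F + (1 - alpha) * F0 with W column-normalized.

    adjacency: square list-of-lists of non-negative edge weights
    f0:        initial per-node signal (e.g. 1.0 for a mutated gene)
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    # Each column of W sums to 1, so the total signal is preserved.
    w = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    f = list(f0)
    for _ in range(n_iter):
        f = [
            alpha * sum(w[i][j] * f[j] for j in range(n)) + (1 - alpha) * f0[i]
            for i in range(n)
        ]
    return f


# Toy path graph 0 - 1 - 2 - 3 with a single mutated node (node 0).
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = propagate(path, [1.0, 0.0, 0.0, 0.0])
print(["%.4f" % s for s in scores])
```

The signal decays with network distance from the mutated node, which is why edges irrelevant to the phenotype can dilute pathway signal when all interactions are weighted equally.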


Subject(s)
Computational Biology/methods, Mutation, Neoplasms/classification, Signal Transduction, Supervised Machine Learning, Breast Neoplasms/classification, Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Female, Glioblastoma/classification, Glioblastoma/genetics, Glioblastoma/metabolism, Humans, Neoplasms/genetics, Neoplasms/metabolism, Protein Interaction Maps, Software
17.
Bioinformatics ; 33(14): i267-i273, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881999

ABSTRACT

MOTIVATION: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by the RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expression and gene function. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the reads alignment. In contrast to the splicing junctions that can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging, because the signal related to boundaries is noisy and weak. RESULTS: We present DeepBound, an effective approach to identify boundaries of expressed transcripts from RNA-seq reads alignment. At its core, DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. To accurately model the transition probabilities and to solve the label-imbalance problem, we novelly incorporate the AUC (area under the curve) score into the optimizing objective function. To address the issue that deep probabilistic graphical models require a large number of labeled training samples, we propose to use simulated RNA-seq datasets to train our model. Through extensive experimental studies on both simulation datasets of two species and biological datasets, we show that DeepBound consistently and significantly outperforms the two existing methods. AVAILABILITY AND IMPLEMENTATION: DeepBound is freely available at https://github.com/realbigws/DeepBound. CONTACT: mingfu.shao@cs.cmu.edu or realbigws@gmail.com.


Subject(s)
RNA Splicing, RNA Sequence Analysis/methods, Software, Algorithms, Area Under Curve, Computer Simulation, Exons, Humans, Introns, Genetic Models
18.
Bioinformatics ; 32(17): i672-i679, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587688

ABSTRACT

MOTIVATION: Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are (i) the occurrence is proteome-wide and (ii) the ratio of disordered residues is about 6%, which makes it challenging to accurately predict IDRs. Most IDR prediction methods use sequence profile to improve accuracy, which prevents its application to proteome-wide prediction since it is time-consuming to generate sequence profiles. On the other hand, the methods without using sequence profile fare much worse than using sequence profile. METHOD: This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence-structure relationship in a hierarchical manner, but also correlation among adjacent residues. To deal with highly imbalanced order/disorder ratio, instead of training DeepCNF by widely used maximum-likelihood, we develop a novel approach to train it by maximizing area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data. RESULTS: Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profile, comparing favorably to or even outperforming many methods using sequence profile. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others. AVAILABILITY AND IMPLEMENTATION: http://raptorx2.uchicago.edu/StructurePropertyPred/predict/ CONTACT: wangsheng@uchicago.edu, jinboxu@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
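
The AUC named in this abstract can be read as the probability that a randomly chosen positive example (here, a disordered residue) receives a higher score than a randomly chosen negative one, which is why it is insensitive to the roughly 6% class ratio. The sketch below computes that pairwise definition directly; the function name `pairwise_auc` and the toy scores are illustrative assumptions, and the abstract does not say how AUCpreD maximizes AUC during training, so no training procedure is implied.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy residue scores: label 1 = disordered, label 0 = ordered.
scores = [0.9, 0.2, 0.25, 0.1, 0.3, 0.05]
labels = [1, 0, 1, 0, 0, 0]
auc = pairwise_auc(scores, labels)
print(auc)
```

Because the value depends only on how positives rank against negatives, adding more negative examples with the same score distribution leaves it essentially unchanged, unlike plain accuracy.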


Subject(s)
Machine Learning, Computer Neural Networks, Proteome, Area Under Curve, Forecasting, DNA Sequence Analysis
19.
Bioinformatics ; 32(17): i658-i664, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587686

ABSTRACT

MOTIVATION: As an increasing amount of protein-protein interaction (PPI) data becomes available, their computational interpretation has become an important problem in bioinformatics. The alignment of PPI networks from different species provides valuable information about conserved subnetworks, evolutionary pathways and functional orthologs. Although several methods have been proposed for global network alignment, there is a pressing need for methods that produce more accurate alignments in terms of both topological and functional consistency. RESULTS: In this work, we present a novel global network alignment algorithm, named ModuleAlign, which makes use of local topology information to define a module-based homology score. Based on a hierarchical clustering of functionally coherent proteins involved in the same module, ModuleAlign employs a novel iterative scheme to find the alignment between two networks. Evaluated on a diverse set of benchmarks, ModuleAlign outperforms state-of-the-art methods in producing functionally consistent alignments. By aligning Pathogen-Human PPI networks, ModuleAlign also detects a novel set of conserved human genes that pathogens preferentially target to cause pathogenesis. AVAILABILITY: http://ttic.uchicago.edu/~hashemifar/ModuleAlign.html CONTACT: canzar@ttic.edu or j3xu.ttic.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Protein Interaction Mapping, Protein Interaction Maps, Humans, Proteins, Software
20.
Bioinformatics ; 31(21): 3506-13, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26275894

ABSTRACT

MOTIVATION: Protein contact prediction is important for protein structure and functional study. Both evolutionary coupling (EC) analysis and supervised machine learning methods have been developed, making use of different information sources. However, contact prediction is still challenging especially for proteins without a large number of sequence homologs. RESULTS: This article presents a group graphical lasso (GGL) method for contact prediction that integrates joint multi-family EC analysis and supervised learning to improve accuracy on proteins without many sequence homologs. Different from existing single-family EC analysis that uses residue coevolution information in only the target protein family, our joint EC analysis uses residue coevolution in both the target family and its related families, which may have divergent sequences but similar folds. To implement this, we model a set of related protein families using Gaussian graphical models and then coestimate their parameters by maximum-likelihood, subject to the constraint that these parameters shall be similar to some degree. Our GGL method can also integrate supervised learning methods to further improve accuracy. Experiments show that our method outperforms existing methods on proteins without thousands of sequence homologs, and that our method performs better on both conserved and family-specific contacts. AVAILABILITY AND IMPLEMENTATION: See http://raptorx.uchicago.edu/ContactMap/ for a web server implementing the method. CONTACT: j3xu@ttic.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning, Proteins/chemistry, Protein Sequence Analysis, Algorithms, Molecular Evolution, Statistical Models, Protein Conformation, Proteins/genetics, Sequence Alignment, Software