1.
Brief Bioinform ; 18(1): 9-27, 2017 Jan.
Article in English | MEDLINE | ID: mdl-26839320

ABSTRACT

Since the completion of the Human Genome Project, it has been widely established that most DNA is not transcribed into proteins. These non-protein-coding regions are believed to be moderators within transcriptional and post-transcriptional processes, which play key roles in the onset of diseases. Long non-coding RNAs (lncRNAs) generally lack the conserved motifs typically used for detection and are thus hard to identify, but they nonetheless present certain characteristic features that can be exploited by bioinformatics methods. By combining lncRNA detection with known miRNA, RNA-binding protein and chromatin interactions, current tools are able to recognize and functionally annotate a large number of lncRNAs. This review discusses databases and platforms dedicated to cataloging and annotating lncRNAs, as well as tools geared toward discovering novel sequences. We emphasize the issues posed by the diversity of lncRNAs and their complex interaction mechanisms, as well as technical issues such as the lack of a unified nomenclature. We hope that this wide overview of existing platforms and databases might help guide biologists toward the tools they need to analyze their experimental data, while our discussion of limitations and of current lncRNA-related methods may assist in the development of new computational tools.


Subject(s)
RNA, Long Noncoding/genetics ; Computational Biology ; Databases, Genetic ; Databases, Nucleic Acid ; Humans ; Software
2.
Mol Cell Proteomics ; 15(4): 1262-80, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26796116

ABSTRACT

Calpains are intracellular Ca²⁺-regulated cysteine proteases that are essential for various cellular functions. Mammalian conventional calpains (calpain-1 and calpain-2) modulate the structure and function of their substrates by limited proteolysis. Thus, it is critically important to determine the site(s) in proteins at which calpains cleave. However, the calpains' substrate specificity remains unclear, because the amino acid (aa) sequences around their cleavage sites are very diverse. To clarify calpains' substrate specificities, 84 20-mer oligopeptides, corresponding to P10-P10' of reported cleavage site sequences, were proteolyzed by calpains, and the catalytic efficiencies (kcat/Km) were globally determined by LC/MS. This analysis revealed 483 cleavage site sequences, including 360 novel ones. The kcat/Km values for 119 sites ranged from 12.5 to 1,710 M⁻¹ s⁻¹. Although most sites were cleaved by both calpain-1 and -2 with a similar kcat/Km, sequence comparisons revealed distinct aa preferences at P9-P7/P2/P5'. The aa compositions of the novel sites were not statistically different from those of previously reported sites as a whole, suggesting calpains have a strict implicit rule for sequence specificity, and that the limited proteolysis of intact substrates is because of substrates' higher-order structures. Cleavage position frequencies indicated that longer sequences N-terminal to the cleavage site (P-sites) were preferred for proteolysis over C-terminal (P'-sites). Quantitative structure-activity relationship (QSAR) analyses using partial least-squares regression and >1,300 aa descriptors achieved kcat/Km prediction with r = 0.834, and binary-QSAR modeling attained an 87.5% positive prediction value for 132 reported calpain cleavage sites independent of our model construction. These results outperformed previous calpain cleavage predictors, and revealed the importance of the P2, P3', and P4' sites, and P1-P2 cooperativity. Furthermore, using our binary-QSAR model, novel cleavage sites in myoglobin were identified, verifying our predictor. This study increases our understanding of calpain substrate specificities, and opens calpains to "next-generation," i.e., activity-related quantitative and cooperativity-dependent analyses.


Subject(s)
Calpain/chemistry ; Chromatography, Liquid/methods ; Mass Spectrometry/methods ; Oligopeptides/chemistry ; Oligopeptides/metabolism ; Amino Acid Sequence ; Animals ; Binding Sites ; Catalysis ; Humans ; Models, Molecular ; Proteolysis ; Quantitative Structure-Activity Relationship ; Substrate Specificity
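
The two evaluation figures quoted in the abstract above (a correlation coefficient r for the regression model and a positive predictive value for the binary model) are standard metrics. As a minimal sketch of how such metrics are usually computed, the following uses only the Python standard library. The function names and all numbers are hypothetical illustrations, not values or code from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between measured and predicted values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def positive_predictive_value(true_positives, false_positives):
    """Fraction of predicted cleavage sites that are real cleavage sites."""
    return true_positives / (true_positives + false_positives)


# Hypothetical toy values, chosen only to exercise the functions.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
toy_r = pearson_r(measured, predicted)
toy_ppv = positive_predictive_value(true_positives=3, false_positives=1)
```

With these toy inputs the predictions are a perfect linear function of the measurements, so `toy_r` is 1 up to floating-point error, and three correct calls out of four predicted sites give a positive predictive value of 0.75.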
3.
BMC Bioinformatics ; 17(1): 363, 2016 Sep 13.
Article in English | MEDLINE | ID: mdl-27620863

ABSTRACT

BACKGROUND: Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle to identify complex differentiation paths. RESULTS: Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. CONCLUSIONS: With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/ .


Subject(s)
RNA/genetics ; Sequence Analysis, RNA/methods ; Stem Cells/immunology ; Cell Differentiation ; Humans
4.
Brief Bioinform ; 13(3): 337-49, 2012 May.
Article in English | MEDLINE | ID: mdl-22138323

ABSTRACT

A fundamental component of systems biology, proteolytic cleavage is involved in nearly all aspects of cellular activities: from gene regulation to cell lifecycle regulation. Current sequencing technologies have made it possible to compile large amounts of cleavage data and brought greater understanding of the underlying protein interactions. However, the practical impossibility of exhaustively retrieving substrate sequences through experimentation alone has long highlighted the need for efficient computational prediction methods. Such methods must be able to quickly mark substrate candidates and putative cleavage sites for further analysis. Available methods and expected reliability depend heavily on the type and complexity of proteolytic action, as well as the availability of well-labelled experimental data sets: factors varying greatly across enzyme families. For this review, we chose to give a quick overview of the general issues and challenges in cleavage prediction methods, followed by a more in-depth presentation of major techniques and implementations, with a focus on two particular families of cysteine proteases: caspases and calpains. Through their respective differences in proteolytic specificity (high for caspases, broader for calpains) and data availability (much lower for calpains), we aimed to illustrate the strengths and limitations of techniques ranging from position-based matrices and decision trees to more flexible machine-learning methods such as hidden Markov models and Support Vector Machines. In addition to a technical overview for each family of algorithms, we tried to provide elements of evaluation and performance comparison across methods.


Subject(s)
Mathematical Computing ; Algorithms ; Binding Sites ; Calpain/metabolism ; Caspases/metabolism ; Markov Chains ; Proteolysis ; Support Vector Machine
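
Of the techniques listed in the abstract above, position-based matrices are the simplest to illustrate. The following is a minimal sketch of a log-odds position-specific scoring matrix, assuming a uniform amino acid background and a pseudocount of 1. The training windows, the test sequence, and all function names are hypothetical and do not come from any of the reviewed tools.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical aligned windows around known cleavage sites (toy data).
TOY_WINDOWS = ["DEVDG", "DEVDS", "DETDG", "DQVDG"]


def build_pssm(windows, pseudocount=1.0):
    """Build a log-odds scoring matrix from equal-length aligned windows."""
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_site(pssm, sequence):
    """Slide the matrix along a sequence; return (start index, score) of the best window."""
    width = len(pssm)
    scored = [
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score


toy_pssm = build_pssm(TOY_WINDOWS)
toy_start, toy_score = best_site(toy_pssm, "AAKDEVDGLLA")
```

In this toy run the best-scoring window starts at index 3, where the sequence matches the training windows. A matrix of this kind treats positions independently, which is one reason the review contrasts it with more flexible machine-learning methods.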
5.
Bioinformatics ; 29(23): 3053-9, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24037215

ABSTRACT

MOTIVATION: Although several methods exist to relate high-dimensional gene expression data to various clinical phenotypes, finding combinations of features in such input remains a challenge, particularly when fitting complex statistical models such as those used for survival studies. RESULTS: Our proposed method builds on existing 'regularization path-following' techniques to produce regression models that can extract arbitrarily complex patterns of input features (such as gene combinations) from large-scale data that relate to a known clinical outcome. Through the use of the data's structure and itemset mining techniques, we are able to avoid combinatorial complexity issues typically encountered with such methods, and our algorithm runs in time of the same order as single-variable versions. Applied to data from various clinical studies of cancer patient survival time, our method was able to produce a number of promising gene-interaction candidates whose tumour-related roles appear to be confirmed by the literature.


Subject(s)
Breast Neoplasms/mortality ; Computational Biology/methods ; Gene Regulatory Networks ; Neoplasm Proteins/genetics ; Neuroblastoma/mortality ; Algorithms ; Breast Neoplasms/genetics ; Female ; Gene Expression Profiling ; Humans ; Likelihood Functions ; Logistic Models ; Models, Biological ; Neuroblastoma/genetics ; Proportional Hazards Models ; Risk Factors ; Survival Rate
6.
Stem Cell Reports ; 16(4): 954-967, 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33711267

ABSTRACT

Metastasis is the major cause of cancer-related death, but whether metastatic lesions exhibit the same cellular composition as primary tumors has yet to be elucidated. To investigate the cellular heterogeneity of metastatic colorectal cancer (CRC), we established 72 patient-derived organoids (PDOs) from 21 patients. Combined bulk transcriptomic and single-cell RNA-sequencing analysis revealed decreased gene expression of markers for differentiated cells in PDOs derived from metastatic lesions. Paradoxically, expression of potential intestinal stem cell markers was also decreased. We identified OLFM4 as the gene most strongly correlating with a stem-like cell cluster, and found OLFM4+ cells to be capable of initiating organoid culture growth and differentiation capacity in primary PDOs. These cells were required for the efficient growth of primary PDOs but dispensable for metastatic PDOs. These observations demonstrate that metastatic lesions have a cellular composition distinct from that of primary tumors; patient-matched PDOs are a useful resource for analyzing metastatic CRC.


Subject(s)
Colorectal Neoplasms/pathology ; Granulocyte Colony-Stimulating Factor/metabolism ; Organoids/metabolism ; Biomarkers, Tumor/genetics ; Biomarkers, Tumor/metabolism ; Colorectal Neoplasms/genetics ; Colorectal Neoplasms/surgery ; Gene Expression Profiling ; Gene Expression Regulation, Neoplastic ; Humans ; Neoplasm Metastasis ; Organoids/pathology
7.
Genome Inform ; 22: 202-13, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20238430

ABSTRACT

While the importance of modulatory proteolysis in research has steadily increased, knowledge of this process has remained largely disorganized, with the nature and role of the entities composing modulatory proteolysis still uncertain. We built CaMPDB, a resource on modulatory proteolysis, with a focus on calpain, a well-studied intracellular protease which regulates substrate functions by proteolytic processing. CaMPDB contains sequences of calpains, substrates and inhibitors as well as substrate cleavage sites, collected from the literature. Some cleavage efficiencies were evaluated by biochemical experiments, and a cleavage site prediction tool is provided to assist biologists in understanding calpain-mediated cellular processes. CaMPDB is freely accessible at http://calpain.org.


Subject(s)
Calpain/metabolism ; Markov Chains ; Animals ; Bayes Theorem ; Binding Sites ; Humans ; Hydrolysis ; Protein Binding
8.
Methods Mol Biol ; 1915: 121-147, 2019.
Article in English | MEDLINE | ID: mdl-30617801

ABSTRACT

Calpain, an intracellular Ca²⁺-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, with the exact mechanism of substrate recognition and cleavage by calpain still largely unknown. Current sequencing technologies have made it possible to compile large amounts of cleavage data and brought greater understanding of the underlying protein interactions. However, the practical impossibility of exhaustively retrieving substrate sequences through experimentation alone has created the need for efficient computational prediction methods. Such methods must be able to quickly mark substrate candidates and putative cleavage sites for further analysis. While many methods exist for both calpain and other types of proteolytic actions, the expected reliability of these methods depends heavily on the type and complexity of proteolytic action, as well as the availability of well-labeled experimental datasets, which both vary greatly across enzyme families. This chapter introduces CalCleaveMKL: a tool for calpain cleavage prediction based on multiple kernel learning, an extension to the classic support vector machine framework that is able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality. Along with its improved accuracy, the method used by CalCleaveMKL provided numerous insights on the respective importance of sequence-related features, such as solvent accessibility and secondary structure. It notably demonstrated there existed significant specificity differences across calpain subtypes, despite previous assumption to the contrary. An online implementation of this prediction tool is available at http://calpain.org .


Subject(s)
Calpain/chemistry ; Computational Biology/methods ; Software ; Algorithms ; Binding Sites ; Caspases/chemistry ; Protein Structure, Secondary ; Proteolysis ; Substrate Specificity ; Support Vector Machine
9.
Methods Mol Biol ; 1807: 95-111, 2018.
Article in English | MEDLINE | ID: mdl-30030806

ABSTRACT

Biclustering extracts genes that are coexpressed under certain experimental conditions, providing more precise insight into genetic behavior than one-dimensional clustering. For understanding the biological features of genes in a single bicluster, visualizations such as heatmaps or parallel coordinate plots and tools for enrichment analysis are widely used. However, simultaneously handling many biclusters remains a challenge. Thus, we developed a web service named SiBIC, which uses maximal frequent itemset mining to exhaustively discover significant biclusters. These are turned into networks of overlapping biclusters, where nodes are gene sets and edges show their overlaps in the detected biclusters. SiBIC provides a graphical user interface for manipulating a gene set network, where users can find target gene sets based on the enriched network. This chapter provides a user guide to SiBIC, along with the background to its development. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/sibic/faces/index.jsp .


Subject(s)
Data Mining/methods ; Software ; Cluster Analysis ; Gene Expression Regulation ; Gene Regulatory Networks ; Internet
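
The link between maximal frequent itemset mining and biclustering described in the abstract above can be sketched on a toy example. The following assumes a binarized expression matrix in which each gene is a "transaction" holding the conditions under which it is expressed. A set of conditions shared by at least `min_support` genes then forms a candidate bicluster. The brute-force search, the data, and the names are hypothetical illustrations. SiBIC's actual mining algorithm and its significance testing are not reproduced here.

```python
from itertools import combinations

# Hypothetical binarized expression data: gene -> conditions where it is "on".
TOY_EXPRESSION = {
    "geneA": {"c1", "c2", "c3"},
    "geneB": {"c1", "c2", "c3"},
    "geneC": {"c1", "c2"},
    "geneD": {"c4"},
    "geneE": {"c3", "c4"},
}


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force maximal frequent itemset mining (suitable for toy sizes only).

    Returns a dict mapping each maximal frequent condition set to the
    set of genes that support it, i.e. one candidate bicluster per entry.
    """
    items = sorted(set().union(*transactions.values()))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            genes = frozenset(
                gene for gene, conds in transactions.items() if itemset <= conds
            )
            if len(genes) >= min_support:
                frequent[itemset] = genes
    # Keep only itemsets that are not a proper subset of another frequent itemset.
    return {
        itemset: genes
        for itemset, genes in frequent.items()
        if not any(itemset < other for other in frequent)
    }


toy_biclusters = maximal_frequent_itemsets(TOY_EXPRESSION, min_support=2)
```

On this toy input the search yields two candidate biclusters: conditions c1, c2 and c3 shared by geneA and geneB, and condition c4 shared by geneD and geneE. Exhaustive enumeration grows exponentially with the number of conditions, so real tools rely on dedicated mining algorithms instead.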
10.
PLoS One ; 6(5): e19035, 2011 May 03.
Article in English | MEDLINE | ID: mdl-21559271

ABSTRACT

Calpain, an intracellular Ca²⁺-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, with the exact mechanism of substrate recognition and cleavage by calpain still largely unknown. While previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other similar types of proteases, their approach does not extend well to calpain, possibly due to its particular mode of proteolytic action and limited amount of experimental data. Through the use of Multiple Kernel Learning, a recent extension to the classic Support Vector Machine framework, we were able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality (6% over highest AUC score produced by state-of-the-art methods). In addition to producing a stronger machine-learning model for the prediction of calpain cleavage, we were able to highlight the importance and role of each feature of substrate sequences in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed there existed significant specificity differences across calpain sub-types, despite previous assumption to the contrary. Prediction accuracy was further successfully validated using, as an unbiased test set, mutated sequences of calpastatin (endogenous inhibitor of calpain) modified to no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org.


Subject(s)
Calpain/chemistry ; Algorithms ; Area Under Curve ; Artificial Intelligence ; Binding Sites ; Biochemistry/methods ; Calcium/chemistry ; Calcium-Binding Proteins/chemistry ; Computational Biology/methods ; Humans ; Models, Statistical ; Normal Distribution ; Protein Binding ; Protein Structure, Secondary ; Protein Structure, Tertiary ; Reproducibility of Results ; Software
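
Multiple kernel learning, as named in the abstract above, rests on combining several base kernels, one per feature type, into a single kernel. The following is a minimal sketch of that combination step only, using a convex combination of two Gram matrices. The kernel weights are fixed by hand, whereas a real multiple kernel learning procedure would learn them together with the classifier. The sequences, the accessibility values, the weights, and the function names are all hypothetical.

```python
def match_kernel(a, b):
    """Count positions where two equal-length sequences agree
    (the dot product of their one-hot encodings)."""
    return float(sum(x == y for x, y in zip(a, b)))


def linear_kernel(u, v):
    """Plain dot product between two numeric feature vectors."""
    return float(sum(x * y for x, y in zip(u, v)))


def gram(kernel, samples):
    """Gram matrix of a kernel over a list of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


def combine(grams, weights):
    """Convex combination of Gram matrices: K = sum_m w_m * K_m."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: sequence windows and per-residue accessibility values.
TOY_SEQS = ["DEVDG", "DEVDS", "AAKLL"]
TOY_ACCESS = [
    [0.9, 0.8, 0.7, 0.9, 0.6],
    [0.8, 0.8, 0.6, 0.9, 0.5],
    [0.1, 0.2, 0.1, 0.3, 0.2],
]
TOY_WEIGHTS = [0.75, 0.25]  # assumption: hand-picked, not learned

toy_kernel = combine(
    [gram(match_kernel, TOY_SEQS), gram(linear_kernel, TOY_ACCESS)],
    TOY_WEIGHTS,
)
```

The combined matrix `toy_kernel` could then be passed to any kernel classifier that accepts a precomputed Gram matrix. Because a non-negative weighted sum of valid kernels is itself a valid kernel, heterogeneous features such as primary sequence and solvent accessibility can share one model.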