Search | VHL Regional Portal

cnnAlpha: Protein disordered regions prediction by reduced amino acid alphabets and convolutional neural networks.

Oberti, Mauricio; Vaisman, Iosif I.

Proteins; 88(11): 1472-1481, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32535960

ABSTRACT

Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and the low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio, sequence-only prediction method, based on reduced amino acid alphabets and convolutional neural networks (CNNs), that aims to overcome the challenge of accurate prediction posed by IDRs. We experiment with six different 3-letter reduced alphabets. We argue that the dimensional reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs on par with or better than other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Our method is therefore suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.
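As a rough illustration of the input encoding the abstract describes, the sketch below maps the 20 standard amino acids onto a 3-letter reduced alphabet and one-hot encodes the result, which is the kind of 3-channel input a convolutional layer could consume. The grouping (hydrophobic, polar, charged) and all names (`REDUCED_ALPHABET`, `reduce_sequence`, `one_hot`) are assumptions made for this example; the abstract does not list the six alphabets the authors tested, and this is not their implementation.

```python
from __future__ import annotations

# Hypothetical 3-letter reduced alphabet. The grouping below is an
# assumption for illustration only; the abstract does not specify
# which six alphabets were evaluated.
REDUCED_ALPHABET = {
    "h": "AVLIMFWC",  # hydrophobic residues (assumed grouping)
    "p": "GSTNQYP",   # polar / small residues (assumed grouping)
    "c": "DEKRH",     # charged residues (assumed grouping)
}
GROUPS = tuple(REDUCED_ALPHABET)  # fixed channel order: ("h", "p", "c")

# Invert the grouping into a residue -> group lookup table.
RESIDUE_TO_GROUP = {
    residue: group
    for group, residues in REDUCED_ALPHABET.items()
    for residue in residues
}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids become 'x'.
    """
    return "".join(RESIDUE_TO_GROUP.get(res, "x") for res in seq.upper())


def one_hot(reduced: str) -> list[list[int]]:
    """One-hot encode a reduced sequence as L rows of 3 channels.

    Unknown symbols ('x') are encoded as an all-zero row.
    """
    return [[int(sym == group) for group in GROUPS] for sym in reduced]
```

With this assumed grouping, `reduce_sequence("MKTAY")` gives `"hcphp"`, and `one_hot` turns a sequence of length L into an L x 3 matrix instead of the L x 20 matrix a full amino acid alphabet would need.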

Subject(s)

Computational Biology/methods; Data Mining/statistics & numerical data; Intrinsically Disordered Proteins/chemistry; Machine Learning; Neural Networks, Computer; Amino Acid Sequence; Area Under Curve; Benchmarking; Datasets as Topic; Humans; Multifactor Dimensionality Reduction; ROC Curve; Sequence Analysis, Protein

Li-Fraumeni Exploration Consortium Data Coordinating Center: Building an Interactive Web-Based Resource for Collaborative International Cancer Epidemiology Research for a Rare Condition.

Mai, Phuong L; Sand, Sharon R; Saha, Neiladri; Oberti, Mauricio; Dolafi, Tom; DiGianni, Lisa; Root, Elizabeth J; Kong, Xianhua; Bremer, Renee C; Santiago, Karina M; Bojadzieva, Jasmina; Barley, Derek; Novokmet, Ana; Ketchum, Karen A; Nguyen, Ngoc; Jacob, Shine; Nichols, Kim E; Kratz, Christian P; Schiffman, Joshua D; Evans, D Gareth; Achatz, Maria Isabel; Strong, Louise C; Garber, Judy E; Ladwa, Sweta A; Malkin, David; Weitzel, Jeffrey N.

Cancer Epidemiol Biomarkers Prev; 29(5): 927-935, 2020 May.

Article in English | MEDLINE | ID: mdl-32156722

ABSTRACT

BACKGROUND: The success of multisite collaborative research relies on effective data collection, harmonization, and aggregation strategies. Data Coordination Centers (DCCs) serve to facilitate the implementation of these strategies. The utility of a DCC can be particularly relevant for research on rare diseases, where collaboration among multiple sites to amass large aggregate datasets is essential. However, approaches to building a DCC have been scarcely documented. METHODS: The Li-Fraumeni Exploration (LiFE) Consortium's DCC was created using multiple open source packages, including LAM/G Application (Linux, Apache, MySQL, Grails), Extraction-Transformation-Loading (ETL) Pentaho Data Integration Tool, and the Saiku-Mondrian client. This document serves as a resource for building a rare disease DCC for multi-institutional collaborative research. RESULTS: The primary scientific and technological objective, to create an online central repository into which data from all participating sites could be deposited, harmonized, aggregated, disseminated, and analyzed, was achieved. The cohort now includes 2,193 participants from six contributing sites, including 1,354 individuals from families with a pathogenic or likely pathogenic variant in TP53. Data on cancer diagnoses are also available. Challenges and lessons learned are summarized. CONCLUSIONS: The methods leveraged mitigate challenges associated with successfully developing a DCC's technical infrastructure, data harmonization efforts, communications, and software development and applications. IMPACT: These methods can serve as a framework for establishing other collaborative research efforts. Data from the consortium will serve as a great resource for collaborative research to improve knowledge of, and the ability to care for, individuals and families with Li-Fraumeni syndrome.
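The deposit, harmonize, and aggregate flow described in the abstract can be sketched in a few lines. This is a toy illustration only: the consortium used Pentaho Data Integration for ETL, not hand-written Python, and the site names, field names, and code mappings below (`SITE_MAPPINGS`, `harmonize`, `aggregate`) are all hypothetical.

```python
# Hypothetical per-site code books: each site reports the same field
# with its own local codes, which are mapped to one shared vocabulary.
SITE_MAPPINGS = {
    "site_a": {"sex": {"1": "M", "2": "F"}},
    "site_b": {"sex": {"male": "M", "female": "F"}},
}


def harmonize(site, record):
    """Translate one site's record into the shared vocabulary.

    Fields without a code book are copied unchanged; coded fields with
    an unrecognized value become "UNKNOWN" so they can be reviewed.
    """
    mapping = SITE_MAPPINGS[site]
    harmonized = {"site": site}
    for field, value in record.items():
        codes = mapping.get(field)
        if codes is None:
            harmonized[field] = value
        else:
            harmonized[field] = codes.get(str(value).lower(), "UNKNOWN")
    return harmonized


def aggregate(batches):
    """Combine harmonized records from all sites into one list."""
    combined = []
    for site, records in batches.items():
        combined.extend(harmonize(site, record) for record in records)
    return combined
```

Keeping the site of origin on every harmonized record, as done here, is one simple way to trace an aggregated value back to the contributing center.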

Subject(s)

Health Information Exchange; International Cooperation; Li-Fraumeni Syndrome/epidemiology; Rare Diseases/epidemiology; Adolescent; Adult; Aged; Aged, 80 and over; Child; Child, Preschool; Cohort Studies; Data Collection/methods; Female; Genetic Predisposition to Disease; Germ-Line Mutation; Global Burden of Disease; Humans; Infant; Infant, Newborn; Internet; Li-Fraumeni Syndrome/genetics; Male; Middle Aged; Rare Diseases/genetics; Sample Size; Tumor Suppressor Protein p53/genetics; Young Adult

The CPTAC Data Portal: A Resource for Cancer Proteomics Research.

Edwards, Nathan J; Oberti, Mauricio; Thangudu, Ratna R; Cai, Shuang; McGarvey, Peter B; Jacob, Shine; Madhavan, Subha; Ketchum, Karen A.

J Proteome Res; 14(6): 2707-13, 2015 Jun 05.

Article in English | MEDLINE | ID: mdl-25873244

ABSTRACT

The Clinical Proteomic Tumor Analysis Consortium (CPTAC), under the auspices of the National Cancer Institute's Office of Cancer Clinical Proteomics Research, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies and workflows to clinical tumor samples with characterized genomic and transcript profiles. The consortium analyzes cancer biospecimens using mass spectrometry, identifying and quantifying the constituent proteins and characterizing each tumor sample's proteome. Mass spectrometry enables highly specific identification of proteins and their isoforms, accurate relative quantitation of protein abundance in contrasting biospecimens, and localization of post-translational protein modifications, such as phosphorylation, on a protein's sequence. The combination of proteomics, transcriptomics, and genomics data from the same clinical tumor samples provides an unprecedented opportunity for tumor proteogenomics. The CPTAC Data Portal is the centralized data repository for the dissemination of proteomic data collected by Proteome Characterization Centers (PCCs) in the consortium. The portal currently hosts 6.3 TB of data and includes proteomic investigations of breast, colorectal, and ovarian tumor tissues from The Cancer Genome Atlas (TCGA). The data collected by the consortium is made freely available to the public through the data portal.

Subject(s)

Biomedical Research; Databases, Protein; Neoplasm Proteins; Proteomics; Humans; Information Storage and Retrieval; Neoplasm Proteins/metabolism; Neoplasms/genetics; Neoplasms/metabolism

Informatics and data quality at collaborative multicenter Breast and Colon Cancer Family Registries.

McGarvey, Peter B; Ladwa, Sweta; Oberti, Mauricio; Dragomir, Anca Dana; Hedlund, Erin K; Tanenbaum, David Michael; Suzek, Baris E; Madhavan, Subha.

J Am Med Inform Assoc; 19(e1): e125-8, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22323393

ABSTRACT

Quality control and harmonization of data is a vital and challenging undertaking for any successful data coordination center and a responsibility shared between the multiple sites that produce, integrate, and utilize the data. Here we describe a coordinated effort between scientists and data managers in the Cancer Family Registries to implement a data governance infrastructure consisting of both organizational and technical solutions. The technical solution uses a rule-based validation system that facilitates error detection and correction for data centers submitting data to a central informatics database. Validation rules comprise both standard checks on allowable values and a crosscheck of related database elements for logical and scientific consistency. Evaluation over a 2-year timeframe showed a significant decrease in the number of errors in the database and a concurrent increase in data consistency and accuracy.
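The two kinds of validation rule named in the abstract, checks on allowable values and crosschecks between related fields, might look roughly like the sketch below. The field names, permitted codes, and rule logic are hypothetical and chosen only for illustration; the registries' actual rule set is not given in the abstract.

```python
# Hypothetical allowable-value rules: field name -> set of permitted codes.
ALLOWED_VALUES = {
    "sex": {"F", "M", "U"},
    "cancer_site": {"breast", "colon", "none"},
}


def check_allowed_values(record):
    """Standard check: each coded field must hold a permitted value."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: value {value!r} is not allowed")
    return errors


def check_cross_fields(record):
    """Crosscheck: related fields must be logically consistent."""
    errors = []
    birth = record.get("birth_year")
    diagnosis = record.get("diagnosis_year")
    if birth is not None and diagnosis is not None and diagnosis < birth:
        errors.append("diagnosis_year precedes birth_year")
    if record.get("cancer_site") == "none" and diagnosis is not None:
        errors.append("diagnosis_year given but cancer_site is 'none'")
    return errors


def validate(record):
    """Run every rule and return the list of error messages."""
    return check_allowed_values(record) + check_cross_fields(record)
```

Returning a list of messages rather than raising on the first failure lets a submitting center see and correct every problem in a record in one pass, which fits the error detection and correction workflow the abstract outlines.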

Subject(s)

Breast Neoplasms; Colonic Neoplasms; Databases, Factual/standards; Registries/standards; Breast Neoplasms/epidemiology; Colonic Neoplasms/epidemiology; Databases, Factual/statistics & numerical data; Humans; Quality Control; Research Design; United States
