Results 1 - 4 of 4
1.
Proteins ; 88(11): 1472-1481, 2020 11.
Article in English | MEDLINE | ID: mdl-32535960

ABSTRACT

Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio, sequence-only prediction method, based on reduced amino acid alphabets and convolutional neural networks (CNNs), that tries to overcome the challenge of accurate prediction posed by IDRs. We experiment with six different 3-letter reduced alphabets. We argue that the dimensional reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs at the same level as, or outperforms, other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Therefore, our method is suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.
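The input encoding described in this abstract can be sketched roughly as follows. This is an illustrative sketch only: the abstract does not list the six alphabets tested, so the 3-letter grouping below (hydrophobic, polar, charged) is a hypothetical stand-in, and the function names `reduce_sequence`, `one_hot`, and `conv1d` are assumptions, not the authors' code. The hand-written convolution shows a single filter sliding over the reduced sequence, without the training or network layers a real CNN predictor would have.

```python
# Hypothetical 3-letter reduced alphabet (illustration only; not one of the
# six alphabets from the article, which the abstract does not specify).
REDUCED_ALPHABET = {
    "H": "AVLIMFWC",  # hydrophobic
    "P": "STNQGPYH",  # polar or small
    "C": "DEKR",      # charged
}

# Map each of the 20 standard amino acids to its group letter.
AA_TO_GROUP = {aa: group
               for group, members in REDUCED_ALPHABET.items()
               for aa in members}
GROUPS = sorted(REDUCED_ALPHABET)  # fixed channel order: ['C', 'H', 'P']


def reduce_sequence(seq):
    """Rewrite a protein sequence in the 3-letter alphabet.

    Raises KeyError for non-standard residues such as 'X'.
    """
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper())


def one_hot(reduced):
    """Encode each reduced letter as a 3-channel one-hot vector."""
    return [[1.0 if group == symbol else 0.0 for group in GROUPS]
            for symbol in reduced]


def conv1d(x, kernel):
    """Slide one filter (width x channels) along the sequence, no padding."""
    width = len(kernel)
    channels = len(GROUPS)
    return [
        sum(x[i + j][c] * kernel[j][c]
            for j in range(width) for c in range(channels))
        for i in range(len(x) - width + 1)
    ]


# A width-2 filter that responds to two consecutive charged residues.
charged_pair = [[1, 0, 0], [1, 0, 0]]
scores = conv1d(one_hot(reduce_sequence("MKDE")), charged_pair)
```

With only three input channels instead of twenty, each filter has far fewer weights per position, which is one way to read the abstract's argument that the reduced alphabet eases pattern detection in the convolutional step.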


Subject(s)
Computational Biology/methods , Data Mining/statistics & numerical data , Intrinsically Disordered Proteins/chemistry , Machine Learning , Neural Networks, Computer , Amino Acid Sequence , Area Under Curve , Benchmarking , Datasets as Topic , Humans , Multifactor Dimensionality Reduction , ROC Curve , Sequence Analysis, Protein
2.
J Proteome Res ; 14(6): 2707-13, 2015 Jun 05.
Article in English | MEDLINE | ID: mdl-25873244

ABSTRACT

The Clinical Proteomic Tumor Analysis Consortium (CPTAC), under the auspices of the National Cancer Institute's Office of Cancer Clinical Proteomics Research, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies and workflows to clinical tumor samples with characterized genomic and transcript profiles. The consortium analyzes cancer biospecimens using mass spectrometry, identifying and quantifying the constituent proteins and characterizing each tumor sample's proteome. Mass spectrometry enables highly specific identification of proteins and their isoforms, accurate relative quantitation of protein abundance in contrasting biospecimens, and localization of post-translational protein modifications, such as phosphorylation, on a protein's sequence. The combination of proteomics, transcriptomics, and genomics data from the same clinical tumor samples provides an unprecedented opportunity for tumor proteogenomics. The CPTAC Data Portal is the centralized data repository for the dissemination of proteomic data collected by Proteome Characterization Centers (PCCs) in the consortium. The portal currently hosts 6.3 TB of data and includes proteomic investigations of breast, colorectal, and ovarian tumor tissues from The Cancer Genome Atlas (TCGA). The data collected by the consortium is made freely available to the public through the data portal.


Subject(s)
Biomedical Research , Databases, Protein , Neoplasm Proteins , Proteomics , Humans , Information Storage and Retrieval , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism
3.
Cancer Epidemiol Biomarkers Prev ; 29(5): 927-935, 2020 05.
Article in English | MEDLINE | ID: mdl-32156722

ABSTRACT

BACKGROUND: The success of multisite collaborative research relies on effective data collection, harmonization, and aggregation strategies. Data Coordination Centers (DCC) serve to facilitate the implementation of these strategies. The utility of a DCC can be particularly relevant for research on rare diseases, where collaboration from multiple sites to amass large aggregate datasets is essential. However, approaches to building a DCC have been scarcely documented. METHODS: The Li-Fraumeni Exploration (LiFE) Consortium's DCC was created using multiple open source packages, including LAM/G Application (Linux, Apache, MySQL, Grails), Extraction-Transformation-Loading (ETL) Pentaho Data Integration Tool, and the Saiku-Mondrian client. This document serves as a resource for building a rare disease DCC for multi-institutional collaborative research. RESULTS: The primary scientific and technological objective, to create an online central repository into which data from all participating sites could be deposited, harmonized, aggregated, disseminated, and analyzed, was completed. The cohort now includes 2,193 participants from six contributing sites, including 1,354 individuals from families with a pathogenic or likely pathogenic variant in TP53. Data on cancer diagnoses are also available. Challenges and lessons learned are summarized. CONCLUSIONS: The methods leveraged mitigate challenges associated with successfully developing a DCC's technical infrastructure, data harmonization efforts, communications, and software development and applications. IMPACT: These methods can serve as a framework in establishing other collaborative research efforts. Data from the consortium will serve as a great resource for collaborative research to improve knowledge on, and the ability to care for, individuals and families with Li-Fraumeni syndrome.


Subject(s)
Health Information Exchange , International Cooperation , Li-Fraumeni Syndrome/epidemiology , Rare Diseases/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Cohort Studies , Data Collection/methods , Female , Genetic Predisposition to Disease , Germ-Line Mutation , Global Burden of Disease , Humans , Infant , Infant, Newborn , Internet , Li-Fraumeni Syndrome/genetics , Male , Middle Aged , Rare Diseases/genetics , Sample Size , Tumor Suppressor Protein p53/genetics , Young Adult
4.
J Am Med Inform Assoc ; 19(e1): e125-8, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22323393

ABSTRACT

Quality control and harmonization of data is a vital and challenging undertaking for any successful data coordination center and a responsibility shared between the multiple sites that produce, integrate, and utilize the data. Here we describe a coordinated effort between scientists and data managers in the Cancer Family Registries to implement a data governance infrastructure consisting of both organizational and technical solutions. The technical solution uses a rule-based validation system that facilitates error detection and correction for data centers submitting data to a central informatics database. Validation rules comprise both standard checks on allowable values and a crosscheck of related database elements for logical and scientific consistency. Evaluation over a 2-year timeframe showed a significant decrease in the number of errors in the database and a concurrent increase in data consistency and accuracy.
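The two kinds of validation rule mentioned in this abstract (checks on allowable values and crosschecks between related elements) might look something like the sketch below. It is a minimal illustration under stated assumptions: the field names (`sex`, `vital_status`, `age_at_diagnosis`, `age_at_last_contact`), the allowed codes, and the function names are all hypothetical and are not taken from the Cancer Family Registries system.

```python
# Hypothetical allowable-value rules; field names and codes are assumptions.
ALLOWED_VALUES = {
    "sex": {"F", "M", "U"},
    "vital_status": {"alive", "deceased", "unknown"},
}


def check_allowed(record):
    """Standard check: each coded field must hold one of its allowed values."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: value {value!r} not allowed")
    return errors


def check_cross_field(record):
    """Crosscheck: related fields must be logically consistent."""
    errors = []
    diagnosed = record.get("age_at_diagnosis")
    last_contact = record.get("age_at_last_contact")
    if (diagnosed is not None and last_contact is not None
            and diagnosed > last_contact):
        errors.append("age_at_diagnosis exceeds age_at_last_contact")
    return errors


def validate(record):
    """Run all rules and return a list of error messages (empty if clean)."""
    return check_allowed(record) + check_cross_field(record)


good = {"sex": "F", "vital_status": "alive",
        "age_at_diagnosis": 48, "age_at_last_contact": 55}
bad = {"sex": "X", "vital_status": "alive",
       "age_at_diagnosis": 60, "age_at_last_contact": 55}

good_errors = validate(good)
bad_errors = validate(bad)
```

Returning error messages rather than raising on the first failure lets a submitting site see and correct every problem in a record in one pass, which fits the error detection and correction workflow the abstract describes.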


Subject(s)
Breast Neoplasms , Colonic Neoplasms , Databases, Factual/standards , Registries/standards , Breast Neoplasms/epidemiology , Colonic Neoplasms/epidemiology , Databases, Factual/statistics & numerical data , Humans , Quality Control , Research Design , United States