1.
Nature ; 488(7409): 49-56, 2012 Aug 02.
Article in English | MEDLINE | ID: mdl-22832581

ABSTRACT

Medulloblastoma, the most common malignant paediatric brain tumour, is currently treated with nonspecific cytotoxic therapies including surgery, whole-brain radiation, and aggressive chemotherapy. As medulloblastoma exhibits marked intertumoural heterogeneity, with at least four distinct molecular variants, previous attempts to identify targets for therapy have been underpowered because of small sample sizes. Here we report somatic copy number aberrations (SCNAs) in 1,087 unique medulloblastomas. SCNAs are common in medulloblastoma, and are predominantly subgroup-enriched. The most common region of focal copy number gain is a tandem duplication of SNCAIP, a gene associated with Parkinson's disease, which is exquisitely restricted to Group 4α. Recurrent translocations of PVT1, including PVT1-MYC and PVT1-NDRG1, that arise through chromothripsis are restricted to Group 3. Numerous targetable SCNAs, including recurrent events targeting TGF-β signalling in Group 3, and NF-κB signalling in Group 4, suggest future avenues for rational, targeted therapy.


Subjects
Cerebellar Neoplasms/classification , Cerebellar Neoplasms/genetics , Human Genome/genetics , Genomic Structural Variation/genetics , Medulloblastoma/classification , Medulloblastoma/genetics , Carrier Proteins/genetics , Cerebellar Neoplasms/metabolism , Child , DNA Copy Number Variations/genetics , Gene Duplication/genetics , myc Genes/genetics , Genomics , Hedgehog Proteins/metabolism , Humans , Medulloblastoma/metabolism , NF-kappa B/metabolism , Nerve Tissue Proteins/genetics , Oncogenic Fusion Proteins/genetics , Proteins/genetics , Long Noncoding RNA , Signal Transduction , Transforming Growth Factor beta/metabolism , Genetic Translocation/genetics
2.
BMC Bioinformatics ; 14: 304, 2013 Oct 11.
Article in English | MEDLINE | ID: mdl-24112406

ABSTRACT

BACKGROUND: Since membrane protein structures are challenging to crystallize, computational approaches are essential for elucidating the sequence-to-structure relationships. Structural modeling of membrane proteins requires a multidimensional approach, and one critical geometric parameter is the rotational angle of transmembrane helices. Rotational angles of transmembrane helices are characterized by their folded structures and could be inferred by the hydrophobic moment; however, the folding mechanism of membrane proteins is not yet fully understood. The rotational angle of a transmembrane helix is related to the exposed surface of a transmembrane helix, since lipid exposure gives the degree of accessibility of each residue in the lipid environment. To the best of our knowledge, there have been few advances in investigating whether an environment descriptor of lipid exposure could infer a geometric parameter of rotational angle. RESULTS: Here, we present an analysis of the relationship between rotational angles and lipid exposure and a support-vector-machine method, called TMexpo, for predicting both structural features from sequences. First, we observed from the development set of 89 protein chains that the lipid exposure, i.e., the relative accessible surface area (rASA) of residues in the lipid environment, generated from high-resolution protein structures could infer the rotational angles with a mean absolute angular error (MAAE) of 46.32°. More importantly, the predicted rASA from TMexpo achieved an MAAE of 51.05°, which is better than 71.47° obtained by the best of the compared hydrophobicity scales. Lastly, TMexpo outperformed the compared methods in rASA prediction on the independent test set of 21 protein chains and achieved an overall Matthews correlation coefficient, accuracy, sensitivity, specificity, and precision of 0.51, 75.26%, 81.30%, 69.15%, and 72.73%, respectively. TMexpo is publicly available at http://bio-cluster.iis.sinica.edu.tw/TMexpo.
CONCLUSIONS: TMexpo can better predict rASA and rotational angles than the compared methods. When rotational angles can be accurately predicted, free modeling of transmembrane protein structures in turn may benefit from a reduced complexity in ensembles with significantly fewer packing arrangements. Furthermore, sequence-based prediction of both rotational angle and lipid exposure can provide essential information when high-resolution structures are unavailable and contribute to experimental design to elucidate transmembrane protein functions.
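
The evaluation measures named in this abstract can be illustrated with a minimal Python sketch. It assumes that the angular error is the smallest difference on the circle (so 350° versus 10° counts as 20°) and that the classification scores follow the standard confusion-matrix definitions; the abstract does not state either convention. The function names are hypothetical and are not part of TMexpo.

```python
import math


def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180).

    Assumption: errors wrap around the circle; the abstract does not say so.
    """
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angular_error(pred, true):
    """Average of the per-helix angular errors (MAAE)."""
    return sum(angular_error(p, t) for p, t in zip(pred, true)) / len(pred)


def binary_metrics(tp, fp, tn, fn):
    """Standard scores for a two-class prediction such as exposed vs. buried."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        # Matthews correlation coefficient; defined as 0 when a margin is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

The counts passed to `binary_metrics` would come from comparing predicted and observed labels residue by residue; the values reported in the abstract are the authors' own and are not reproduced by this sketch.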


Subjects
Computational Biology/methods , Membrane Lipids/chemistry , Membrane Proteins/chemistry , Amino Acid Sequence , Hydrophobic and Hydrophilic Interactions , Membrane Lipids/metabolism , Membrane Proteins/metabolism , Molecular Sequence Data , Protein Secondary Structure , Support Vector Machine
3.
Nucleic Acids Res ; 39(Database issue): D347-55, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21177659

ABSTRACT

α-helical transmembrane (TM) proteins play an important role in many critical and diverse biological processes, and specific associations between TM helices are important determinants for membrane protein folding, dynamics and function. In order to gain insights into the above phenomena, it is necessary to investigate different types of helix-packing modes and interactions. However, such information is difficult to obtain because of the experimental impediment and a lack of a well-annotated source of helix-packing folds in TM proteins. We have developed the TMPad (TransMembrane Protein Helix-Packing Database) which addresses the above issues by integrating experimentally observed helix-helix interactions and related structural information of membrane proteins. Specifically, the TMPad offers pre-calculated geometric descriptors at the helix-packing interface including residue backbone/side-chain contacts, interhelical distances and crossing angles, helical translational shifts and rotational angles. The TMPad also includes the corresponding sequence, topology, lipid accessibility, ligand-binding information and supports structural classification, schematic diagrams and visualization of the above structural features of TM helix-packing. Through detailed annotations and visualizations of helix-packing, this online resource can serve as an information gateway for deciphering the relationship between helix-helix interactions and higher levels of organization in TM protein structure and function. The website of the TMPad is freely accessible to the public at http://bio-cluster.iis.sinica.edu.tw/TMPad.


Subjects
Protein Databases , Membrane Proteins/chemistry , Binding Sites , Ligands , Lipids/chemistry , Membrane Proteins/metabolism , Molecular Models , Protein Folding , Protein Secondary Structure , User-Computer Interface
4.
Bioinformatics ; 25(8): 996-1003, 2009 Apr 15.
Article in English | MEDLINE | ID: mdl-19244388

ABSTRACT

MOTIVATION: Helix-helix interactions play a critical role in the structure assembly, stability and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction. RESULTS: We present a new hierarchical framework for contact prediction, with an application in membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence and their pairing relationships are further predicted in the second level. Statistical analyses on contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves the conventional direct approach in prediction accuracy, with 80% reduction of input for prediction. Furthermore, the predicted contacts are then used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity and specificity are 56%, 40% and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix-packing in membrane proteins.
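
The last step described here, inferring a helix-helix interaction once enough residue contacts are predicted between two helices, can be sketched as follows. This is an illustration only: the input layout (a list of residue-number pairs plus a residue-to-helix lookup) and the function name are assumptions, and the contact predictions themselves would come from the two-level SVM scheme, which is not reproduced here.

```python
from collections import Counter


def infer_helix_interactions(contacts, residue_to_helix, min_contacts=3):
    """Return helix pairs supported by at least `min_contacts` predicted contacts.

    contacts: iterable of (residue_a, residue_b) predicted contact pairs.
    residue_to_helix: dict mapping a residue number to its helix label.
    """
    counts = Counter()
    for res_a, res_b in contacts:
        helix_a = residue_to_helix.get(res_a)
        helix_b = residue_to_helix.get(res_b)
        # Skip residues outside any helix and contacts within one helix.
        if helix_a is None or helix_b is None or helix_a == helix_b:
            continue
        counts[tuple(sorted((helix_a, helix_b)))] += 1
    return {pair for pair, n in counts.items() if n >= min_contacts}
```

Raising `min_contacts` trades sensitivity for specificity, which is consistent with the high specificity and lower sensitivity the abstract reports at a threshold of three.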


Subjects
Computational Biology/methods , Membrane Proteins/chemistry , Protein Databases , Membrane Proteins/metabolism , Biological Models , Protein Secondary Structure , Reproducibility of Results
5.
Proteins ; 72(2): 693-710, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18260102

ABSTRACT

Prediction of protein subcellular localization (PSL) is important for genome annotation, protein function prediction, and drug discovery. Many computational approaches for PSL prediction based on protein sequences have been proposed in recent years for Gram-negative bacteria. We present PSLDoc, a method based on gapped-dipeptides and probabilistic latent semantic analysis (PLSA) to solve this problem. A protein is considered as a term string composed of gapped-dipeptides, which are defined as any two residues separated by one or more positions. The weighting scheme of gapped-dipeptides is calculated according to a position specific score matrix, which includes sequence evolutionary information. Then, PLSA is applied for feature reduction, and reduced vectors are input to five one-versus-rest support vector machine classifiers. The localization site with the highest probability is assigned as the final prediction. It has been reported that there is a strong correlation between sequence homology and subcellular localization (Nair and Rost, Protein Sci 2002;11:2836-2847; Yu et al., Proteins 2006;64:643-651). To properly evaluate the performance of PSLDoc, a target protein can be classified into low- or high-homology data sets. PSLDoc's overall accuracy of low- and high-homology data sets reaches 86.84% and 98.21%, respectively, and it compares favorably with that of CELLO II (Yu et al., Proteins 2006;64:643-651). In addition, we set a confidence threshold to achieve a high precision at specified levels of recall rates. When the confidence threshold is set at 0.7, PSLDoc achieves 97.89% in precision which is considerably better than that of PSORTb v.2.0 (Gardy et al., Bioinformatics 2005;21:617-623). Our approach demonstrates that the specific feature representation for proteins can be successfully applied to the prediction of protein subcellular localization and improves prediction accuracy. Besides, because of the generality of the representation, our method can be extended to eukaryotic proteomes in the future. The web server of PSLDoc is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/PSLDoc/.
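
The gapped-dipeptide representation can be illustrated with a short Python sketch. Two simplifications are assumed: the gap is counted as the number of residues lying between the two paired residues, and terms are weighted by raw counts rather than by the position specific score matrix the abstract describes. The function name and the `max_gap` parameter are hypothetical.

```python
from collections import Counter


def gapped_dipeptides(sequence, max_gap=2):
    """Count gapped-dipeptide terms such as 'M1A' (M and A with one residue between).

    Assumption: raw counts stand in for the PSSM-based weighting of the paper.
    """
    terms = Counter()
    for gap in range(1, max_gap + 1):
        # The partner residue sits gap + 1 positions after residue i.
        for i in range(len(sequence) - gap - 1):
            terms[f"{sequence[i]}{gap}{sequence[i + gap + 1]}"] += 1
    return terms
```

Each protein then becomes a sparse term vector, which is the kind of input a dimensionality-reduction step such as PLSA would operate on before classification.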


Subjects
Dipeptides/metabolism , Proteins/metabolism , Subcellular Fractions/metabolism , Probability , Proteins/chemistry
6.
BMC Bioinformatics ; 8: 330, 2007 Sep 08.
Article in English | MEDLINE | ID: mdl-17825110

ABSTRACT

BACKGROUND: Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins. RESULTS: We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machines (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracy of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. In the assessment of the evaluation data sets, our method also attains a prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%.
CONCLUSION: Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant improvement. The biological features are interpretable and can be applied in advanced analyses and experimental designs. Moreover, the overall accuracy of combining the structural homology approach is further improved, which suggests that structural conservation could be a useful indicator for inferring localization in addition to sequence homology. The proposed method can be used in large-scale analyses of proteomes.


Subjects
Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Gene Expression Profiling/methods , Gram-Negative Bacteria/metabolism , Sequence Alignment/methods , Protein Sequence Analysis/methods , Subcellular Fractions/metabolism , Amino Acid Sequence , Molecular Sequence Data , Amino Acid Sequence Homology , Structure-Activity Relationship
7.
Nat Genet ; 45(3): 279-84, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23334666

ABSTRACT

Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 affected individuals (cases) using a combination of whole-exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per Mb (0.48 nonsilent) and notably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, and an additional 7.1% had focal deletions), MYCN (1.7%, causing a recurrent p.Pro44Leu alteration) and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1 and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies that rely on frequently altered oncogenic drivers.


Subjects
Exome , Mutation , Neuroblastoma , Tumor Cell Line , Genetic Predisposition to Disease , Human Genome , Humans , Neuroblastoma/genetics , Neuroblastoma/physiopathology , Single Nucleotide Polymorphism , DNA Sequence Analysis , Transcriptome
8.
J Proteome Res ; 7(2): 487-96, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18081245

ABSTRACT

The prediction of transmembrane (TM) helix and topology provides important information about the structure and function of a membrane protein. Due to the experimental difficulties in obtaining a high-resolution model, computational methods are highly desirable. In this paper, we present a hierarchical classification method using support vector machines (SVMs) that integrates selected features by capturing the sequence-to-structure relationship and developing a new scoring function based on membrane protein folding. The proposed approach is evaluated on low- and high-resolution data sets with cross-validation, and the topology (sidedness) prediction accuracy reaches as high as 90%. Our method is also found to correctly predict both the location of TM helices and the topology for 69% of the low-resolution benchmark set. We also test our method for discrimination between soluble and membrane proteins and achieve very low overall false positive (0.5%) and false negative rates (0 to approximately 1.2%). Lastly, the analysis of the scoring function suggests that the topogeneses of single-spanning and multispanning TM proteins have different levels of complexity, and the consideration of interloop topogenic interactions for the latter is the key to achieving better predictions. This method can facilitate the annotation of membrane proteomes to extract useful structural and functional information. It is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/SVMtop.


Subjects
Computational Biology , Membrane Proteins/chemistry , Membrane Proteins/classification , Protein Sequence Analysis , Amino Acid Sequence , Membrane Proteins/genetics , Molecular Sequence Data , Predictive Value of Tests , Protein Secondary Structure , Reproducibility of Results , Solubility
9.
Article in English | MEDLINE | ID: mdl-17369623

ABSTRACT

MOTIVATION: A key class of membrane proteins contains one or more transmembrane (TM) helices, traversing the membrane lipid bilayer. Various properties such as the length, arrangement and topology or orientation of TM helices, are closely related to a protein's functions. Although a range of methods have been developed to predict TM helices and their topologies, no single method consistently outperforms the others. In addition, topology prediction has much lower accuracy than helix prediction, and thus requires continuous improvements. RESULTS: We develop a method based on support vector machines (SVM) in a hierarchical framework to predict TM helices first, followed by their topology. By partitioning the prediction problem into two steps, specific input features can be selected and integrated in each step. We also propose a novel scoring function for topology models based on membrane protein folding process. When benchmarked against other methods in terms of performance, our approach achieves the highest scores at 86% in helix prediction (Q(2)) and 91% in topology prediction (TOPO) for the high-resolution data set, resulting in an improvement of 6% and 14% in their respective categories over the second best method. Furthermore, we demonstrate the ability of our method to discriminate between membrane and non-membrane proteins, with higher than 99% in accuracy. When tested on a small set of newly solved structures of membrane proteins, our method overcomes some of the difficulties in predicting TM helices by incorporating multiple biological input features.
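
As an aid to reading the helix-prediction score, the sketch below computes a per-residue two-state accuracy, which is how Q(2) is commonly defined. Whether the authors use exactly this definition is an assumption, as are the function name and the string encoding (`H` for a transmembrane helix residue, `-` for everything else).

```python
def q2_score(predicted, observed):
    """Percentage of residues whose two-state label (helix or not) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, a prediction of `--HHHH--` against an observed `-HHHHH--` differs at one of eight positions and therefore scores 87.5.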


Subjects
Computational Biology/methods , Protein Secondary Structure , Proteins/chemistry , Proteomics/methods , Algorithms , Bacteriorhodopsins/chemistry , Cell Membrane , Membrane Proteins/chemistry , Statistical Models , Theoretical Models , Protein Conformation , Protein Folding , Protein Sequence Analysis , Software
10.
Article in English | MEDLINE | ID: mdl-17369650

ABSTRACT

Prediction of subcellular localization of proteins is important for genome annotation, protein function prediction, and drug discovery. We present a prediction method for Gram-negative bacteria that uses ten one-versus-one support vector machine (SVM) classifiers, where compartment-specific biological features are selected as input to each SVM classifier. The final prediction of localization sites is determined by integrating the results from ten binary classifiers using a combination of majority votes and a probabilistic method. The overall accuracy reaches 91.4%, which is 1.6% better than the state-of-the-art system, in a ten-fold cross-validation evaluation on a benchmark data set. We demonstrate that feature selection guided by biological knowledge and insights in one-versus-one SVM classifiers can lead to a significant improvement in the prediction performance. Our model is also used to produce highly accurate prediction of 92.8% overall accuracy for proteins of dual localizations.
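
The way ten one-versus-one classifiers can be combined into a single call is sketched below in Python. Several points are assumptions: ten pairwise classifiers are taken to mean all pairs of five localization sites, the site labels are illustrative, and ties in the majority vote are broken by summed pairwise probability as a stand-in for the probabilistic method, which the abstract does not detail. The probabilities would come from trained SVMs, which are not shown.

```python
from itertools import combinations

# Assumed site labels; five sites give the ten pairs mentioned in the abstract.
SITES = ["cytoplasm", "inner_membrane", "periplasm", "outer_membrane", "extracellular"]


def one_vs_one_vote(pairwise_probs):
    """Combine pairwise classifier outputs into one predicted site.

    pairwise_probs maps (site_a, site_b) to the probability that the
    classifier for that pair assigns to site_a.
    """
    votes = {site: 0 for site in SITES}
    support = {site: 0.0 for site in SITES}
    for (site_a, site_b), prob_a in pairwise_probs.items():
        winner = site_a if prob_a >= 0.5 else site_b
        votes[winner] += 1
        support[site_a] += prob_a
        support[site_b] += 1.0 - prob_a
    # Majority vote first; summed probability only breaks ties (assumption).
    return max(SITES, key=lambda site: (votes[site], support[site]))
```

A caller would fill `pairwise_probs` with one entry per pair from `combinations(SITES, 2)` before invoking `one_vs_one_vote`.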


Subjects
Computational Biology/methods , Proteins/chemistry , Proteomics/methods , Algorithms , Amino Acid Sequence , Computer Simulation , Gram-Negative Bacteria/metabolism , Molecular Sequence Data , Peptides/chemistry , Probability , Protein Sorting Signals , Protein Secondary Structure , Reproducibility of Results , Software , Solvents/chemistry