ABSTRACT
Subclonal reconstruction algorithms use bulk DNA sequencing data to quantify parameters of tumor evolution, allowing an assessment of how cancers initiate, progress and respond to selective pressures. We launched the ICGC-TCGA (International Cancer Genome Consortium-The Cancer Genome Atlas) DREAM Somatic Mutation Calling Tumor Heterogeneity and Evolution Challenge to benchmark existing subclonal reconstruction algorithms. This 7-year community effort used cloud computing to benchmark 31 subclonal reconstruction algorithms on 51 simulated tumors. Algorithms were scored on seven independent tasks, for a total of 12,061 runs. Algorithm choice influenced performance substantially more than tumor features, but purity-adjusted read depth, copy-number state and read mappability were associated with the performance of most algorithms on most tasks. No single algorithm was a top performer on all seven tasks, and existing ensemble strategies were unable to outperform the best individual methods, highlighting a key research need. All containerized methods, evaluation code and datasets are available to support further assessment of the determinants of subclonal reconstruction accuracy and the development of improved methods for understanding tumor evolution.
ABSTRACT
Spinocerebellar ataxia type 3 (SCA3), the most common dominantly inherited ataxia, is a polyglutamine neurodegenerative disease for which there is no disease-modifying therapy. The polyglutamine-encoding CAG repeat expansion in the ATXN3 gene results in expression of a mutant form of the ATXN3 protein, a deubiquitinase that causes selective neurodegeneration despite being widely expressed. The mechanisms driving neurodegeneration in SCA3 are unclear, and research to date has focused almost exclusively on neurons. Here, using age-matched transgenic mice of both sexes in equal numbers that express full-length human mutant ATXN3, we identified early and robust transcriptional changes in selectively vulnerable brain regions that implicate oligodendrocytes in disease pathogenesis. We mapped transcriptional changes across early, mid, and late stages of disease in two selectively vulnerable brain regions: the cerebellum and brainstem. The most significant disease-associated module identified by weighted gene coexpression network analysis revealed dysfunction in SCA3 oligodendrocyte maturation. These results reflect a toxic gain-of-function mechanism, as ATXN3 KO mice do not exhibit any impairments in oligodendrocyte maturation. Genetic crosses to reporter mice revealed a marked reduction in mature oligodendrocytes in SCA3-vulnerable brain regions, and ultrastructural microscopy confirmed abnormalities in axonal myelination. Further study of isolated oligodendrocyte precursor cells from SCA3 mice established that this impairment in oligodendrocyte maturation is a cell-autonomous process.
We conclude that SCA3 is not simply a disease of neurons, and the search for therapeutic strategies and disease biomarkers will need to account for non-neuronal involvement in SCA3 pathogenesis.
SIGNIFICANCE STATEMENT: Despite advances in understanding spinocerebellar ataxia type 3 (SCA3), much remains unknown about how the disease gene causes brain dysfunction that ultimately leads to cell death. We completed a longitudinal transcriptomic analysis of vulnerable brain regions in SCA3 mice to define the earliest and most robust changes across disease progression. Through gene network analyses followed up with biochemical and histologic studies in SCA3 mice, we provide evidence for severe dysfunction in oligodendrocyte maturation early in SCA3 pathogenesis. Our results advance understanding of SCA3 disease mechanisms, identify additional routes for therapeutic intervention, and may provide broader insight into polyglutamine diseases beyond SCA3.
Subjects
Machado-Joseph Disease , Neurodegenerative Diseases , Oligodendroglia , Animals , Ataxin-3/genetics , Ataxin-3/metabolism , Female , Machado-Joseph Disease/genetics , Machado-Joseph Disease/metabolism , Machado-Joseph Disease/pathology , Male , Mice , Mice, Transgenic , Neurodegenerative Diseases/metabolism , Oligodendroglia/metabolism , Oligodendroglia/pathology
ABSTRACT
Spinocerebellar ataxia type 3 (SCA3) is the second-most common CAG repeat disease, caused by a CAG repeat expansion that encodes an expanded polyglutamine tract in the ATXN3 protein. SCA3 is characterized by spinocerebellar degeneration leading to progressive motor incoordination and early death. Previous studies suggest that potassium channel dysfunction underlies early abnormalities in cerebellar cortical Purkinje neuron firing in SCA3. However, cerebellar cortical degeneration is often modest both in the human disease and in mouse models of SCA3, raising uncertainty about the role of cerebellar dysfunction in SCA3. Here, we address this question by investigating Purkinje neuron excitability in SCA3. In early-stage SCA3 mice, we confirm a previously identified increase in the excitability of cerebellar Purkinje neurons and associate this hyperexcitability with motor impairment and with reduced transcript levels of two voltage-gated potassium (KV) channels, Kcna6 and Kcnc3. Intracerebroventricular delivery of antisense oligonucleotides (ASOs) to reduce mutant ATXN3 restores normal excitability to SCA3 Purkinje neurons and rescues transcript levels of Kcna6 and Kcnc3. Interestingly, although an even broader range of KV channel transcripts shows reduced levels in late-stage SCA3 mice, cerebellar Purkinje neuron physiology is not further altered despite continued worsening of motor impairment. These results suggest that the progressive motor phenotype in SCA3 may reflect not ongoing changes in the cerebellar cortex but dysfunction of other neuronal structures within and beyond the cerebellum. Nevertheless, the early rescue of both KV channel expression and neuronal excitability by ASO treatment suggests that cerebellar cortical dysfunction contributes meaningfully to motor dysfunction in SCA3.
Subjects
Ataxin-3/genetics , Machado-Joseph Disease/drug therapy , Machado-Joseph Disease/genetics , Oligonucleotides, Antisense/therapeutic use , Purkinje Cells/pathology , Repressor Proteins/genetics , Animals , Behavior, Animal , Humans , Injections, Intraventricular , Kv1.6 Potassium Channel/drug effects , Kv1.6 Potassium Channel/genetics , Machado-Joseph Disease/psychology , Mice , Mice, Transgenic , Patch-Clamp Techniques , Phenotype , Potassium Channels, Voltage-Gated/drug effects , Shaw Potassium Channels/drug effects , Shaw Potassium Channels/genetics , Treatment Outcome
ABSTRACT
Dissecting tumor heterogeneity is key to understanding the complex mechanisms underlying drug resistance in cancers. A rich literature of pioneering studies on tumor heterogeneity analysis spurred a recent community-wide benchmark study comparing diverse modeling algorithms. Here we present FastClone, one of the most accurate algorithms in this benchmark. FastClone improves over existing methods by allowing the deconvolution of subclones that have independent copy number variation events within the same chromosome regions. We characterize the behavior of FastClone in identifying subclones using stage III colon cancer primary tumor samples as well as simulated data. It achieves an approximately 100-fold computational speed-up on both simulated and patient data. The efficiency of FastClone should allow its application to large-scale and clinical data and facilitate personalized medicine in cancers.
Subjects
Algorithms , DNA Copy Number Variations , Neoplasms/genetics , Colonic Neoplasms/genetics , Colonic Neoplasms/pathology , Computational Biology/methods , Computer Simulation , DNA, Neoplasm/genetics , Drug Resistance, Neoplasm/genetics , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Neoplasms/drug therapy , Neoplasms/pathology , Phylogeny , Precision Medicine , Sequence Analysis, DNA
ABSTRACT
Tumor heterogeneity is generated through a combination of genetic and epigenetic mechanisms, the latter of which play an important role in generating the stem-like cells responsible for tumor formation and metastasis. Although single-cell transcriptomic technologies hold promise for deconvoluting this complexity, several of these techniques have limitations, including drop-out and uneven coverage, that hinder further delineation of tumor heterogeneity. We applied deep, full-length single-cell RNA sequencing on Fluidigm's Polaris platform to reveal the cellular, transcriptomic, and isoform heterogeneity of SUM149, a triple-negative breast cancer (TNBC) cell line. We first validated the quality of the TNBC sequencing data against sequencing data from the erythroleukemia K562 cell line as a control. We next examined well-defined marker genes for cancer stem-like cells to identify distinct cell populations. We then profiled isoform expression to investigate the heterogeneity of alternative splicing patterns. Although SUM149 is classified as triple-negative, its stem cells showed heterogeneous expression of the marker receptors ER, PR, and HER2 across cells. We identified three cell populations that express stemness patterns: epithelial-mesenchymal transition (EMT) cancer stem cells (CSCs), mesenchymal-epithelial transition (MET) CSCs, and dual EMT-MET CSCs. These cells also showed a high level of heterogeneity in alternative splicing patterns. For example, CSCs showed different expression patterns of the CD44v6 exon, as well as different levels of truncated EGFR transcripts, which may indicate different potentials for proliferation and invasion among cancer stem cells. Our study revealed previously underestimated cellular, transcriptomic, and isoform heterogeneity of cancer stem cells in triple-negative breast cancer.
ABSTRACT
Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking. To address this need, we systematically assess methods for reconstructing tumor subclonality. First, we elucidate the main algorithmic problems in subclonal reconstruction and develop quantitative metrics for evaluating them. Then we simulate realistic tumor genomes that harbor all known clonal and subclonal mutation types and processes. Finally, we benchmark 580 tumor reconstructions, varying tumor read depth, tumor type and somatic variant detection. Our analysis provides a baseline for the establishment of gold-standard methods to analyze tumor heterogeneity.
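Most subclonal reconstruction problems start by converting each mutation's variant allele fraction (VAF) into the fraction of cancer cells that carry it (the cancer cell fraction, CCF). The sketch below shows the commonly used conversion; it is not taken from this benchmark, and it assumes that tumor purity, the local tumor copy number and the mutation's multiplicity are already known, that normal cells are diploid at the locus, and that the locus carries no subclonal copy-number change. The function name and defaults are illustrative.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn=2, normal_cn=2, multiplicity=1):
    """Convert an SNV's variant allele fraction (VAF) into a cancer cell fraction (CCF).

    Illustrative sketch: purity, copy numbers and multiplicity are assumed known.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of this locus per cell in the tumor-normal mixture.
    copies_per_cell = purity * tumor_cn + (1 - purity) * normal_cn
    # Expected VAF = purity * CCF * multiplicity / copies_per_cell; solve for CCF.
    return vaf * copies_per_cell / (purity * multiplicity)


# A heterozygous SNV in a diploid region of a 60%-pure tumor:
print(cancer_cell_fraction(0.30, 0.6))  # clonal: CCF close to 1.0
print(cancer_cell_fraction(0.15, 0.6))  # subclonal: CCF close to 0.5
```

Clustering such CCF estimates across many mutations is one common way clonal and subclonal populations are distinguished; copy-number errors or an unknown multiplicity shift the estimates directly, which is one reason copy-number state can matter for reconstruction accuracy.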
Subjects
Algorithms , Neoplasms/pathology , Clone Cells , Computer Simulation , DNA Copy Number Variations/genetics , Gene Dosage , Genome , Humans , Mutation/genetics , Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics , Reference Standards
ABSTRACT
Phylogenetic inference is of fundamental importance to evolutionary biology as well as other fields of biology, and molecular sequences have emerged as the primary data for this task. Although many phylogenetic methods have been developed to explicitly take into account substitution models of sequence evolution, such methods can fail due to model misspecification or insufficiency, especially in the face of heterogeneities in substitution processes across sites and among lineages. In this study, we propose to infer topologies of four-taxon trees using deep residual neural networks, a machine learning approach that requires no explicit modeling of the subject system and has a record of success in solving complex nonlinear inference problems. We train residual networks on simulated protein sequence data with extensive amino acid substitution heterogeneities. We show that well-trained residual network predictors can outperform existing state-of-the-art inference methods, such as the maximum likelihood method, on diverse simulated test data, especially under extensive substitution heterogeneities. Reassuringly, residual network predictors generally agree with existing methods on the trees inferred from real phylogenetic data with known or widely believed topologies. Furthermore, when combined with the quartet puzzling algorithm, residual network predictors can be used to reconstruct trees with more than four taxa. We conclude that deep learning represents a powerful new approach to phylogenetic reconstruction, especially when sequences evolve via heterogeneous substitution processes. We present our best trained predictor in a freely available program named Phylogenetics by Deep Learning (PhyDL, https://gitlab.com/ztzou/phydl; last accessed January 3, 2020).
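Restricting the network to four-taxon trees turns tree inference into a three-way classification, because four taxa admit only three unrooted topologies, whereas the number of unrooted binary topologies grows as (2n − 5)!! for n taxa. The helper functions below (names are illustrative) only enumerate these topologies and count trees and quartets; they do not implement PhyDL or the quartet puzzling algorithm.

```python
from itertools import combinations


def quartet_topologies(taxa):
    """Return the three unrooted topologies of four taxa as splits (pairs of pairs)."""
    a, b, c, d = taxa
    # An unrooted four-taxon tree has a single internal edge; the topology is fixed
    # by which pair of taxa lies on each side of it: ab|cd, ac|bd or ad|bc.
    return [
        (frozenset({a, b}), frozenset({c, d})),
        (frozenset({a, c}), frozenset({b, d})),
        (frozenset({a, d}), frozenset({b, c})),
    ]


def num_unrooted_trees(n):
    """Number of unrooted binary topologies on n >= 3 labeled taxa, i.e. (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


def all_quartets(taxa):
    """Every four-taxon subset; quartet-based methods score each of these."""
    return list(combinations(taxa, 4))


print(num_unrooted_trees(10))        # 2027025 topologies for 10 taxa
print(len(all_quartets(range(10))))  # 210 quartets for the same 10 taxa
```

The gap between these two counts is one reason that combining per-quartet predictions, as quartet puzzling does, is a practical route to larger trees for a classifier trained on four-taxon inputs.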
Subjects
Deep Learning , Phylogeny , Software , Animals , Luminescent Proteins/genetics , Mammals/genetics , Plants/genetics , Red Fluorescent Protein
ABSTRACT
BACKGROUND: The classic central dogma of biology describes information flow from DNA to mRNA to protein, yet complicated regulatory mechanisms underlying protein translation often lead to weak correlations between mRNA and protein abundances. This is particularly the case in cancer samples and when evaluating the same gene across multiple samples. RESULTS: Here, we report a method for predicting the proteome from the transcriptome, using a training dataset provided by NCI-CPTAC and TCGA that consists of transcriptome and proteome data from 77 breast and 105 ovarian cancer samples. First, we establish a generic model capturing the correlation between mRNA and protein abundance of a single gene. Second, we build a gene-specific model capturing the interdependencies among multiple genes in a regulatory network. Third, we create a cross-tissue model by jointly learning shared regulatory networks and pathways across cancer tissues. Our method ranked first in the NCI-CPTAC DREAM Proteogenomics Challenge, and its predictive performance is close to the accuracy of experimental replicates. The analysis also revealed key functional pathways and network modules controlling protein abundance in cancers, in particular metabolism-related genes. CONCLUSIONS: We present a method to predict the proteome from the transcriptome that leverages data from different cancer tissues to build a trans-tissue model, and we suggest how to integrate information from multiple cancers to provide a foundation for further research.
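The weak mRNA-protein relationship described above is usually quantified by correlating each gene's mRNA and protein levels across samples. The sketch below computes that per-gene Pearson correlation from two dictionaries; the data layout and gene names are hypothetical, and it illustrates the quantity being modeled rather than the authors' generic, gene-specific or cross-tissue models.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def per_gene_correlation(mrna, protein):
    """Correlate each gene's mRNA and protein levels across samples.

    mrna, protein: dicts mapping gene name -> list of values, one per sample,
    with samples in the same order in both dicts. Only shared genes are scored.
    """
    shared = mrna.keys() & protein.keys()
    return {gene: pearson(mrna[gene], protein[gene]) for gene in sorted(shared)}


# Hypothetical toy data: two genes measured in four samples.
mrna = {"GENE_A": [1, 2, 3, 4], "GENE_B": [1, 2, 3, 4]}
protein = {"GENE_A": [2, 4, 6, 8], "GENE_B": [4, 3, 2, 1]}
print(per_gene_correlation(mrna, protein))  # GENE_A: 1.0, GENE_B: -1.0
```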
Subjects
Breast Neoplasms/genetics , Machine Learning , Ovarian Neoplasms/genetics , Proteome/genetics , Proteomics/methods , Transcriptome/genetics , Female , Humans
ABSTRACT
OBJECTIVE: Accurate prediction of treatment responses in rheumatoid arthritis (RA) patients can provide valuable information for effective drug selection. Anti-tumor necrosis factor (anti-TNF) drugs are an important second-line treatment after methotrexate, the classic first-line treatment for RA. However, patient heterogeneity hinders identification of predictive biomarkers and accurate modeling of anti-TNF drug responses. This study was undertaken to investigate the usefulness of machine learning in developing predictive models of treatment response. METHODS: Using data on patient demographics, baseline disease assessment, treatment, and single-nucleotide polymorphism (SNP) arrays from the Dialogue on Reverse Engineering Assessment and Methods (DREAM) Rheumatoid Arthritis Responder Challenge, we created a Gaussian process regression model to predict changes in the Disease Activity Score in 28 joints (DAS28) and to classify patients into responder and nonresponder groups. This model was developed and cross-validated using data from 1,892 RA patients and evaluated using an independent data set from 680 patients. We examined the effectiveness of the similarity modeling and the contribution of individual features. RESULTS: In the cross-validation tests, our method predicted changes in DAS28 (ΔDAS28) with a correlation coefficient of 0.405 and correctly classified the responses of 78% of patients. In the independent test, the method achieved a Pearson's correlation coefficient of 0.393 in predicting ΔDAS28. Gaussian process regression effectively remapped the feature space and identified subpopulations that do not respond well to anti-TNF treatments. Genetic SNP biomarkers contributed little to the prediction when added to the clinical models. This was the best-performing model in the DREAM Challenge.
CONCLUSION: The model described here shows promise in guiding treatment decisions in clinical practice, based primarily on clinical profiles with additional genetic information.
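Gaussian process regression predicts a new patient's outcome as the training mean plus a kernel-weighted combination of information from similar training patients, which matches the similarity modeling described above. The pure-Python sketch below computes only the posterior mean with a squared-exponential (RBF) kernel and fixed, hand-picked hyperparameters; the kernel choice, hyperparameters, feature encoding and function names are assumptions, not the model used in this study. A library such as scikit-learn's `GaussianProcessRegressor` would additionally fit the hyperparameters and return predictive variances.

```python
from math import exp


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * exp(-0.5 * sq_dist / length_scale ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_predict(train_x, train_y, test_x, noise=1e-2, length_scale=1.0, variance=1.0):
    """Posterior mean of GP regression (RBF kernel, fixed hyperparameters).

    Targets are centered on their mean, so far from the training data the
    prediction reverts to that mean.
    """
    n = len(train_x)
    mean_y = sum(train_y) / n
    # Covariance of the noisy training targets: K + noise * I.
    cov = [[rbf_kernel(train_x[i], train_x[j], length_scale, variance)
            + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(cov, [y - mean_y for y in train_y])
    # Each prediction is the mean plus a similarity-weighted sum of the weights alpha.
    return [mean_y + sum(a * rbf_kernel(x, xi, length_scale, variance)
                         for a, xi in zip(alpha, train_x))
            for x in test_x]


# Hypothetical one-feature example (e.g. a baseline score vs. an outcome change):
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 2.0, 3.0]
print(gp_predict(train_x, train_y, [[1.5], [100.0]], noise=1e-6))
# about 1.5 at x = 1.5 (the data are symmetric there) and the training mean, 1.5, far away
```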
Subjects
Antirheumatic Agents/therapeutic use , Arthritis, Rheumatoid/drug therapy , Machine Learning , Outcome Assessment, Health Care/methods , Severity of Illness Index , Adalimumab/therapeutic use , Arthritis, Rheumatoid/genetics , Certolizumab Pegol/therapeutic use , Etanercept/therapeutic use , Female , Genetic Markers/drug effects , Humans , Infliximab/therapeutic use , Male , Methotrexate/therapeutic use , Middle Aged , Normal Distribution , Polymorphism, Single Nucleotide , Predictive Value of Tests , Regression Analysis , Reproducibility of Results , Treatment Outcome , Tumor Necrosis Factor-alpha/antagonists & inhibitors
ABSTRACT
The neXt-MP50 challenge of the Chromosome-centric Human Proteome Project (C-HPP), announced in September 2016, is an initiative to accelerate progress on the detection and characterization of neXtProt PE2, PE3 and PE4 "missing proteins" (MPs), with a mandate for each chromosome team to find about 50 MPs over 2 years. Here we report major progress toward the neXt-MP50 challenge with 43 newly validated Chr 17 PE1 proteins, of which 25 were based on mass spectrometry (MS), 12 on protein-protein interactions (PPI), 3 on a combination of MS and PPI, and 3 on other types of data. Notable among these new PE1 proteins were five keratin-associated proteins, a single olfactory receptor, and five additional membrane-embedded proteins. We evaluate the prospects of finding the remaining 105 MPs encoded on Chr 17, focusing on mass spectrometry and protein-protein interaction approaches. We present a list of 35 prioritized MPs with specific approaches that may be used in further MS and PPI experimental studies. Additionally, we demonstrate how in silico studies can be used to capture individual peptides from major data repositories, documenting one MP that appears to be a strong candidate for PE1. We are close to our goal of finding 50 MPs for Chr 17.
Subjects
Chromosomes, Human, Pair 17/chemistry , Proteome/analysis , Computer Simulation , Humans , Mass Spectrometry , Methods , Protein Interaction Maps , Proteins/analysis
ABSTRACT
Motivation: Heterogeneous diseases such as Alzheimer's disease (AD) manifest a variety of phenotypes among populations. Early diagnosis and effective treatment offer cost benefits. Many studies of biochemical and imaging markers have shown promise for improving diagnosis, yet establishing quantitative diagnostic criteria for ancillary tests remains challenging. Results: We have developed a similarity-based approach that matches individuals to subjects with similar conditions. We modeled the disease with a Gaussian process and tested the method in the Alzheimer's Disease Big Data DREAM Challenge. Our diagnostic model, which ranked highest among the submitted methods, predicted cognitive impairment scores in an independent test dataset with a correlation score of 0.573. It differentiated AD patients from control subjects with an area under the receiver operating characteristic curve of 0.920. Without using longitudinal information about subjects, the model identified, through the similarity network, patients who are vulnerable to conversion from mild cognitive impairment to AD. This diagnostic framework can be applied to other diseases with clinical heterogeneity, such as Parkinson's disease.
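The area under the ROC curve quoted above equals the probability that a randomly chosen patient receives a higher score than a randomly chosen control (the Mann-Whitney formulation), with ties counted as one half. The sketch below computes it directly from that definition; the scores and labels in the example are invented, and this is not the challenge's scoring code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: predicted scores (higher = more likely a positive case, e.g. AD).
    labels: 1 for positive cases, 0 for controls.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Fraction of positive-negative pairs ordered correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in cohort size; sorting-based rank formulas give the same value faster, but the direct form makes the probabilistic meaning of an AUC such as 0.920 explicit.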
Subjects
Alzheimer Disease/diagnosis , Alzheimer Disease/genetics , Medical Informatics/methods , Algorithms , Biomarkers , Cognition Disorders/diagnosis , Cognitive Dysfunction/diagnosis , Cohort Studies , Diagnosis, Computer-Assisted , Disease Progression , Humans , Machine Learning , Magnetic Resonance Imaging , Normal Distribution , Parkinson Disease/diagnosis , Phenotype , Principal Component Analysis , Prognosis , ROC Curve , Sensitivity and Specificity
ABSTRACT
Motivation: Finding driver genes that are responsible for the aberrant proliferation rate of cancer cells is informative for both cancer research and the development of targeted drugs. The established experimental and computational methods are labor-intensive. To make algorithms feasible in real clinical settings, methods that can predict driver genes using less experimental data are urgently needed. Results: We designed an effective feature selection method and used Support Vector Machines (SVM) to predict the essentiality of the potential driver genes in cancer cell lines with only 10 genes as features. The accuracy of our predictions was the highest in the Broad-DREAM Gene Essentiality Prediction Challenge. We also found a set of genes whose essentiality could be predicted much more accurately than that of others, which we called Accurately Predicted (AP) genes. Our method can serve as a new way of assessing the essentiality of genes in cancer cells. Availability and implementation: The raw data that support the findings of this study are available at Synapse: https://www.synapse.org/#!Synapse:syn2384331/wiki/62825. Source code is available at GitHub: https://github.com/GuanLab/DREAM-Gene-Essentiality-Challenge. Supplementary information: Supplementary data are available at Bioinformatics online.
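The abstract does not say how the 10 feature genes were chosen. As one illustration of a filter-style feature-selection step, the hypothetical sketch below ranks candidate genes by the absolute Pearson correlation between their profiles and a target gene's essentiality scores across cell lines, then keeps the top k. The criterion, data layout and names are assumptions rather than the authors' method, and the subsequent SVM step is omitted (scikit-learn's `SVR` or `SVC` would be a typical choice for it).

```python
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation; 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no linear signal
    return abs(sxy) / sqrt(sxx * syy)


def select_top_k(features, target, k=10):
    """Keep the k candidate features most correlated (in absolute value) with the target.

    features: dict mapping a candidate gene name -> values across cell lines.
    target: essentiality scores of one gene across the same cell lines.
    Ties keep the dict's insertion order because Python's sort is stable.
    """
    ranked = sorted(features, key=lambda g: abs_pearson(features[g], target), reverse=True)
    return ranked[:k]


# Hypothetical toy data: four candidate genes measured in four cell lines.
candidates = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 3, 2, 4], "D": [5, 5, 5, 5]}
print(select_top_k(candidates, [1, 2, 3, 4], k=2))  # ['A', 'B']
```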
Subjects
Biomarkers, Tumor/genetics , DNA Copy Number Variations , Genes, Neoplasm , Software , Computational Biology , Humans , RNA, Messenger/genetics
ABSTRACT
Autism spectrum disorder (ASD) is a neuropsychiatric disorder with strong evidence of genetic contribution, and increased research efforts have resulted in an ever-growing list of ASD candidate genes. However, only a fraction of the hundreds of nominated ASD-related genes carry identified de novo or transmitted loss-of-function (LOF) mutations that can be directly attributed to the disorder. A means of prioritizing candidate genes for ASD would therefore help filter out false-positive results and allow researchers to focus on genes that are more likely to be causative. Here we constructed a machine learning model that leverages a brain-specific functional relationship network (FRN) of genes to produce a genome-wide ranking of ASD risk genes. We rigorously validated our gene ranking using results from two independent sequencing experiments, together representing over 5000 simplex and multiplex ASD families. Finally, through functional enrichment analysis of our highly prioritized candidate gene network, we identified a small number of pathways that are key in early neural development, providing further support for their potential role in ASD.
Subjects
Autism Spectrum Disorder/genetics , Databases, Genetic , Genetic Predisposition to Disease/genetics , Genomics/methods , Machine Learning , Models, Genetic , Animals , Genome , Humans , Mice , Phenotype , Rats
ABSTRACT
Survival is an important outcome measure in clinical research and clinical trials, and survival ranking may offer additional advantages in clinical trials. In this study, we developed GuanRank, a non-parametric ranking-based technique that transforms patients' survival data into a linear space of hazard ranks. The transformation enables the use of machine learning base-learners, including Gaussian process regression, Lasso, and random forest, on survival data. The method was submitted to the DREAM Amyotrophic Lateral Sclerosis (ALS) Stratification Challenge. Ranked first, the model gave more accurate ranking predictions on the PRO-ACT ALS dataset than the Cox proportional hazards model. By using right-censored data in its training process, the method demonstrated state-of-the-art predictive power in ALS survival ranking. Its feature selection identified multiple important factors, some of which conflict with previous studies.
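Any ranking method for survival data has to cope with right-censoring, and the Kaplan-Meier estimator (listed among this record's subject terms) is the standard nonparametric way to estimate a survival curve from censored follow-up. The sketch below is a generic Kaplan-Meier implementation with illustrative names and data; it is not GuanRank itself, whose exact transformation the abstract does not spell out.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function from right-censored data.

    times: follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed at that time, 0 if censored.
    Returns (time, estimated survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Conditional probability of surviving past t, given survival up to t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical cohort of five patients (event 0 = censored):
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
# steps down at t = 1, 2 and 3, to about 0.8, 0.6 and 0.3
```

Censored patients still count in the risk set up to their last follow-up, which is how right-censored records contribute information instead of being discarded.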
Subjects
Amyotrophic Lateral Sclerosis/mortality , Survival Analysis , Algorithms , Computational Biology , Databases, Factual , Female , Humans , Kaplan-Meier Estimate , Machine Learning , Male , Normal Distribution , Proportional Hazards Models , Regression Analysis , Statistics, Nonparametric
ABSTRACT
The rabbit (Oryctolagus cuniculus) is an important experimental animal for studying human diseases, such as hypercholesterolemia and atherosclerosis. Despite this, genetic information and RNA expression profiling of laboratory rabbits are lacking. Here, we characterized the whole-genome variants of three breeds of the most popular experimental rabbits, New Zealand White (NZW), Japanese White (JW) and Watanabe heritable hyperlipidemic (WHHL) rabbits. Although the genetic diversity of WHHL rabbits was relatively low, they accumulated a large proportion of high-frequency deleterious mutations due to the small population size. Some of the deleterious mutations were associated with the pathophysiology of WHHL rabbits in addition to the LDLR deficiency. Furthermore, we conducted transcriptome sequencing of different organs of both WHHL and cholesterol-rich diet (Chol)-fed NZW rabbits. We found that gene expression profiles of the two rabbit models were essentially similar in the aorta, even though they exhibited different types of hypercholesterolemia. In contrast, Chol-fed rabbits, but not WHHL rabbits, exhibited pronounced inflammatory responses and abnormal lipid metabolism in the liver. These results provide valuable insights into identifying therapeutic targets of hypercholesterolemia and atherosclerosis with rabbit models.
Subjects
Atherosclerosis/genetics , Diet, High-Fat/adverse effects , Genetic Variation , Genome , Hypercholesterolemia/genetics , Receptors, LDL/genetics , Animals , Aorta/metabolism , Aorta/pathology , Atherosclerosis/chemically induced , Atherosclerosis/metabolism , Atherosclerosis/pathology , Cholesterol/administration & dosage , Disease Models, Animal , Gene Expression , Humans , Hypercholesterolemia/chemically induced , Hypercholesterolemia/metabolism , Hypercholesterolemia/pathology , Liver/metabolism , Liver/pathology , Molecular Sequence Annotation , Rabbits , Receptors, LDL/deficiency , Transcriptome , Whole Genome Sequencing
ABSTRACT
We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. QUARK successfully constructed models with a TM-score above 0.4 for five free-modeling (FM) targets, including the first model of T0837-D1, which has a TM-score of 0.736 and an RMSD of 2.9 Å to the native structure. Detailed analysis showed that this success is partly attributable to high-resolution contact-map prediction derived from fragment-based distance profiles; the predicted contacts are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by their structural similarity to the ab initio folding models and are then reassembled by I-TASSER-based structure assembly simulations; this I-TASSER pipeline successfully modeled 60% more domains (of length up to 204 residues) with a TM-score above 0.4 than the QUARK pipeline. The robustness of the I-TASSER pipeline may stem from composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite these promising cases, challenges remain in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the last of these was found to affect nearly all aspects of FM structure prediction, from fragment identification and target classification to structure assembly and final model selection. Significant efforts are needed to solve these problems before real progress on FM can be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc.
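The TM-score threshold used above as a success criterion follows the standard TM-score definition: for a given superposition, each aligned residue pair at distance d_i contributes 1/(1 + (d_i/d0)^2), the sum is divided by the target length L, and d0 = 1.24(L − 15)^(1/3) − 1.8 Å. The sketch below evaluates that formula, together with RMSD, for one fixed superposition whose residue distances are already known. The official TM-score program also searches over superpositions to maximize the score, which this sketch does not do; the 0.5 Å floor for short chains is an assumed convention, and the function names are illustrative.

```python
from math import sqrt


def tm_d0(target_length):
    """Distance scale d0 (angstroms) of the TM-score for a target of given length."""
    # Assumed short-chain convention: fall back to 0.5 A for very short targets.
    if target_length <= 21:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances: model-native distance (A) for each aligned residue pair.
    target_length: number of residues in the native structure (normalizes the score).
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the same aligned residue pairs."""
    return sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue target with every residue aligned 2 A from the native:
dists = [2.0] * 100
print(round(tm_score(dists, 100), 2), rmsd(dists))  # about 0.77 and 2.0
```

Because each term is bounded by 1 and normalized by the full target length, unaligned or badly placed regions lower the TM-score without dominating it, unlike RMSD, which a few large deviations can inflate.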
Subjects
Bacterial Proteins/chemistry , Computational Biology/statistics & numerical data , Models, Molecular , Models, Statistical , Software , Algorithms , Amino Acid Sequence , Bacteria/chemistry , Computational Biology/methods , Computer Simulation , Databases, Protein , Humans , International Cooperation , Protein Folding , Protein Interaction Domains and Motifs , Protein Structure, Secondary , Sequence Alignment
ABSTRACT
We report the structure prediction results of a new composite pipeline for template-based modeling (TBM) in the 11th CASP experiment. Starting from multiple structure templates identified by LOMETS-based meta-threading programs, the QUARK ab initio folding program is extended to generate initial full-length models under strong constraints from template alignments. The final atomic models are then constructed by I-TASSER-based fragment reassembly simulations, followed by fragment-guided molecular dynamics simulation and MQAP-based model selection. The inclusion of QUARK-TBM simulations as an intermediate modeling step was found to help improve the quality of the I-TASSER models for both Easy and Hard TBM targets. Overall, the average TM-score of the first I-TASSER model is 12% higher than that of the best LOMETS templates, with the RMSD in the same threading-aligned regions reduced from 5.8 to 4.7 Å. Nevertheless, for nearly 18% of TBM domains the structure assembly pipeline degraded the templates, which may be attributed to errors in secondary structure and domain orientation predictions that propagate through and degrade template identification and final model selection. To examine the record of progress, we made a retrospective report of the I-TASSER pipeline in the last five CASP experiments (CASP7-11). The data show no clear progress of the LOMETS threading programs over PSI-BLAST, but obvious progress in structural improvement relative to threading templates was witnessed in recent CASP experiments, which is probably attributable to the integration of extended ab initio folding simulations with the threading assembly pipeline and the introduction of atomic-level structure refinements following the reduced modeling simulations. Proteins 2016; 84(Suppl 1):233-246. © 2015 Wiley Periodicals, Inc.
Subjects
Computational Biology/statistics & numerical data , Models, Molecular , Models, Statistical , Proteins/chemistry , Software , Algorithms , Amino Acid Sequence , Computational Biology/methods , Computer Simulation , Databases, Protein , Humans , Internet , Protein Folding , Protein Interaction Domains and Motifs , Protein Structure, Secondary , Sequence Alignment , Structural Homology, Protein , Thermodynamics
ABSTRACT
MOTIVATION: G protein-coupled receptors (GPCRs) are probably the most attractive class of membrane-protein drug targets, constituting nearly half of the drug targets in the contemporary drug discovery industry. While the majority of drug discovery studies employ existing GPCR-ligand interactions to identify new compounds, there remains a shortage of specific databases with precisely annotated GPCR-ligand associations. RESULTS: We have developed a new database, GLASS, which aims to provide a comprehensive, manually curated resource for experimentally validated GPCR-ligand associations. A new text-mining algorithm was proposed to collect GPCR-ligand interactions from the biomedical literature; these are then cross-checked against five primary pharmacological datasets to enhance the coverage and accuracy of GPCR-ligand association data. A special architecture has been designed to allow users to perform homologous ligand searches with flexible bioactivity parameters. The current database contains ∼500 000 unique entries, the vast majority of which stem from ligand associations with rhodopsin- and secretin-like receptors. The GLASS database should find its most useful application in various in silico GPCR screening and functional annotation studies. AVAILABILITY AND IMPLEMENTATION: The website of the GLASS database is freely available at http://zhanglab.ccmb.med.umich.edu/GLASS/. CONTACT: zhng@umich.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms , Databases, Protein , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/metabolism , Data Mining , Humans , Internet , Ligands , Models, Molecular , Protein Binding
ABSTRACT
BACKGROUND: The performance of dry powder aerosol delivery systems depends not only on the powder formulation but also on the dry powder inhaler (DPI). The effects of turbulence, grid, mouthpiece, inlet size, air flow, and capsule on DPI performance have been investigated previously. Because powder dispersion in DPIs is a time-dependent process, the powder residence time in the device is expected to have a large impact on DPI efficiency. This study sought to investigate the effect of powder residence time on the performance of a commercial DPI, the Aerolizer(®). METHODS: A standard Aerolizer(®) (SD) and five modified devices (MD1, MD2, MD3, MD4, and MD5) were used in this research. Computational fluid dynamics analysis was used to calculate the flow field and the powder residence time in these devices. Recombinant human interleukin-2 inhalation powders and a twin impinger were used for the deposition experiment. RESULTS: The powder mean residence time in the secondary atomization zone of the devices increased from 0 ms for SD to 0.33, 0.96, 1.42, 1.76, and 2.14 ms for MD1, MD2, MD3, MD4, and MD5, respectively. At a flow rate of 60 L/min, the powder respirable fraction increased significantly and progressively with powder residence time, from 29.1% ± 1.1% (MD1) to 32.6% ± 2.2% (MD2), 37.1% ± 1.1% (MD3), and 43.7% ± 2.1% (MD4). There was no significant difference in the powder respirable fraction between SD and MD1 or between MD4 and MD5. CONCLUSIONS: Within a certain range, increasing the powder residence time could improve the performance of the Aerolizer(®), mainly by increasing the powder-air interaction time and secondarily by increasing powder-device compaction. Combining a high turbulence level with sufficient powder residence time could further improve device performance.