Search | BVS Bolivia

Illuminating protein space with a programmable generative model.

Ingraham, John B; Baranov, Max; Costello, Zak; Barber, Karl W; Wang, Wujie; Ismail, Ahmed; Frappier, Vincent; Lord, Dana M; Ng-Thow-Hing, Christopher; Van Vlack, Erik R; Tie, Shan; Xue, Vincent; Cowles, Sarah C; Leung, Alan; Rodrigues, João V; Morales-Perez, Claudio L; Ayoub, Alex M; Green, Robin; Puentes, Katherine; Oplinger, Frank; Panwar, Nishant V; Obermeyer, Fritz; Root, Adam R; Beam, Andrew L; Poelwijk, Frank J; Grigoryan, Gevorg.

Nature ; 623(7989): 1070-1078, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37968394

ABSTRACT

Three billion years of evolution has produced a tremendous diversity of protein molecules, but the full potential of proteins is likely to be much greater. Accessing this potential has been challenging for both computation and experiments because the space of possible protein molecules is much larger than the space of those likely to have functions. Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences, and that can be conditioned to steer the generative process towards desired properties and functions. To enable this, we introduce a diffusion process that respects the conformational statistics of polymer ensembles, an efficient neural architecture for molecular systems that enables long-range reasoning with sub-quadratic scaling, layers for efficiently synthesizing three-dimensional structures of proteins from predicted inter-residue geometries and a general low-temperature sampling algorithm for diffusion models. Chroma achieves protein design as Bayesian inference under external constraints, which can involve symmetries, substructure, shape, semantics and even natural-language prompts. The experimental characterization of 310 proteins shows that sampling from Chroma results in proteins that are highly expressed, fold and have favourable biophysical properties. The crystal structures of two designed proteins exhibit atomistic agreement with Chroma samples (a backbone root-mean-square deviation of around 1.0 Å). With this unified approach to protein design, we hope to accelerate the programming of protein matter to benefit human health, materials science and synthetic biology.

Subjects

Algorithms; Computer Simulation; Protein Conformation; Proteins; Humans; Bayes Theorem; Directed Molecular Evolution; Machine Learning; Models, Molecular; Protein Folding; Proteins/chemistry; Proteins/metabolism; Semantics; Synthetic Biology/methods; Synthetic Biology/trends

Peptide design by optimization on a data-parameterized protein interaction landscape.

Jenson, Justin M; Xue, Vincent; Stretz, Lindsey; Mandal, Tirtha; Reich, Lothar Luther; Keating, Amy E.

Proc Natl Acad Sci U S A ; 115(44): E10342-E10351, 2018 Oct 30.

Article in English | MEDLINE | ID: mdl-30322927

ABSTRACT

Many applications in protein engineering require optimizing multiple protein properties simultaneously, such as binding one target but not others or binding a target while maintaining stability. Such multistate design problems require navigating a high-dimensional space to find proteins with desired characteristics. A model that relates protein sequence to functional attributes can guide design to solutions that would be hard to discover via screening. In this work, we measured thousands of protein-peptide binding affinities with the high-throughput interaction assay amped SORTCERY and used the data to parameterize a model of the alpha-helical peptide-binding landscape for three members of the Bcl-2 family of proteins: Bcl-xL, Mcl-1, and Bfl-1. We applied optimization protocols to explore extremes in this landscape to discover peptides with desired interaction profiles. Computational design generated 36 peptides, all of which bound with high affinity and specificity to just one of Bcl-xL, Mcl-1, or Bfl-1, as intended. We designed additional peptides that bound selectively to two out of three of these proteins. The designed peptides were dissimilar to known Bcl-2-binding peptides, and high-resolution crystal structures confirmed that they engaged their targets as expected. Excellent results on this challenging problem demonstrate the power of a landscape modeling approach, and the designed peptides have potential uses as diagnostic tools or cancer therapeutics.

Subjects

Peptides/chemistry; Peptides/metabolism; Animals; Apoptosis Regulatory Proteins/metabolism; Cell Line; Escherichia coli/metabolism; Humans; Mice; Myeloid Cell Leukemia Sequence 1 Protein/metabolism; Protein Binding/physiology; Protein Engineering/methods; Proto-Oncogene Proteins c-bcl-2/metabolism; Yeasts/metabolism; bcl-X Protein/metabolism

Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression.

Torrente, Aurora; Lukk, Margus; Xue, Vincent; Parkinson, Helen; Rung, Johan; Brazma, Alvis.

PLoS One ; 11(6): e0157484, 2016.

Article in English | MEDLINE | ID: mdl-27322383

ABSTRACT

Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human gene expression dataset from ~40,000 publicly available Affymetrix HG-U133Plus2 arrays. After strict quality control and data normalisation the data was quantified in an expression matrix of ~20,000 genes and ~28,000 samples. To enable different ways of sample grouping, existing annotations were subjected to systematic ontology-assisted categorisation and manual curation. Groups like normal tissues, neoplasmic tissues, cell lines, homoeotic cells and incompletely differentiated cells were created. Unsupervised analysis of the data confirmed global structure of expression consistent with earlier analysis but with more details revealed due to increased resolution. A suitable mixed-effects linear model was used to further investigate gene expression in solid tissue tumours, and to compare these with the respective healthy solid tissues. The analysis identified 1,285 genes with systematic expression change in cancer. The list is significantly enriched with known cancer genes from large, public, peer-reviewed databases, whereas the remaining ones are proposed as new cancer gene candidates. The compiled dataset is publicly available in the ArrayExpress Archive. It contains the most diverse collection of biological samples, making it the largest systematically annotated gene expression dataset of its kind in the public domain.

Subjects

Biomarkers, Tumor/biosynthesis; Gene Expression Regulation, Neoplastic; Neoplasm Proteins/biosynthesis; Neoplasms/genetics; Biomarkers, Tumor/genetics; Cell Cycle/genetics; Cell Differentiation/genetics; Cell Division/genetics; Computational Biology; DNA Replication/genetics; Databases, Genetic; Humans; Neoplasm Proteins/genetics; Neoplasms/pathology; Oligonucleotide Array Sequence Analysis; Principal Component Analysis; Protein Array Analysis

Multiscale modeling of the causal functional roles of nsSNPs in a genome-wide association study: application to hypoxia.

Xie, Li; Ng, Clara; Ali, Thahmina; Valencia, Raoul; Ferreira, Barbara L; Xue, Vincent; Tanweer, Maliha; Zhou, Dan; Haddad, Gabriel G; Bourne, Philip E; Xie, Lei.

BMC Genomics ; 14 Suppl 3: S9, 2013.

Article in English | MEDLINE | ID: mdl-23819581

ABSTRACT

BACKGROUND: It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures, and their interactions. RESULTS: To address the challenge of understanding the genetic, molecular, and cellular basis of complex phenotypes, we have, for the first time, developed a structural systems biology approach for genome-wide multiscale modeling of nsSNPs--from the atomic details of molecular interactions to the emergent properties of biological networks. We apply our approach to determine the functional roles of nsSNPs associated with hypoxia tolerance in Drosophila melanogaster. The integrated view of the functional roles of nsSNP at both molecular and network levels allows us to identify driver mutations and their interactions (epistasis) in H, Rad51D, Ulp1, Wnt5, HDAC4, Sol, Dys, GalNAc-T2, and CG33714 genes, all of which are involved in the up-regulation of Notch and Gurken/EGFR signaling pathways. Moreover, we find that a large fraction of the driver mutations are neither located in conserved functional sites, nor responsible for structural stability, but rather regulate protein activity through allosteric transitions, protein-protein interactions, or protein-nucleic acid interactions. This finding should impact future Genome-Wide Association Studies. 
CONCLUSIONS: Our studies demonstrate that the consolidation of statistical, structural, and network views of biomolecules and their interactions can provide new insight into the functional role of nsSNPs in Genome-Wide Association Studies, in a way that neither the knowledge of molecular structures nor biological networks alone could achieve. Thus, multiscale modeling of nsSNPs may prove to be a powerful tool for establishing the functional roles of sequence variants in a wide array of applications.

Subjects

Adaptation, Biological/genetics; Amino Acid Substitution/genetics; Genome-Wide Association Study/methods; Models, Molecular; Phenotype; Polymorphism, Single Nucleotide/genetics; Proteins/genetics; Allosteric Regulation; Anaerobiosis; Animals; Computational Biology; Drosophila melanogaster; Models, Genetic; Protein Interaction Maps/genetics; Signal Transduction/genetics; Systems Biology/methods

MageComet--web application for harmonizing existing large-scale experiment descriptions.

Xue, Vincent; Burdett, Tony; Lukk, Margus; Taylor, Julie; Brazma, Alvis; Parkinson, Helen.

Bioinformatics ; 28(10): 1402-3, 2012 May 15.

Article in English | MEDLINE | ID: mdl-22474121

ABSTRACT

MOTIVATION: Meta-analysis of large gene expression datasets obtained from public repositories requires consistently annotated data. Curation of such experiments, however, is an expert activity which involves repetitive manipulation of text. Existing tools for automated curation are few, which bottlenecks the analysis pipeline. RESULTS: We present MageComet, a web application for biologists and annotators that facilitates the re-annotation of gene expression experiments in MAGE-TAB format. It incorporates data mining, automatic annotation, use of ontologies and data validation to improve the consistency and quality of experimental meta-data from the ArrayExpress Repository.

Subjects

Databases, Genetic; Internet; Molecular Sequence Annotation; Data Mining; Meta-Analysis as Topic; Transcriptome

Pervasive recombination and sympatric genome diversification driven by frequency-dependent selection in Borrelia burgdorferi, the Lyme disease bacterium.

Haven, James; Vargas, Levy C; Mongodin, Emmanuel F; Xue, Vincent; Hernandez, Yozen; Pagan, Pedro; Fraser-Liggett, Claire M; Schutzer, Steven E; Luft, Benjamin J; Casjens, Sherwood R; Qiu, Wei-Gang.

Genetics ; 189(3): 951-66, 2011 Nov.

Article in English | MEDLINE | ID: mdl-21890743

ABSTRACT

How genomic diversity within bacterial populations originates and is maintained in the presence of frequent recombination is a central problem in understanding bacterial evolution. Natural populations of Borrelia burgdorferi, the bacterial agent of Lyme disease, consist of diverse genomic groups co-infecting single individual vertebrate hosts and tick vectors. To understand mechanisms of sympatric genome differentiation in B. burgdorferi, we sequenced and compared 23 genomes representing major genomic groups in North America and Europe. Linkage analysis of >13,500 single-nucleotide polymorphisms revealed pervasive horizontal DNA exchanges. Although three times more frequent than point mutation, recombination is localized and weakly affects genome-wide linkage disequilibrium. We show by computer simulations that, while enhancing population fitness, recombination constrains neutral and adaptive divergence among sympatric genomes through periodic selective sweeps. In contrast, simulations of frequency-dependent selection with recombination produced the observed pattern of a large number of sympatric genomic groups associated with major sequence variations at the selected locus. We conclude that negative frequency-dependent selection targeting a small number of surface-antigen loci (ospC in particular) sufficiently explains the maintenance of sympatric genome diversity in B. burgdorferi without adaptive divergence. We suggest that pervasive recombination makes it less likely for local B. burgdorferi genomic groups to achieve host specialization. B. burgdorferi genomic groups in the northeastern United States are thus best viewed as constituting a single bacterial species, whose generalist nature is a key to its rapid spread and human virulence.

Subjects

Borrelia burgdorferi/genetics; Genetic Variation/genetics; Genome, Bacterial/genetics; Lyme Disease/microbiology; Recombination, Genetic/genetics; Selection, Genetic; Sympatry/genetics; Adaptation, Physiological/genetics; Animals; Borrelia burgdorferi/physiology; Conserved Sequence; Evolution, Molecular; Gene Conversion/genetics; Genetic Speciation; Humans; Models, Genetic; Phylogeny; Reproducibility of Results; Sequence Alignment; Sequence Homology, Nucleic Acid
