Results 1 - 11 of 11
1.
bioRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38854010

ABSTRACT

Genome sequencing efforts have led to the discovery of tens of millions of protein missense variants found in the human population, with the majority of these having no annotated role and some likely contributing to trait variation and disease. Sequence-based artificial intelligence approaches have become highly accurate at predicting variants that are detrimental to the function of proteins but they do not inform on mechanisms of disruption. Here we combined sequence- and structure-based methods to perform proteome-wide prediction of deleterious variants with information on their impact on protein stability, protein-protein interactions and small-molecule binding pockets. AlphaFold2 structures were used to predict approximately 100,000 small-molecule binding pockets and stability changes for over 200 million variants. To inform on protein-protein interfaces we used AlphaFold2 to predict structures for nearly 500,000 protein complexes. We illustrate the value of mechanism-aware variant effect predictions to study the relation between protein stability and abundance and the structural properties of interfaces underlying trans protein quantitative trait loci (pQTLs). We characterised the distribution of mechanistic impacts of protein variants found in patients and experimentally studied example disease-linked variants in FGFR1.

2.
N Biotechnol ; 83: 1-15, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38871051

ABSTRACT

Microbes able to convert gaseous one-carbon (C1) waste feedstocks are increasingly important to transition to the sustainable production of renewable chemicals and fuels. Acetogens are interesting biocatalysts since gas fermentation using Clostridium autoethanogenum has been commercialised. However, most acetogen strains need complex nutrients, display slow growth, and are not robust for bioreactor fermentations. In this work, we used three different and independent adaptive laboratory evolution (ALE) strategies to evolve the wild-type C. autoethanogenum to grow faster, without yeast extract and to be robust in operating continuous bioreactor cultures. Multiple evolved strains with improved phenotypes were isolated on minimal media with one strain, named "LAbrini", exhibiting superior performance regarding the maximum specific growth rate, product profile, and robustness in continuous cultures. Whole-genome sequencing of the evolved strains identified 25 mutations. Of particular interest are two genes that acquired seven different mutations across the three ALE strategies, potentially as a result of convergent evolution. Reverse genetic engineering of mutations in potentially sporulation-related genes CLAU_3129 (spo0A) and CLAU_1957 recovered all three superior features of our ALE strains through triggering significant proteomic rearrangements. This work provides a robust C. autoethanogenum strain "LAbrini" to accelerate phenotyping and genetic engineering and to better understand acetogen metabolism.

3.
Nucleic Acids Res ; 52(W1): W140-W147, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38769064

ABSTRACT

Genomic variation can impact normal biological function in complex ways and so understanding variant effects requires a broad range of data to be coherently assimilated. Whilst the volume of human variant data and relevant annotations has increased, the corresponding increase in the breadth of participating fields, standards and versioning means that moving between genomic, coding, protein and structure positions is increasingly complex. In turn, this makes investigating variants in diverse formats and assimilating annotations from different resources challenging. ProtVar addresses these issues to facilitate the contextualization and interpretation of human missense variation with unparalleled flexibility and ease of accessibility for use by the broadest range of researchers. By precalculating all possible variants in the human proteome, it offers near-instantaneous mapping between all relevant data types. It also combines data and analyses from a plethora of resources to bring together genomic, protein sequence and function annotations as well as structural insights and predictions to better understand the likely effect of missense variation in humans. It is offered as an intuitive web server (https://www.ebi.ac.uk/protvar) where data can be explored and downloaded, and can be accessed programmatically via an API.


Subjects
Mutation, Missense , Software , Humans , Databases, Protein , Molecular Sequence Annotation , Proteome/genetics , Proteins/genetics , Proteins/chemistry , Internet , Genomics/methods
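
The move between coding and protein positions mentioned in the abstract of entry 3 rests, at its simplest, on codon arithmetic. The following is a minimal sketch under stated assumptions: positions are 1-based and refer to a spliced coding sequence, and introns, UTRs and strand are ignored. The function name `cds_to_protein_position` is hypothetical and is not part of ProtVar or its API.

```python
def cds_to_protein_position(cds_pos):
    """Map a 1-based coding-sequence position to (residue number, position in codon).

    Illustrative only: assumes a spliced coding sequence that starts at the
    first base of the start codon.
    """
    if cds_pos < 1:
        raise ValueError("coding-sequence positions are 1-based")
    residue = (cds_pos - 1) // 3 + 1       # 1-based amino acid index
    codon_offset = (cds_pos - 1) % 3 + 1   # 1, 2 or 3 within the codon
    return residue, codon_offset


# Bases 1-3 encode residue 1, bases 4-6 encode residue 2, and so on.
first_base = cds_to_protein_position(1)
fourth_base = cds_to_protein_position(4)
```

Mapping from genomic coordinates would additionally require the exon structure of the transcript, which this sketch deliberately leaves out.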
4.
Article in English | MEDLINE | ID: mdl-38621234

ABSTRACT

The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.

5.
Mol Syst Biol ; 20(3): 162-169, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38291232

ABSTRACT

Proteins are the key molecular machines that orchestrate all biological processes of the cell. Most proteins fold into three-dimensional shapes that are critical for their function. Studying the 3D shape of proteins can inform us of the mechanisms that underlie biological processes in living cells and can have practical applications in the study of disease mutations or the discovery of novel drug treatments. Here, we review the progress made in sequence-based prediction of protein structures with a focus on applications that go beyond the prediction of single monomer structures. This includes the application of deep learning methods for the prediction of structures of protein complexes, different conformations, the evolution of protein structures and the application of these methods to protein design. These developments create new opportunities for research that will have impact across many areas of biomedical research.


Subjects
Deep Learning , Proteins/metabolism , Protein Conformation
6.
Nature ; 622(7983): 637-645, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37704730

ABSTRACT

Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy, and over 214 million predicted structures are available in the AlphaFold database. However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations, representing probable previously undescribed structures. Clusters without annotation tend to have few representatives, covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life.


Subjects
Algorithms , Cluster Analysis , Proteins , Structural Homology, Protein , Humans , Databases, Protein , Proteins/chemistry , Proteins/classification , Proteins/metabolism , Sequence Alignment , Molecular Sequence Annotation , Prokaryotic Cells/chemistry , Phylogeny , Species Specificity , Evolution, Molecular
7.
Nat Struct Mol Biol ; 29(11): 1056-1067, 2022 11.
Article in English | MEDLINE | ID: mdl-36344848

ABSTRACT

Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.


Subjects
Computational Biology , Furylfuramide , Computational Biology/methods , Binding Sites , Proteins/chemistry , Databases, Protein , Protein Conformation
8.
Genome Res ; 30(12): 1752-1765, 2020 12.
Article in English | MEDLINE | ID: mdl-33093068

ABSTRACT

RNA profiling has provided increasingly detailed knowledge of gene expression patterns, yet the different regulatory architectures that drive them are not well understood. To address this, we profiled and compared transcriptional and regulatory element activities across five tissues of Caenorhabditis elegans, covering ∼90% of cells. We find that the majority of promoters and enhancers have tissue-specific accessibility, and we discover regulatory grammars associated with ubiquitous, germline, and somatic tissue-specific gene expression patterns. In addition, we find that germline-active and soma-specific promoters have distinct features. Germline-active promoters have well-positioned +1 and -1 nucleosomes associated with a periodic 10-bp WW signal (W = A/T). Somatic tissue-specific promoters lack positioned nucleosomes and this signal, have wide nucleosome-depleted regions, and are more enriched for core promoter elements, which largely differ between tissues. We observe the 10-bp periodic WW signal at ubiquitous promoters in other animals, suggesting it is an ancient conserved signal. Our results show fundamental differences in regulatory architectures of germline and somatic tissue-specific genes, uncover regulatory rules for generating diverse gene expression patterns, and provide a tissue-specific resource for future studies.


Subjects
Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Gene Expression Profiling/veterinary , Germ Cells/chemistry , Animals , Gene Expression Regulation , Humans , Mice , Organ Specificity , Promoter Regions, Genetic , Sequence Analysis, RNA , Tissue Distribution , Transcription Initiation Site
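
The periodic 10-bp WW signal (W = A/T) described in the abstract of entry 8 can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration and not the authors' analysis: it records where WW dinucleotides start in a sequence and counts the distances between every pair of occurrences, so that a 10-bp periodicity would show up as excess counts at distances 10, 20, 30 and so on. The function names and the `max_dist` parameter are hypothetical.

```python
from collections import Counter

# W = A or T, so the WW dinucleotides are AA, AT, TA and TT.
WW = {"AA", "AT", "TA", "TT"}


def ww_positions(seq):
    """Return 0-based start positions of WW dinucleotides in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in WW]


def ww_distance_histogram(seq, max_dist=30):
    """Count distances (in bp) between all pairs of WW occurrences up to max_dist."""
    positions = ww_positions(seq)
    hist = Counter()
    for idx, start in enumerate(positions):
        for other in positions[idx + 1:]:
            dist = other - start
            if dist > max_dist:
                break  # positions are sorted, so later ones are even further away
            hist[dist] += 1
    return hist


# Synthetic sequence with one AA every 10 bp.
toy_seq = ("AA" + "G" * 8) * 5
toy_hist = ww_distance_histogram(toy_seq)
```

On real promoter sequences one would aggregate such histograms over many sequences aligned at the transcription start site and compare them against a shuffled background; that step is omitted here.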
9.
Elife ; 7, 2018 10 26.
Article in English | MEDLINE | ID: mdl-30362940

ABSTRACT

An essential step for understanding the transcriptional circuits that control development and physiology is the global identification and characterization of regulatory elements. Here, we present the first map of regulatory elements across the development and ageing of an animal, identifying 42,245 elements accessible in at least one Caenorhabditis elegans stage. Based on nuclear transcription profiles, we define 15,714 protein-coding promoters and 19,231 putative enhancers, and find that both types of element can drive orientation-independent transcription. Additionally, more than 1000 promoters produce transcripts antisense to protein coding genes, suggesting involvement in a widespread regulatory mechanism. We find that the accessibility of most elements changes during development and/or ageing and that patterns of accessibility change are linked to specific developmental or physiological processes. The map and characterization of regulatory elements across C. elegans life provides a platform for understanding how transcription controls development and ageing.


Subjects
Aging/metabolism , Caenorhabditis elegans/growth & development , Caenorhabditis elegans/metabolism , Chromatin/metabolism , Animals , Caenorhabditis elegans/genetics , DNA/genetics , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Histone Code , Histones/metabolism , Molecular Sequence Annotation , Promoter Regions, Genetic , Reproducibility of Results , Transcription Factors/metabolism , Transcription Initiation Site
10.
Brief Bioinform ; 16(6): 932-40, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25788326

ABSTRACT

Three principal approaches have been proposed for inferring the set of transcripts expressed in RNA samples using RNA-seq. The simplest approach uses curated annotations, which assumes the transcripts in a sample are a subset of the transcripts listed in a curated database. A more ambitious method involves aligning reads to a reference genome and using the alignments to infer the transcript structures, possibly with the aid of a curated transcript database. The most challenging approach is to assemble reads into putative transcripts de novo without the aid of reference data. We have systematically assessed the properties of these three approaches through a simulation study. We have found that the sensitivity of computational transcript set estimation is severely limited. Computational approaches (both genome-guided and de novo assembly) produce a large number of artefacts, which are assigned large expression estimates and absorb a substantial proportion of the signal when performing expression analysis. The approach using curated annotations shows good expression correlation even when the annotations are incomplete. Furthermore, any incorrect transcripts present in a curated set do not absorb much signal, so it is preferable to have a curated set with high sensitivity rather than high precision. Software to simulate transcript sets, expression values and sequence reads under a wider range of parameter values and to compare sensitivity, precision and signal-to-noise ratios of different methods is freely available online (https://github.com/boboppie/RSSS) and can be expanded by interested parties to include methods other than the exemplars presented in this article.


Subjects
Sequence Analysis, RNA/methods , Databases, Genetic , RNA, Messenger/genetics
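
The sensitivity and precision that the abstract of entry 10 compares across methods can be illustrated with a short sketch. It assumes that transcripts are matched by exact identifier, which is a simplification; the abstract does not state the matching criteria used in the study, and the function name `sensitivity_precision` is hypothetical rather than taken from the RSSS software.

```python
def sensitivity_precision(predicted, truth):
    """Return (sensitivity, precision) of a predicted transcript set.

    sensitivity = true positives / transcripts truly expressed
    precision   = true positives / transcripts predicted
    Transcripts are compared by identifier only (an illustrative assumption).
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Made-up identifiers: five expressed transcripts, four predictions, three correct.
expressed = {"t1", "t2", "t3", "t4", "t5"}
called = {"t1", "t2", "t3", "x1"}
sens, prec = sensitivity_precision(called, expressed)
```

In this toy case three of the five expressed transcripts are recovered and three of the four predictions are correct, so sensitivity is lower than precision.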
11.
Bioinformatics ; 24(4): 588-90, 2008 Feb 15.
Article in English | MEDLINE | ID: mdl-18056068

ABSTRACT

MOTIVATION: Gene expression analysis with microarrays has become one of the most widely used high-throughput methods for gathering genome-wide functional data. Emerging -omics fields such as proteomics and interactomics introduce new information sources. With the rise of systems biology, researchers need to concentrate on entire complex pathways that guide individual genes and related processes. Bioinformatics methods are needed to link the existing knowledge about pathways with the growing amounts of experimental data. RESULTS: We present KEGGanim, a novel web-based tool for visualizing experimental data in biological pathways. KEGGanim produces animations and images of KEGG pathways using public or user uploaded high-throughput data. Pathway members are coloured according to experimental measurements, and animated over experimental conditions. KEGGanim visualization highlights dynamic changes over conditions and allows the user to observe important modules and key genes that influence the pathway. The simple user interface of KEGGanim provides options for filtering genes and experimental conditions. KEGGanim may be used with public or private data for 14 organisms with a large collection of public microarray data readily available. Most common gene and protein identifiers and microarray probesets are accepted for visualization input. AVAILABILITY: http://biit.cs.ut.ee/KEGGanim/.


Subjects
Computational Biology/methods , Computer Graphics , Metabolic Networks and Pathways , Software , Animals , Gene Expression Regulation , Humans , Ventricular Remodeling/genetics