Results 1 - 20 of 48
1.
Cell Rep ; 43(7): 114448, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003740

ABSTRACT

Noonan syndrome patients harboring causative variants in LZTR1 are particularly at risk of developing severe and early-onset hypertrophic cardiomyopathy. In this study, we investigate the mechanistic consequences of a homozygous variant LZTR1L580P by using patient-specific and CRISPR-Cas9-corrected induced pluripotent stem cell (iPSC) cardiomyocytes. Molecular, cellular, and functional phenotyping in combination with in silico prediction identify an LZTR1L580P-specific disease mechanism provoking cardiac hypertrophy. The variant is predicted to alter the binding affinity of the dimerization domains facilitating the formation of linear LZTR1 polymers. LZTR1 complex dysfunction results in the accumulation of RAS GTPases, thereby provoking global pathological changes of the proteomic landscape ultimately leading to cellular hypertrophy. Furthermore, our data show that cardiomyocyte-specific MRAS degradation is mediated by LZTR1 via non-proteasomal pathways, whereas RIT1 degradation is mediated by both LZTR1-dependent and LZTR1-independent pathways. Uni- or biallelic genetic correction of the LZTR1L580P missense variant rescues the molecular and cellular disease phenotype, providing proof of concept for CRISPR-based therapies.

2.
bioRxiv ; 2024 May 28.
Article in English | MEDLINE | ID: mdl-38854075

ABSTRACT

Animal venoms, distinguished by their unique structural features and potent bioactivities, represent a vast and relatively untapped reservoir of therapeutic molecules. However, limitations associated with extracting or expressing large numbers of individual venoms and venom-like molecules have precluded their therapeutic evaluation via high-throughput screening. Here, we developed an innovative computational approach to design a highly diverse library of animal venoms and "metavenoms". We employed programmable M13 hyperphage display to preserve critical disulfide-bonded structures for highly parallelized single-round biopanning with quantitation via high-throughput DNA sequencing. Our approach led to the discovery of Kunitz-type domain-containing proteins that target the human itch receptor Mas-related G protein-coupled receptor X4 (MRGPRX4), which plays a crucial role in itch perception. Deep learning-based structural homology mining identified two endogenous human homologs, tissue factor pathway inhibitor (TFPI) and serine peptidase inhibitor, Kunitz type 2 (SPINT2), which exhibit agonist-dependent potentiation of MRGPRX4. Highly multiplexed screening of animal venoms and metavenoms is therefore a promising approach to uncover new drug candidates.

3.
Nat Methods ; 21(6): 971-973, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38769467

ABSTRACT

Metagenomic taxonomic classifiers analyze either DNA or amino acid (AA) sequences. Metabuli (https://metabuli.steineggerlab.com), however, jointly analyzes both DNA and AA to leverage AA conservation for sensitive homology detection and DNA mutations for specific differentiation of closely related taxa. In the Critical Assessment of Metagenome Interpretation 2 plant-associated dataset, Metabuli covered 99% and 98% of classifications of state-of-the-art DNA- and AA-based classifiers, respectively.
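
The idea of pairing amino acid conservation with DNA-level differences can be illustrated with a toy two-stage comparison. This is only a sketch of the general principle, not Metabuli's algorithm: the functions `translate`, `hamming`, and `classify`, the taxon names, and the sequences are all hypothetical, and the sequences are assumed to be in frame and of equal length.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify(read, references):
    """Toy two-stage assignment (illustrative assumption, not Metabuli's method).

    Stage 1 ranks references by amino acid mismatches, which tolerates
    synonymous mutations. Stage 2 breaks ties by DNA mismatches, which
    separates closely related taxa that encode the same peptide.
    """
    read_aa = translate(read)
    scored = [
        (hamming(read_aa, translate(seq)), hamming(read, seq), taxon)
        for taxon, seq in references.items()
    ]
    return min(scored)[2]


# Hypothetical data: taxon_A and taxon_B encode the same peptide as the read,
# but taxon_B differs from the read at fewer nucleotide positions.
read = "ATGGCTAAAGGT"
references = {
    "taxon_A": "ATGGCCAAAGGA",
    "taxon_B": "ATGGCTAAAGGC",
    "taxon_C": "ATGTCTAAAGGT",
}
print(classify(read, references))
```

In this toy case the amino acid stage cannot separate taxon_A from taxon_B, and the nucleotide stage decides between them.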


Subject(s)
Amino Acids; Metagenome; Metagenomics; Metagenomics/methods; Amino Acids/genetics; DNA/genetics; Software; Plants/classification; Sequence Analysis, DNA/methods; Amino Acid Sequence
5.
Article in English | MEDLINE | ID: mdl-38316555

ABSTRACT

The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
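
For readers unfamiliar with the GDT_TS threshold quoted above, the score averages the fraction of residues that fall within 1, 2, 4, and 8 Å of their reference positions. The sketch below assumes the per-residue distances have already been computed for a single superposition; the full measure additionally searches over many superpositions to maximize each fraction, which is not shown. The function name `gdt_ts` and the example distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (in percent) for one fixed superposition.

    distances: per-residue C-alpha distances in angstroms between a model
    and its reference structure. The real score maximizes each fraction
    over superpositions; this sketch skips that search.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical five-residue model: one residue falls outside every cutoff.
example = [0.5, 1.5, 3.0, 6.0, 10.0]
print(round(gdt_ts(example), 1))
```

Under this definition, a target counts as "highly accurate" in the abstract's sense when the score exceeds 70.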


Subject(s)
Computational Biology; Proteins; Computational Biology/methods; Proteins/chemistry; Sequence Alignment; Protein Conformation; Software; Algorithms; Sequence Analysis, Protein/methods
6.
Nucleic Acids Res ; 52(D1): D426-D433, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37933852

ABSTRACT

The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.


Subject(s)
Amino Acids; Proteome; Proteome/chemistry; Databases, Factual
7.
Nucleic Acids Res ; 52(D1): D368-D375, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37933859

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.


The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries, marking a 500-times expansion in size since its initial release in 2021. The structures are predicted using Google DeepMind's AlphaFold 2 artificial intelligence (AI) system. Our new report highlights the latest updates we have made to this database. We have added more data on specific organisms and proteins related to global health and expanded to cover almost the complete UniProt database, a primary data resource of protein sequences. We also made it easier for our users to access the data by directly downloading files or using advanced cloud-based tools. Finally, we have also improved how users view and search through these protein structures, making the user experience smoother and more informative. In short, AlphaFold DB has been growing rapidly and has become more user-friendly and robust to support the broader scientific community.


Subject(s)
Artificial Intelligence; Protein Structure, Secondary; Proteome; Amino Acid Sequence; Databases, Protein; Search Engine; Proteins/chemistry
8.
Nat Biotechnol ; 42(2): 243-246, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37156916

ABSTRACT

As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing tertiary amino acid interactions within proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of Dali, TM-align and CE, respectively.


Subject(s)
Algorithms; Proteins; Databases, Protein; Proteins/chemistry; Amino Acids; Software
9.
bioRxiv ; 2023 Nov 26.
Article in English | MEDLINE | ID: mdl-38045331

ABSTRACT

The sequence-structure-function relationships that ultimately generate the diversity of extant observed proteins are complex, as proteins bridge the gap between multiple informational and physical scales involved in nearly all cellular processes. One limitation of existing protein annotation databases such as UniProt is that less than 1% of proteins have experimentally verified functions, and computational methods are needed to fill in the missing information. Here, we demonstrate that a multi-aspect framework based on protein language models can learn sequence-structure-function representations of amino acid sequences, and can provide the foundation for sensitive sequence-structure-function aware protein sequence search and annotation. Based on this model, we introduce a multi-aspect information retrieval system for proteins, Protein-Vec, covering sequence, structure, and function aspects, that enables computational protein annotation and function prediction at tree-of-life scales.

10.
Genome Biol ; 24(1): 249, 2023 10 30.
Article in English | MEDLINE | ID: mdl-37904256

ABSTRACT

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.


Subject(s)
Genome, Human; Proteins; Humans; Phylogeny; Proteins/genetics; Algorithms; Software; Molecular Sequence Annotation
11.
Nature ; 622(7983): 637-645, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37704730

ABSTRACT

Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy [1], and over 214 million predicted structures are available in the AlphaFold database [2]. However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations representing probable previously undescribed structures. Clusters without annotation tend to have few representatives covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life.


Subject(s)
Algorithms; Cluster Analysis; Proteins; Structural Homology, Protein; Humans; Databases, Protein; Proteins/chemistry; Proteins/classification; Proteins/metabolism; Sequence Alignment; Molecular Sequence Annotation; Prokaryotic Cells/chemistry; Phylogeny; Species Specificity; Evolution, Molecular
12.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37535681

ABSTRACT

MOTIVATION: Efficiently aligning sequences is a fundamental problem in bioinformatics. Many recent algorithms for computing alignments through Smith-Waterman-Gotoh dynamic programming (DP) exploit Single Instruction Multiple Data (SIMD) operations on modern CPUs for speed. However, these advances have largely ignored difficulties associated with efficiently handling complex scoring matrices or large gaps (insertions or deletions). RESULTS: We propose a new SIMD-accelerated algorithm called Block Aligner for aligning nucleotide and protein sequences against other sequences or position-specific scoring matrices. We introduce a new paradigm that uses blocks in the DP matrix that greedily shift, grow, and shrink. This approach allows regions of the DP matrix to be adaptively computed. Our algorithm runs more than 5-10 times faster than some previous methods while incurring an error rate of less than 3% on protein and long read datasets, despite large gaps and low sequence identities. AVAILABILITY AND IMPLEMENTATION: Our algorithm is implemented for global, local, and X-drop alignments. It is available as a Rust library (with C bindings) at https://github.com/Daniel-Liu-c0deb0t/block-aligner.
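
As background for the abstract above, the plain Smith-Waterman-Gotoh recurrence can be written in a few lines. This sketch is the scalar, full-matrix baseline that fills every DP cell; it does not implement Block Aligner's adaptive blocks or any SIMD acceleration. The function name and the scoring parameters (match, mismatch, gap open, gap extend) are illustrative assumptions.

```python
def smith_waterman_gotoh(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend. Only two
    rows of the DP matrix are kept, since the score alone is returned.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols        # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols  # F: same, but ending with a gap that consumes a[i-1]
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * cols
        f = [neg_inf] * cols
        e = neg_inf            # E: ending with a gap that consumes b[j-1]
        for j in range(1, cols):
            e = max(h[j - 1] + gap_open, e + gap_extend)
            f[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score may never drop below zero.
            h[j] = max(0, h_prev[j - 1] + sub, e, f[j])
            best = max(best, h[j])
        h_prev, f_prev = h, f
    return best


# Eight matches bridged by a single one-base gap.
print(smith_waterman_gotoh("ACGTACGT", "ACGTTACGT"))
```

The cost of this baseline grows with the product of the two sequence lengths, which is the expense that block-based and banded heuristics try to avoid.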


Subject(s)
Algorithms; Proteins; Position-Specific Scoring Matrices; Sequence Alignment; Sequence Analysis; Software
13.
bioRxiv ; 2023 Jul 11.
Article in English | MEDLINE | ID: mdl-37503235

ABSTRACT

The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.

14.
Proc Natl Acad Sci U S A ; 120(28): e2301007120, 2023 07 11.
Article in English | MEDLINE | ID: mdl-37399371

ABSTRACT

Wood-decaying fungi are the major decomposers of plant litter. Heavy sequencing efforts on genomes of wood-decaying fungi have recently been made due to the interest in their lignocellulolytic enzymes; however, most parts of their proteomes remain uncharted. We hypothesized that wood-decaying fungi would possess promiscuous enzymes for detoxifying antifungal phytochemicals remaining in the dead plant bodies, which can be useful biocatalysts. We designed a computational mass spectrometry-based untargeted metabolomics pipeline for the phenotyping of biotransformation and applied it to 264 fungal cultures supplemented with antifungal plant phenolics. The analysis identified the occurrence of diverse reactivities by the tested fungal species. Among those, we focused on O-xylosylation of multiple phenolics by one of the species tested, Lentinus brumalis. By integrating the metabolic phenotyping results with publicly available genome sequences and transcriptome analysis, a UDP-glycosyltransferase designated UGT66A1 was identified and validated as an enzyme catalyzing O-xylosylation with broad substrate specificity. We anticipate that our analytical workflow will accelerate the further characterization of fungal enzymes as promising biocatalysts.


Subject(s)
Glucosyltransferases; Lentinula; Metabolomics; Metabolomics/methods; Lentinula/enzymology; Glucosyltransferases/chemistry; Glucosyltransferases/isolation & purification; Glucosyltransferases/metabolism; Phytochemicals/metabolism; Xylose/metabolism; Genome, Fungal; Liquid Chromatography-Mass Spectrometry
15.
Genome Biol ; 24(1): 113, 2023 05 12.
Article in English | MEDLINE | ID: mdl-37173746

ABSTRACT

BACKGROUND: Protein annotation is a major goal in molecular biology, yet experimentally determined knowledge is typically limited to a few model organisms. In non-model species, the sequence-based prediction of gene orthology can be used to infer protein identity; however, this approach loses predictive power at longer evolutionary distances. Here we propose a workflow for protein annotation using structural similarity, exploiting the fact that similar protein structures often reflect homology and are more conserved than protein sequences. RESULTS: We propose a workflow of openly available tools for the functional annotation of proteins via structural similarity (MorF: MorphologFinder) and use it to annotate the complete proteome of a sponge. Sponges are highly relevant for inferring the early history of animals, yet their proteomes remain sparsely annotated. MorF accurately predicts the functions of proteins with known homology in [Formula: see text] cases and annotates an additional [Formula: see text] of the proteome beyond standard sequence-based methods. We uncover new functions for sponge cell types, including extensive FGF, TGF, and Ephrin signaling in sponge epithelia, and redox metabolism and control in myopeptidocytes. Notably, we also annotate genes specific to the enigmatic sponge mesocytes, proposing they function to digest cell walls. CONCLUSIONS: Our work demonstrates that structural similarity is a powerful approach that complements and extends sequence similarity searches to identify homologous proteins over long evolutionary distances. We anticipate this will be a powerful approach that boosts discovery in numerous -omics datasets, especially for non-model organisms.


Subject(s)
Proteome; Animals; Molecular Sequence Annotation; Amino Acid Sequence
16.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36961332

ABSTRACT

SUMMARY: Highly accurate protein structure predictors have generated hundreds of millions of protein structures; these pose a challenge in terms of storage and processing. Here, we present Foldcomp, a novel lossy structure compression algorithm and indexing system to address this challenge. By using a combination of internal and Cartesian coordinates and a bi-directional NeRF-based strategy, Foldcomp improves the compression ratio by a factor of three compared to the next best method. Its reconstruction error of 0.08 Å is comparable to the best lossy compressor. It is five times faster than the next fastest compressor and competes with the fastest decompressors. With its multi-threading implementation and a Python interface that allows for easy database downloads and efficient querying of protein structures by accession, Foldcomp is a powerful tool for managing and analysing large collections of protein structures. AVAILABILITY AND IMPLEMENTATION: Foldcomp is free open-source software (GPLv3), available for Linux, macOS, and Windows at https://foldcomp.foldseek.com. Foldcomp provides the AlphaFold Swiss-Prot (2.9GB), TrEMBL (1.1TB), and ESMatlas HQ (114GB) databases ready for download.


Subject(s)
Data Compression; Software; Algorithms; Data Compression/methods; Proteins; Gene Library
17.
Commun Biol ; 6(1): 160, 2023 02 08.
Article in English | MEDLINE | ID: mdl-36755055

ABSTRACT

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36% and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.


Subject(s)
Furylfuramide; Proteins; Humans; Databases, Protein; Proteins/chemistry
18.
Protein Sci ; 32(1): e4524, 2023 01.
Article in English | MEDLINE | ID: mdl-36454227

ABSTRACT

The availability of accurate and fast artificial intelligence (AI) solutions predicting aspects of proteins is revolutionizing experimental and computational molecular biology. The webserver LambdaPP aspires to supersede PredictProtein, the first internet server making AI protein predictions available in 1992. Given a protein sequence as input, LambdaPP provides easily accessible visualizations of protein 3D structure, along with predictions at the protein level (GeneOntology, subcellular location), and the residue level (binding to metal ions, small molecules, and nucleotides; conservation; intrinsic disorder; secondary structure; alpha-helical and beta-barrel transmembrane segments; signal-peptides; variant effect) in seconds. The structure prediction provided by LambdaPP (leveraging ColabFold and computed in minutes) is based on MMseqs2 multiple sequence alignments. All other feature prediction methods are based on the pLM ProtT5. Queried by a protein sequence, LambdaPP computes protein and residue predictions almost instantly for various phenotypes, including 3D structure and aspects of protein function. LambdaPP is freely available for everyone to use under embed.predictprotein.org; the interactive results for the case study can be found under https://embed.predictprotein.org/o/Q9NZC2. The frontend of LambdaPP can be found on GitHub (github.com/sacdallago/embed.predictprotein.org), and can be freely used and distributed under the academic free use license (AFL-2). For high-throughput applications, all methods can be executed locally via the bio-embeddings (bioembeddings.com) Python package, or the Docker image at ghcr.io/bioembeddings/bio_embeddings, which also includes the backend of LambdaPP.


Subject(s)
Artificial Intelligence; Proteins; Proteins/chemistry; Amino Acid Sequence; Protein Structure, Secondary; Sequence Alignment; Software
19.
Trends Biochem Sci ; 48(4): 345-359, 2023 04.
Article in English | MEDLINE | ID: mdl-36504138

ABSTRACT

Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.


Subject(s)
Machine Learning; Proteins; Proteins/chemistry; Computational Biology/methods; Protein Conformation
20.
Nucleic Acids Res ; 51(D1): D777-D784, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36271795

ABSTRACT

In phylogenomics, the evolutionary relationships of organisms are studied through their genomic information. A common approach to phylogenomics is to extract related genes from each organism, build a multiple sequence alignment and then reconstruct evolutionary relationships through a phylogenetic tree. Often a set of highly conserved genes occurring in single copy, called core genes, is used for this analysis, as they allow efficient automation within a taxonomic clade. Here we introduce the Universal Fungal Core Genes (UFCG) database and pipeline for genome-wide phylogenetic analysis of fungi. The UFCG database consists of 61 curated fungal marker genes, including a novel set of 41 computationally derived core genes and 20 canonical genes derived from literature, as well as marker gene sequences extracted from publicly available fungal genomes. Furthermore, we provide an easy-to-use, fully automated and open-source pipeline for marker gene extraction, training and phylogenetic tree reconstruction. The UFCG pipeline can identify marker genes from genomic, proteomic and transcriptomic data, while producing phylogenies consistent with those previously reported, and is publicly available together with the UFCG database at https://ufcg.steineggerlab.com.


Subject(s)
Databases, Genetic; Fungi; Fungi/classification; Fungi/genetics; Genes, Fungal; Genome, Fungal; Phylogeny; Proteomics