Results 1 - 20 of 212
1.
bioRxiv ; 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39131394

ABSTRACT

The daily light-dark cycle is a recurrent and predictable environmental phenomenon to which many organisms, including cyanobacteria, have evolved to adapt. Understanding how cyanobacteria alter their metabolic attributes in response to subjective light or dark growth may provide key features for developing strains with improved photosynthetic efficiency and applications in enhanced carbon sequestration and renewable energy. Here, we undertook a label-free proteomic approach to investigate the effect of extended light (LL) or extended dark (DD) conditions on the unicellular cyanobacterium Crocosphaera subtropica ATCC 51142. We quantified 2287 proteins, of which 603 proteins were significantly different between the two growth conditions. These proteins represent several biological processes, including photosynthetic electron transport, carbon fixation, stress responses, translation, and protein degradation. One significant observation is the regulation of over two dozen proteases, including ATP-dependent Clp proteases (endopeptidases) and metalloproteases, the majority of which were upregulated in LL compared to DD. This suggests that proteases play a crucial role in the regulation and maintenance of photosynthesis, especially the PSI and PSII components. The higher protease activity in LL indicates a need for more frequent degradation and repair of certain photosynthetic components, highlighting the dynamic nature of protein turnover and quality control mechanisms in response to prolonged light exposure. The results enhance our understanding of how Crocosphaera subtropica ATCC 51142 adjusts its molecular machinery in response to extended light or dark growth conditions.

2.
bioRxiv ; 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39131303

ABSTRACT

Cyanobacteria have developed an impressive array of proteins and pathways, each tailored for specific metabolic attributes, to execute photosynthesis and biological nitrogen (N2)-fixation. An understanding of these biologically incompatible processes provides important insights into how they can be optimized for renewable energy. To expand upon our current knowledge, we performed label-free quantitative proteomic analysis of the unicellular diazotrophic cyanobacterium Crocosphaera subtropica ATCC 51142 grown with and without nitrate under 12-hour light-dark cycles. Results showed significant shifts in metabolic activities, including photosynthesis, respiration, biological nitrogen fixation (BNF), and proteostasis, in response to the different growth conditions. We identified 14 nitrogenase enzymes which were among the most highly expressed proteins in the dark under nitrogen-fixing conditions, emphasizing their importance in BNF. Nitrogenase enzymes were not expressed under non-nitrogen-fixing conditions, suggesting a regulatory mechanism based on nitrogen availability. The synthesis of key respiratory enzymes and uptake hydrogenase (HupSL) was synchronized with the synthesis of nitrogenase, indicating coordinated regulation of the processes involved in energy production and BNF. The data suggest alternative pathways that cells utilize, such as the oxidative pentose phosphate (OPP) and 2-oxoglutarate (2-OG) pathways, to produce ATP and support bioenergetic BNF. The data also indicate the important role of uptake hydrogenase in the removal of O2 to support BNF. Overall, this study expands upon our knowledge regarding molecular responses of Crocosphaera 51142 to nitrogen and light-dark phases, shedding light on potential applications and optimization for renewable energy.

3.
Methods Mol Biol ; 2780: 149-162, 2024.
Article in English | MEDLINE | ID: mdl-38987469

ABSTRACT

Protein-protein interactions are involved in almost all processes in a living cell and determine the biological functions of proteins. To obtain mechanistic understandings of protein-protein interactions, the tertiary structures of protein complexes have been determined by biophysical experimental methods, such as X-ray crystallography and cryogenic electron microscopy. However, as experimental methods are costly in resources, many computational methods have been developed that model protein complex structures. One of the difficulties in computational protein complex modeling (protein docking) is to select the most accurate models among many models that are usually generated by a docking method. This article reviews advances in protein docking model assessment methods, focusing on recent developments that apply deep learning to several network architectures.


Subjects
Deep Learning , Molecular Docking Simulation , Proteins , Molecular Docking Simulation/methods , Proteins/chemistry , Proteins/metabolism , Protein Binding , Computational Biology/methods , Protein Interaction Mapping/methods , Software , Protein Conformation , Crystallography, X-Ray/methods
4.
Protein Sci ; 33(8): e5104, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38995055

ABSTRACT

Despite ferritin's critical role in regulating cellular and systemic iron levels, our understanding of the structure and assembly mechanism of isoferritins, discovered over eight decades ago, remains limited. Unveiling how the composition and molecular architecture of hetero-oligomeric ferritins confer distinct functionality to isoferritins is essential to understanding how the structural intricacies of H and L subunits influence their interactions with cellular machinery. In this study, ferritin heteropolymers with specific H to L subunit ratios were synthesized using a uniquely engineered plasmid design, followed by high-resolution cryo-electron microscopy analysis and deep learning-based amino acid modeling. Our structural examination revealed unique architectural features during the self-assembly mechanism of heteropolymer ferritins and demonstrated a significant preference for H-L heterodimer formation over H-H or L-L homodimers. Unexpectedly, while dimers seem essential building blocks in the protein self-assembly process, the overall mechanism of ferritin self-assembly is observed to proceed randomly through diverse pathways. The physiological significance of these findings is discussed including how ferritin microheterogeneity could represent a tissue-specific adaptation process that imparts distinctive tissue-specific functions to isoferritins.


Subjects
Ferritins , Protein Multimerization , Humans , Ferritins/chemistry , Ferritins/metabolism , Ferritins/genetics , Models, Molecular , Cryoelectron Microscopy
5.
Nat Methods ; 21(7): 1340-1348, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38918604

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein-nucleic acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: Escherichia coli beta-galactosidase with inhibitor, SARS-CoV-2 virus RNA-dependent RNA polymerase with covalently bound nucleotide analog and SARS-CoV-2 virus ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. The quality of submitted ligand models and surrounding atoms was analyzed by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics and contact scores. A composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.


Subjects
Cryoelectron Microscopy , Models, Molecular , Cryoelectron Microscopy/methods , Ligands , SARS-CoV-2 , COVID-19/virology , Escherichia coli , beta-Galactosidase/chemistry , beta-Galactosidase/metabolism , Protein Conformation , Reproducibility of Results
6.
bioRxiv ; 2024 May 12.
Article in English | MEDLINE | ID: mdl-38766093

ABSTRACT

Analysis of factors that lead to the functionality of transcriptional activation domains remains a crucial and yet challenging task owing to the significant diversity in their sequences and their intrinsically disordered nature. Almost all existing methods that have aimed to predict activation domains have involved traditional machine learning approaches, such as logistic regression, which are unable to capture complex patterns in data, or plain convolutional neural networks, and have been limited in their exploration of structural features. However, there is tremendous potential in the inspection of the structural properties of activation domains, and an opportunity to investigate complex relationships between features of residues in the sequence. To address these gaps, we have utilized the power of graph neural networks, which can represent structural data in the form of nodes and edges, allowing nodes to exchange information among themselves. We have experimented with two kinds of graph formulations, one involving residues as nodes and the other assigning atoms to be the nodes. A logistic regression model was also developed to analyze feature importance. For all the models, several feature combinations were experimented with. The residue-level GNN model with the amino acid type, residue position, acidic/basic/aromatic property and secondary structure feature combination gave the best performing model, with accuracy, F1 score and AUROC of 97.9%, 71% and 97.1% respectively, which outperformed other existing methods in the literature when applied to the dataset we used. Among the other structure-based features that were analyzed, the amphipathic property of helices also proved to be an important feature for classification. Logistic regression results showed that the most dominant feature that makes a sequence functional is the frequency of different types of amino acids in the sequence. Our results have consistently shown that functional sequences have more acidic and aromatic residues whereas basic residues are seen more in non-functional sequences.

7.
Elife ; 13, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38619110

ABSTRACT

A productive HIV-1 infection in humans is often established by transmission and propagation of a single transmitted/founder (T/F) virus, which then evolves into a complex mixture of variants during the lifetime of infection. An effective HIV-1 vaccine should elicit broad immune responses in order to block the entry of diverse T/F viruses. Currently, no such vaccine exists. An in-depth study of escape variants emerging under host immune pressure during very early stages of infection might provide insights into the design of such an HIV-1 vaccine. Here, in a rare longitudinal study involving HIV-1 infected individuals just days after infection in the absence of antiretroviral therapy, we discovered a remarkable genetic shift that resulted in near complete disappearance of the original T/F virus and appearance of a variant with an H173Y mutation in the variable V2 domain of the HIV-1 envelope protein. This coincided with the disappearance of the first wave of strictly H173-specific antibodies and emergence of a second wave of Y173-specific antibodies with increased breadth. Structural analyses indicated conformational dynamism of the envelope protein which likely allowed selection of escape variants with a conformational switch in the V2 domain from an α-helix (H173) to a β-strand (Y173) and induction of broadly reactive antibody responses. This differential breadth due to a single mutational change was also recapitulated in a mouse model. Rationally designed combinatorial libraries containing 54 conformational variants of the V2 domain around position 173 further demonstrated increased breadth of antibody responses elicited to diverse HIV-1 envelope proteins. These results offer new insights into designing broadly effective HIV-1 vaccines.


Subjects
AIDS Vaccines , Dermatitis , HIV-1 , Animals , Mice , Humans , HIV-1/genetics , Antibody Formation , Longitudinal Studies , AIDS Vaccines/genetics , Antibodies , Antigens, Viral
8.
NPJ Syst Biol Appl ; 10(1): 29, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38491038

ABSTRACT

Understanding the biological functions of proteins is of fundamental importance in modern biology. To represent protein function, Gene Ontology (GO), a controlled vocabulary, is frequently used because it is easy for computer programs to handle and avoids open-ended text interpretation. In particular, the majority of current protein function prediction methods rely on GO terms. However, the extensive list of GO terms that describe a protein function can pose challenges for biologists when it comes to interpretation. In response to this issue, we developed GO2Sum (Gene Ontology terms Summarizer), a model that takes a set of GO terms as input and generates a human-readable summary using the T5 large language model. GO2Sum was developed by fine-tuning T5 on GO term assignments and free-text function descriptions for UniProt entries, enabling it to recreate function descriptions by concatenating GO term descriptions. Our results demonstrated that GO2Sum significantly outperforms the original T5 model that was trained on the entire web corpus in generating Function, Subunit Structure, and Pathway paragraphs for UniProt entries.


Subjects
Proteins , Software , Humans , Gene Ontology , Proteins/genetics
9.
bioRxiv ; 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38464318

ABSTRACT

Structure-based virtual screening (SBVS) is a widely used method in in silico drug discovery. It requires a receptor structure or binding site to predict the binding pose and fitness of a ligand. Therefore, the performance of SBVS is affected by the protein conformation. The most frequently used method in SBVS is the protein-ligand docking program, which utilizes atomic distance-based scoring functions. Hence, docking programs are highly sensitive to variation in receptor structure, and it is reported that conformational change significantly lowers their performance. To address the problem, we have introduced a novel SBVS program named PL-PatchSurfer. This program makes use of molecular surface patches and the Zernike descriptor. The surfaces of the pocket and ligand are segmented into several patches by the program. These patches are then mapped with physico-chemical properties such as shape and electrostatic potential before being converted into the Zernike descriptor, which is rotationally invariant. Complementarity between the protein and the ligand is assessed by comparing the descriptors and the geometric distribution of the patches in the molecules. A benchmarking study showed that PL-PatchSurfer2 was able to screen active molecules quickly regardless of changes in the receptor structure. However, the program could not achieve high performance for targets where hydrogen bonding is important, such as nuclear hormone receptors. In this paper, we present the newer version of PL-PatchSurfer, PL-PatchSurfer3, which incorporates two new features: a change in the definition of hydrogen bond complementarity and consideration of visibility that contains curvature information of a patch. Our evaluation demonstrates that the new program outperforms its predecessor and other SBVS methods while retaining its characteristic tolerance to receptor structure changes. The program is available at kiharalab.org/plps3.

10.
Mol Biol Evol ; 41(3), 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38376487

ABSTRACT

The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long-read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate the blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.


Subjects
Balaenoptera , Neoplasms , Animals , Balaenoptera/genetics , Segmental Duplications, Genomic , Genome , Demography , Neoplasms/genetics
11.
bioRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38328203

ABSTRACT

Cryogenic electron microscopy (cryo-EM) is now widely used for determining the structures of multi-chain protein complexes. However, modeling a complex structure is challenging, particularly when the map resolution is low, typically in the intermediate resolution range of 5 to 10 Å. Within this resolution range, even accurate structure fitting is difficult, let alone de novo modeling. To address this challenge, here we present DiffModeler, a fully automated method for modeling protein complex structures. DiffModeler employs a diffusion model for backbone tracing and integrates AlphaFold2-predicted single-chain structures for structure fitting. Extensive testing on cryo-EM maps at intermediate resolutions demonstrates the exceptional accuracy of DiffModeler in structure modeling, achieving an average TM-Score of 0.92, surpassing existing methodologies significantly. Notably, DiffModeler successfully modeled a protein complex composed of 47 chains and 13,462 residues, achieving a high TM-Score of 0.94. Further benchmarking at low resolutions (10-20 Å) confirms its versatility, demonstrating plausible performance. Moreover, when coupled with CryoREAD, DiffModeler excels in constructing protein-DNA/RNA complex structures for near-atomic resolution maps (0-5 Å), showcasing state-of-the-art performance with average TM-Scores of 0.88 and 0.91 across two datasets.

12.
Sci Data ; 11(1): 176, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326333

ABSTRACT

Suncus etruscus is one of the world's smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew's small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long-read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with a 2.472 Gbp primary pseudohaplotype and a 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.


Subjects
Chromosomes , Shrews , Animals , Mice , Chromosomes/genetics , Genome , Genomics , Molecular Sequence Annotation , Shrews/genetics
13.
Res Sq ; 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38343795

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: E. coli beta-galactosidase with inhibitor, SARS-CoV-2 RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. We found that (1) the quality of submitted ligand models and surrounding atoms varied, as judged by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics, and contact scores, and (2) a composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.

14.
J Mol Biol ; 436(6): 168486, 2024 03 15.
Article in English | MEDLINE | ID: mdl-38336197

ABSTRACT

Membrane proteins play crucial roles in various cellular processes, and their interactions with other proteins in and on the membrane are essential for their proper functioning. While structures of an increasing number of membrane proteins are being determined, the available structural data are still sparse. To gain insights into the mechanisms of membrane protein complexes, computational docking methods are necessary due to the challenge of experimental determination. Here, we introduce Mem-LZerD, a rigid-body membrane docking algorithm designed to take advantage of modern membrane modeling and protein docking techniques to facilitate the docking of membrane protein complexes. Mem-LZerD is based on the LZerD protein docking algorithm, which has been consistently among the top servers in many rounds of CAPRI protein docking assessment. By employing a combination of geometric hashing, newly constrained by the predicted membrane height and tilt angle, and model scoring accounting for the energy of membrane insertion, we demonstrate the capability of Mem-LZerD to model diverse membrane protein-protein complexes. Mem-LZerD successfully performed unbound docking on 13 of 21 (61.9%) transmembrane complexes in an established benchmark, more than shown by previous approaches. It was additionally tested on new datasets of 44 transmembrane complexes and 92 peripheral membrane protein complexes, of which it successfully modeled 35 (79.5%) and 15 (16.3%) complexes respectively. When non-blind orientations of peripheral targets were included, the number of successes increased to 54 (58.7%). We further demonstrate that Mem-LZerD produces complex models which are suitable for molecular dynamics simulation. Mem-LZerD is made available at https://lzerd.kiharalab.org.
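
Note: the success rates quoted in this abstract can be re-derived from the reported counts. The following is a minimal sketch; the dictionary keys and the `success_rate` helper are illustrative names and not part of Mem-LZerD.

```python
# Counts (successes, total) exactly as reported in the abstract.
# The labels are illustrative descriptions, not dataset names from the paper.
reported = {
    "established benchmark, transmembrane": (13, 21),
    "new dataset, transmembrane": (35, 44),
    "new dataset, peripheral": (15, 92),
    "new dataset, peripheral, non-blind orientations": (54, 92),
}


def success_rate(successes: int, total: int) -> float:
    """Return the success rate as a percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * successes / total, 1)


for label, (successes, total) in reported.items():
    print(f"{label}: {successes}/{total} = {success_rate(successes, total)}%")
```

The rounded values agree with the percentages given in the abstract.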


Subjects
Membrane Proteins , Algorithms , Membrane Proteins/chemistry , Molecular Docking Simulation , Protein Binding , Protein Conformation , Software
15.
Nat Methods ; 21(1): 122-131, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38066344

ABSTRACT

Three-dimensional structure modeling from maps is an indispensable step for studying proteins and their complexes with cryogenic electron microscopy. Although the resolution of determined cryogenic electron microscopy maps has generally improved, there are still many cases where tracing protein main chains is difficult, even in maps determined at a near-atomic resolution. Here we developed a protein structure modeling method, DeepMainmast, which employs deep learning to capture the local map features of amino acids and atoms to assist main-chain tracing. Moreover, we integrated AlphaFold2 with the de novo density tracing protocol to combine their complementary strengths and achieved even higher accuracy than each method alone. Additionally, the protocol is able to accurately assign the chain identity to the structure models of homo-multimers, which is not a trivial task for existing methods.


Subjects
Deep Learning , Cryoelectron Microscopy/methods , Models, Molecular , Proteins/chemistry , Microscopy, Electron , Protein Conformation
16.
bioRxiv ; 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38106200

ABSTRACT

The three-dimensional structure of a protein plays a fundamental role in determining its function and has an essential impact on understanding biological processes. Despite significant progress in protein structure prediction, such as AlphaFold2, challenges remain for hard targets on which AlphaFold2 often does not perform well, owing to complex protein folding and the large number of possible conformations. Here we present a modified version of AlphaFold2, called Distance-AF, which aims to improve the performance of AlphaFold2 by including distance constraints as input information. Distance-AF uses AlphaFold2's predicted structure as a starting point and incorporates distance constraints between amino acids to adjust the folding of the protein structure until it meets the constraints. Distance-AF can correct the domain orientation on challenging targets, leading to more accurate structures with a lower root mean square deviation (RMSD). This ability of Distance-AF is also useful in fitting protein structures into cryo-electron microscopy maps.
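
Note: the RMSD mentioned in this abstract is the square root of the mean squared distance between corresponding atoms of two structures. The following is a minimal sketch of that general definition. It assumes the two coordinate sets are already superimposed and listed in the same atom order; the abstract does not describe the superposition or atom selection used by the authors, and the function name `rmsd` is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between two paired coordinate sets.

    Assumes both structures are already superimposed and that
    coords_a[i] corresponds to coords_b[i] (for example, C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms, made up for illustration only.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, native))
```

A lower value means the model lies closer to the reference structure, which is the sense in which the abstract reports an improvement.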

17.
bioRxiv ; 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38106114

ABSTRACT

Protein-peptide interactions play a key role in biological processes. Understanding the interactions that occur within a receptor-peptide complex can help in discovering and altering their biological functions. Various computational methods for modeling the structures of receptor-peptide complexes have been developed. Recently, accurate structure prediction enabled by deep learning methods has significantly advanced the field of structural biology. AlphaFold (AF) is among the top-performing structure prediction methods and has highly accurate structure modeling performance on single-chain targets. Shortly after the release of AlphaFold, AlphaFold-Multimer (AFM) was developed in a similar fashion to AF for prediction of protein complex structures. AFM has achieved competitive performance in modeling protein-peptide interactions compared to previous computational methods; however, further improvement is still needed. Here, we present DistPepFold, which improves protein-peptide complex docking using an AFM-based architecture through a privileged knowledge distillation approach. DistPepFold leverages a teacher model that uses native interaction information during training and transfers its knowledge to a student model through a teacher-student distillation process. We evaluated DistPepFold's docking performance on two protein-peptide complex datasets and showed that DistPepFold outperforms AFM. Furthermore, we demonstrate that the student model was able to learn from the teacher model to make structural improvements based on AFM predictions.

18.
Commun Biol ; 6(1): 1103, 2023 10 31.
Article in English | MEDLINE | ID: mdl-37907681

ABSTRACT

Domains are functional and structural units of proteins that govern various biological functions performed by the proteins. Therefore, the characterization of domains in a protein can serve as a proper functional representation of proteins. Here, we employ a self-supervised protocol to derive functionally consistent representations for domains by learning domain-Gene Ontology (GO) co-occurrences and associations. The domain embeddings we constructed turned out to be effective in performing actual function prediction tasks. Extensive evaluations showed that protein representations using the domain embeddings are superior to those of large-scale protein language models in GO prediction tasks. Moreover, the new function prediction method built on the domain embeddings, named Domain-PFP, substantially outperformed the state-of-the-art function predictors. Additionally, Domain-PFP demonstrated competitive performance in the CAFA3 evaluation, achieving overall the best performance among the top teams that participated in the assessment.


Subjects
Language , Proteins , Gene Ontology , Learning
19.
bioRxiv ; 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-37961264

ABSTRACT

Membrane proteins play crucial roles in various cellular processes, and their interactions with other proteins in and on the membrane are essential for their proper functioning. While structures of an increasing number of membrane proteins are being determined, the available structural data are still sparse. To gain insights into the mechanisms of membrane protein complexes, computational docking methods are necessary due to the challenge of experimental determination. Here, we introduce Mem-LZerD, a rigid-body membrane docking algorithm designed to take advantage of modern membrane modeling and protein docking techniques to facilitate the docking of membrane protein complexes. Mem-LZerD is based on the LZerD protein docking algorithm, which has been consistently among the top servers in many rounds of CAPRI protein docking assessment. By employing a combination of geometric hashing, newly constrained by the predicted membrane height and tilt angle, and model scoring accounting for the energy of membrane insertion, we demonstrate the capability of Mem-LZerD to model diverse membrane protein-protein complexes. Mem-LZerD successfully performed unbound docking on 13 of 21 (61.9%) transmembrane complexes in an established benchmark, more than shown by previous approaches. It was additionally tested on new datasets of 44 transmembrane complexes and 92 peripheral membrane protein complexes, of which it successfully modeled 35 (79.5%) and 15 (16.3%) complexes respectively. When non-blind orientations of peripheral targets were included, the number of successes increased to 54 (58.7%). We further demonstrate that Mem-LZerD produces complex models which are suitable for molecular dynamics simulation. Mem-LZerD is made available at https://lzerd.kiharalab.org.

20.
bioRxiv ; 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-38014080

ABSTRACT

Understanding the biological functions of proteins is of fundamental importance in modern biology. To represent protein function, Gene Ontology (GO), a controlled vocabulary, is frequently used because it is easy for computer programs to handle and avoids open-ended text interpretation. In particular, the majority of current protein function prediction methods rely on GO terms. However, the extensive list of GO terms that describe a protein function can pose challenges for biologists when it comes to interpretation. In response to this issue, we developed GO2Sum (Gene Ontology terms Summarizer), a model that takes a set of GO terms as input and generates a human-readable summary using the T5 large language model. GO2Sum was developed by fine-tuning T5 on GO term assignments and free-text function descriptions for UniProt entries, enabling it to recreate function descriptions by concatenating GO term descriptions. Our results demonstrated that GO2Sum significantly outperforms the original T5 model that was trained on the entire web corpus in generating Function, Subunit Structure, and Pathway paragraphs for UniProt entries.
