Results 1 - 20 of 215
1.
Sci Adv ; 10(36): eadq0350, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39241064

ABSTRACT

RNA polymerase II relies on a repetitive sequence domain (YSPTSPS) within its largest subunit to orchestrate transcription. While phosphorylation on serine-2/serine-5 of the carboxyl-terminal heptad repeats is well established, threonine-4's role remains enigmatic. Paradoxically, threonine-4 phosphorylation was only detected after transcription end sites despite being functionally implicated in pausing, elongation, termination, and messenger RNA processing. Our investigation revealed that threonine-4 phosphorylation detection was obstructed by flanking serine-5 phosphorylation at the onset of transcription, which can be removed selectively. Subsequent proteomic analyses identified many proteins recruited to transcription via threonine-4 phosphorylation, which previously were attributed to serine-2. Loss of threonine-4 phosphorylation greatly reduces serine-2 phosphorylation, revealing a cross-talk between the two marks. Last, functional analysis of threonine-4 phosphorylation highlighted its role in alternative 3'-end processing within pro-proliferative genes. Our findings unveil the true genomic location of this evolutionarily conserved phosphorylation mark and prompt a reassessment of functional assignments of the carboxyl-terminal domain.


Subject(s)
RNA Polymerase II; Threonine; Transcription, Genetic; Phosphorylation; RNA Polymerase II/metabolism; RNA Polymerase II/genetics; Threonine/metabolism; Humans; RNA 3' End Processing; Serine/metabolism; Proteomics/methods
2.
PLoS Comput Biol ; 20(7): e1012258, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38968291

ABSTRACT

The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. A Hidden Markov Model (HMM)-based approach extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique to estimate transition probabilities for an HMM using expectation maximization, but modified here to estimate a small number of parameter values directly rather than estimating every transition probability independently. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second independent implementation using a hybrid of DIRECT and Powell's method to reduce the root mean squared error (RMSE) between simulations and the real dataset was also developed. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell's method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.


Subject(s)
Algorithms; Markov Chains; Sequence Analysis, Protein; Sequence Analysis, Protein/methods; Proteins/chemistry; Computational Biology/methods; Single Molecule Imaging/methods; Computer Simulation
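
Illustrative note on entry 2: the abstract describes estimating a small number of shared parameters instead of every transition probability independently. The sketch below shows that parameter-tying idea in a deliberately simplified, fully observed setting, so it is not Baum-Welch and not the whatprot implementation. It assumes a toy model in which each dye on a peptide is lost independently with one probability `d` per cycle; the function names and the example trajectories are hypothetical.

```python
from math import comb


def transition_prob(n_before, n_after, d):
    """P(n_before -> n_after dyes in one cycle) when each dye is lost
    independently with probability d (toy assumption, binomial model)."""
    lost = n_before - n_after
    if lost < 0:
        return 0.0  # dyes cannot reappear in this toy model
    return comb(n_before, lost) * d ** lost * (1 - d) ** n_after


def estimate_dye_loss(trajectories):
    """Maximum-likelihood estimate of the single shared parameter d.

    Every entry of the transition matrix is a function of d, so instead of
    counting each (n_before, n_after) transition separately we pool all
    observations: d = (dyes lost) / (dye-cycles at risk).
    """
    lost = 0
    at_risk = 0
    for counts in trajectories:
        for before, after in zip(counts, counts[1:]):
            lost += before - after
            at_risk += before
    if at_risk == 0:
        raise ValueError("no dye-cycles observed")
    return lost / at_risk


# Hypothetical dye-count trajectories (one list per peptide, one value per cycle).
trajectories = [
    [3, 3, 2, 2, 1],
    [3, 2, 2, 2, 2],
    [3, 3, 3, 1, 0],
]
d_hat = estimate_dye_loss(trajectories)
print(d_hat)
```

In a real HMM the dye counts are hidden behind noisy intensities, so the pooled counts would be replaced by expected counts from a forward-backward pass; the pooling step itself stays the same.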
3.
bioRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38853926

ABSTRACT

All eukaryotes share a common ancestor from roughly 1.5 - 1.8 billion years ago, a single-celled, swimming microbe known as LECA, the Last Eukaryotic Common Ancestor. Nearly half of the genes in modern eukaryotes were present in LECA, and many current genetic diseases and traits stem from these ancient molecular systems. To better understand these systems, we compared genes across modern organisms and identified a core set of 10,092 shared protein-coding gene families likely present in LECA, a quarter of which are uncharacterized. We then integrated >26,000 mass spectrometry proteomics analyses from 31 species to infer how these proteins interact in higher-order complexes. The resulting interactome describes the biochemical organization of LECA, revealing both known and new assemblies. We analyzed these ancient protein interactions to find new human gene-disease relationships for bone density and congenital birth defects, demonstrating the value of ancestral protein interactions for guiding functional genetics today.

4.
Mol Syst Biol ; 20(8): 933-951, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38918600

ABSTRACT

The variability of proteins at the sequence level creates an enormous potential for proteome complexity. Exploring the depths and limits of this complexity is an ongoing goal in biology. Here, we systematically survey human and plant high-throughput bottom-up native proteomics data for protein truncation variants, where substantial regions of the full-length protein are missing from an observed protein product. In humans, Arabidopsis, and the green alga Chlamydomonas, approximately one percent of observed proteins show a short form, which we can assign by comparison to RNA isoforms as either likely deriving from transcript-directed processes or limited proteolysis. While some detected protein fragments align with known splice forms and protein cleavage events, multiple examples are previously undescribed, such as our observation of fibrocystin proteolysis and nuclear translocation in a green alga. We find that truncations occur almost entirely between structured protein domains, even when short forms are derived from transcript variants. Intriguingly, multiple endogenous protein truncations of phase-separating translational proteins resemble cleaved proteoforms produced by enteroviruses during infection. Some truncated proteins are also observed in both humans and plants, suggesting that they date to the last eukaryotic common ancestor. Finally, we describe novel proteoform-specific protein complexes, where the loss of a domain may accompany complex formation.


Subject(s)
Arabidopsis; Proteomics; Arabidopsis/genetics; Arabidopsis/metabolism; Humans; Proteomics/methods; Chlamydomonas/metabolism; Chlamydomonas/genetics; Protein Isoforms/genetics; Protein Isoforms/metabolism; Proteome/genetics; Proteolysis; Plant Proteins/metabolism; Plant Proteins/genetics; Alternative Splicing
5.
Digit Discov ; 3(6): 1150-1159, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38873033

ABSTRACT

The fundamental goal of small molecule discovery is to generate chemicals with target functionality. While this often proceeds through structure-based methods, we set out to investigate the practicality of methods that leverage the extensive corpus of chemical literature. We hypothesize that a sufficiently large text-derived chemical function dataset would mirror the actual landscape of chemical functionality. Such a landscape would implicitly capture complex physical and biological interactions given that chemical function arises from both a molecule's structure and its interacting partners. To evaluate this hypothesis, we built a Chemical Function (CheF) dataset of patent-derived functional labels. This dataset, comprising 631 K molecule-function pairs, was created using an LLM- and embedding-based method to obtain 1.5 K unique functional labels for approximately 100 K randomly selected molecules from their corresponding 188 K unique patents. We carry out a series of analyses demonstrating that the CheF dataset contains a semantically coherent textual representation of the functional landscape congruent with chemical structural relationships, thus approximating the actual chemical function landscape. We then demonstrate through several examples that this text-based functional landscape can be leveraged to identify drugs with target functionality using a model able to predict functional profiles from structure alone. We believe that functional label-guided molecular discovery may serve as an alternative approach to traditional structure-based methods in the pursuit of designing novel functional molecules.

6.
Int J Mol Sci ; 25(11), 2024 May 31.
Article in English | MEDLINE | ID: mdl-38892247

ABSTRACT

Yeast expression of human G-protein-coupled receptors (GPCRs) can be used as a biosensor platform for the detection of pharmaceuticals. Cannabinoid receptor type 1 (CB1R) is of particular interest, given the cornucopia of natural and synthetic cannabinoids being explored as therapeutics. We show for the first time that engineering the N-terminus of CB1R allows for efficient signal transduction in yeast, and that engineering the sterol composition of the yeast membrane modulates its performance. Using an engineered cannabinoid biosensor, we demonstrate that large libraries of synthetic cannabinoids and terpenes can be quickly screened to elucidate known and novel structure-activity relationships. The biosensor strains offer a ready platform for evaluating the activity of new synthetic cannabinoids, monitoring drugs of abuse, and developing therapeutic molecules.


Subject(s)
Biosensing Techniques; Cannabinoids; Receptor, Cannabinoid, CB1; Saccharomyces cerevisiae; Biosensing Techniques/methods; Humans; Cannabinoids/chemistry; Cannabinoids/pharmacology; Cannabinoids/metabolism; Receptor, Cannabinoid, CB1/metabolism; Receptor, Cannabinoid, CB1/genetics; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae/genetics; Structure-Activity Relationship; Signal Transduction/drug effects
7.
Genome Res ; 34(3): 484-497, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38580401

ABSTRACT

Transcriptional regulation controls cellular functions through interactions between transcription factors (TFs) and their chromosomal targets. However, understanding the fate conversion potential of multiple TFs in an inducible manner remains limited. Here, we introduce iTF-seq as a method for identifying individual TFs that can alter cell fate toward specific lineages at a single-cell level. iTF-seq enables time course monitoring of transcriptome changes, and with biotinylated individual TFs, it provides a multi-omics approach to understanding the mechanisms behind TF-mediated cell fate changes. Our iTF-seq study in mouse embryonic stem cells identified multiple TFs that trigger rapid transcriptome changes indicative of differentiation within a day of induction. Moreover, cells expressing these potent TFs often show a slower cell cycle and increased cell death. Further analysis using bioChIP-seq revealed that GCM1 and OTX2 act as pioneer factors and activators by increasing gene accessibility and activating the expression of lineage specification genes during cell fate conversion. iTF-seq has utility in both mapping cell fate conversion and understanding cell fate conversion mechanisms.


Subject(s)
Cell Differentiation; Transcription Factors; Animals; Mice; Cell Differentiation/genetics; Cell Lineage/genetics; Gene Expression Profiling/methods; Mouse Embryonic Stem Cells/metabolism; Mouse Embryonic Stem Cells/cytology; Multiomics; RNA, Small Cytoplasmic/genetics; RNA, Small Cytoplasmic/metabolism; RNA-Seq/methods; Sequence Analysis, RNA/methods; Single-Cell Gene Expression Analysis; Transcription Factors/metabolism; Transcription Factors/genetics; Transcriptome
8.
bioRxiv ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38370702

ABSTRACT

Finding the 3D structure of proteins and their complexes has several applications, such as developing vaccines that target viral proteins effectively. Methods such as cryogenic electron microscopy (cryo-EM) have improved in their ability to capture high-resolution images, and when applied to a purified sample containing copies of a macromolecule, they can be used to produce high-quality snapshots of different 2D orientations of the macromolecule, which can be combined to reconstruct its 3D structure. Instead of purifying a sample so that it contains only one macromolecule, a process that can be difficult, time-consuming, and expensive, a cell sample containing multiple particles can be imaged directly and separated into its constituent particles using computational methods. Previous work, SLICEM, separated 2D projection images of different particles into their respective groups using 2 methods for clustering a graph with edges weighted by pairwise similarities of common lines of the 2D projections. In this work, we develop DeepSLICEM, a pipeline that clusters rich representations of 2D projections, obtained by combining graphical features from a similarity graph based on common lines with additional image features extracted from a convolutional neural network. DeepSLICEM explores 6 pretrained convolutional neural networks and one supervised Siamese CNN for image representation, 10 pretrained deep graph neural networks for similarity graph node representations, and 4 methods for clustering, along with 8 methods for directly clustering the similarity graph. On 6 synthetic and experimental datasets, the DeepSLICEM pipeline finds 92 method combinations achieving better clustering accuracy than previous methods from SLICEM. Thus, in this paper, we demonstrate that deep neural networks have great potential for accurately separating mixtures of 2D projections of different macromolecules computationally.

9.
Mol Biol Cell ; 35(3): ar39, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38170584

ABSTRACT

DIFFRAC is a powerful method for systematically comparing proteome content and organization between samples in a high-throughput manner. By subjecting control and experimental protein extracts to native chromatography and quantifying the contents of each fraction using mass spectrometry, it enables the quantitative detection of alterations to protein complexes and abundances. Here, we applied DIFFRAC to investigate the consequences of genetic loss of Ift122, a subunit of the intraflagellar transport-A (IFT-A) protein complex that plays a vital role in the formation and function of cilia and flagella, on the proteome of Tetrahymena thermophila. A single DIFFRAC experiment was sufficient to detect changes in protein behavior that mirrored known effects of IFT-A loss and revealed new biology. We uncovered several novel IFT-A-regulated proteins, which we validated through live imaging in Xenopus multiciliated cells, shedding new light on both the ciliary and non-ciliary functions of IFT-A. Our findings underscore the robustness of DIFFRAC for revealing proteomic changes in response to genetic or biochemical perturbation.


Subject(s)
Proteome; Proteomics; Protein Transport/physiology; Proteome/metabolism; Biological Transport/physiology; Cilia/metabolism; Flagella/metabolism; Phenotype
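
Illustrative note on entry 9: the abstract describes comparing, fraction by fraction, how a protein elutes in control versus experimental extracts. One simple way to put a number on such a change is sketched below. It assumes each protein is summarized as a list of per-fraction abundances and uses half the L1 distance between the normalized profiles. This metric and the names `normalize` and `profile_shift` are assumptions chosen for illustration, not necessarily the score used in the published DIFFRAC analysis.

```python
def normalize(profile):
    """Scale a per-fraction abundance profile so that it sums to 1."""
    total = sum(profile)
    if total == 0:
        return [0.0] * len(profile)
    return [x / total for x in profile]


def profile_shift(control, treated):
    """Half the L1 distance between two normalized elution profiles.

    0.0 means the protein elutes identically in both samples;
    1.0 means the two profiles share no fractions at all.
    """
    if len(control) != len(treated):
        raise ValueError("profiles must cover the same fractions")
    c = normalize(control)
    t = normalize(treated)
    return 0.5 * sum(abs(a - b) for a, b in zip(c, t))


# Hypothetical spectral counts across six chromatography fractions.
# A protein that leaves a large complex elutes in later (smaller) fractions.
shifted = profile_shift([0, 10, 30, 10, 0, 0], [0, 0, 0, 10, 30, 10])
# A protein whose abundance doubles but whose elution position is unchanged.
unchanged = profile_shift([5, 20, 5, 0, 0, 0], [10, 40, 10, 0, 0, 0])
print(shifted, unchanged)
```

Normalizing first separates a change in elution position from a change in overall abundance; the two can then be reported side by side.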
10.
G3 (Bethesda) ; 14(3), 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38135291

ABSTRACT

Studying the genetic and molecular characteristics of brewing yeast strains is crucial for understanding their domestication history and adaptations accumulated over time in fermentation environments, and for guiding optimizations to the brewing process itself. Saccharomyces cerevisiae (brewing yeast) is among the most profiled organisms on the planet, yet the temporal molecular changes that underlie industrial fermentation and beer brewing remain understudied. Here, we characterized the genomic makeup of a Saccharomyces cerevisiae ale yeast widely used in the production of Hefeweizen beers, and applied shotgun mass spectrometry to systematically measure the proteomic changes throughout 2 fermentation cycles that were separated by 14 rounds of serial repitching. The resulting brewing yeast proteomics resource includes 64,740 protein abundance measurements. We found that this strain possesses typical genetic characteristics of Saccharomyces cerevisiae ale strains and displayed progressive shifts in molecular processes during fermentation based on protein abundance changes. We observed protein abundance differences between early fermentation batches and those separated by 14 rounds of serial repitching. The observed abundance differences occurred mainly in proteins involved in the metabolism of ergosterol and isobutyraldehyde. Our systematic profiling serves as a starting point for deeper characterization of how the yeast proteome changes during commercial fermentations and additionally serves as a resource to guide fermentation protocols, strain handling, and engineering practices in commercial brewing and fermentation environments. Finally, we created a web interface (https://brewing-yeast-proteomics.ccbb.utexas.edu/) to serve as a valuable resource for yeast geneticists, brewers, and biochemists to provide insights into the global trends underlying commercial beer production.


Subject(s)
Proteomics; Saccharomyces cerevisiae; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Fermentation; Fungal Proteins/genetics; Fungal Proteins/metabolism; Beer/analysis
11.
Commun Biol ; 6(1): 1250, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-38082099

ABSTRACT

The ongoing evolution of SARS-CoV-2 into more easily transmissible and infectious variants has provided unprecedented insight into mutations enabling immune escape. Understanding how these mutations affect the dynamics of antibody-antigen interactions is crucial to the development of broadly protective antibodies and vaccines. Here we report the characterization of a potent neutralizing antibody (N3-1) identified from a COVID-19 patient during the first disease wave. Cryogenic electron microscopy revealed a quaternary binding mode that enables direct interactions with all three receptor-binding domains of the spike protein trimer, resulting in extraordinary avidity and potent neutralization of all major variants of concern until the emergence of Omicron. Structure-based rational design of N3-1 mutants improved binding to all Omicron variants but only partially restored neutralization of the conformationally distinct Omicron BA.1. This study provides new insights into immune evasion through changes in spike protein dynamics and highlights considerations for future conformationally biased multivalent vaccine designs.


Subject(s)
COVID-19; SARS-CoV-2; Humans; SARS-CoV-2/genetics; Spike Glycoprotein, Coronavirus/genetics; Antibodies, Neutralizing
12.
bioRxiv ; 2023 Sep 30.
Article in English | MEDLINE | ID: mdl-37790497

ABSTRACT

Studying the genetic and molecular characteristics of brewing yeast strains is crucial for understanding their domestication history and adaptations accumulated over time in fermentation environments, and for guiding optimizations to the brewing process itself. Saccharomyces cerevisiae (brewing yeast) is among the most profiled organisms on the planet, yet the temporal molecular changes that underlie industrial fermentation and beer brewing remain understudied. Here, we characterized the genomic makeup of a Saccharomyces cerevisiae ale yeast widely used in the production of Hefeweizen beers, and applied shotgun mass spectrometry to systematically measure the proteomic changes throughout two fermentation cycles that were separated by 14 rounds of serial repitching. The resulting brewing yeast proteomics resource includes 64,740 protein abundance measurements. We found that this strain possesses typical genetic characteristics of Saccharomyces cerevisiae ale strains and displayed progressive shifts in molecular processes during fermentation based on protein abundance changes. We observed protein abundance differences between early fermentation batches and those separated by 14 rounds of serial repitching. The observed abundance differences occurred mainly in proteins involved in the metabolism of ergosterol and isobutyraldehyde. Our systematic profiling serves as a starting point for deeper characterization of how the yeast proteome changes during commercial fermentations and additionally serves as a resource to guide fermentation protocols, strain handling, and engineering practices in commercial brewing and fermentation environments. Finally, we created a web interface (https://brewing-yeast-proteomics.ccbb.utexas.edu/) to serve as a valuable resource for yeast geneticists, brewers, and biochemists to provide insights into the global trends underlying commercial beer production.

13.
bioRxiv ; 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37781579

ABSTRACT

Motile cilia are ancient, evolutionarily conserved organelles whose dysfunction underlies motile ciliopathies, a broad class of human diseases. Motile cilia contain myriad different proteins that assemble into an array of distinct machines, so understanding the interactions and functional hierarchies among them presents an important challenge. Here, we defined the protein interactome of motile axonemes using cross-linking mass spectrometry (XL/MS) in Tetrahymena thermophila. From over 19,000 XLs, we identified 4,757 unique amino acid interactions among 1,143 distinct proteins, providing both macromolecular and atomic-scale insights into diverse ciliary machines, including the Intraflagellar Transport system, axonemal dynein arms, radial spokes, the 96 nm ruler, and microtubule inner proteins, among others. Guided by this dataset, we used vertebrate multiciliated cells to reveal novel functional interactions among several poorly defined human ciliopathy proteins. The dataset therefore provides a powerful resource for studying the basic biology of an ancient organelle and the molecular etiology of human genetic disease.

14.
Front Plant Sci ; 14: 1252564, 2023.
Article in English | MEDLINE | ID: mdl-37780492

ABSTRACT

Hybrid vigor or heterosis has been widely applied in agriculture and extensively studied using genetic and gene expression approaches. However, the biochemical mechanism underlying heterosis remains elusive. One theory suggests that a decrease in protein aggregation may occur in hybrids due to the presence of protein variants between parental alleles, but it has not been experimentally tested. Here, we report comparative analysis of soluble and insoluble proteomes in Arabidopsis intraspecific and interspecific hybrids or allotetraploids formed between A. thaliana and A. arenosa. Both allotetraploids and intraspecific hybrids displayed nonadditive expression (unequal to the sum of the two parents) of the proteins, most of which were involved in biotic and abiotic stress responses. In the allotetraploids, homoeolog-expression bias was not observed among all proteins examined but accounted for 17-20% of the nonadditively expressed proteins, consistent with the transcriptome results. Among expression-biased homoeologs, there were more A. thaliana-biased than A. arenosa-biased homoeologs. Analysis of the insoluble and soluble proteomes revealed more soluble proteins in the hybrids than in their parents but not in the allotetraploids. Most proteins in ribosomal biosynthesis and in the thylakoid lumen, membrane, and stroma were in the soluble fractions, indicating a role of protein stability in photosynthetic activities for promoting growth. Thus, nonadditive expression of stress-responsive proteins and increased solubility of photosynthetic proteins may contribute to heterosis in Arabidopsis hybrids and allotetraploids and possibly hybrid crops.

15.
iScience ; 26(9): 107581, 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37664589

ABSTRACT

During eukaryotic transcription, RNA polymerase II undergoes dynamic post-translational modifications on the C-terminal domain (CTD) of the largest subunit, generating an information-rich PTM landscape that transcriptional regulators bind. The phosphorylation of Ser5 and Ser2 of the CTD heptad occurs spatiotemporally with the transcriptional stages, recruiting different transcriptional regulators to Pol II. To delineate the protein interactomes at different transcriptional stages, we reconstructed phosphorylation patterns of the CTD at Ser5 and Ser2 in vitro. Our results showed that distinct protein interactomes are recruited to RNA polymerase II at different stages of transcription by the phosphorylation of Ser2 and Ser5 of the CTD heptads. In particular, we characterized calcium homeostasis endoplasmic reticulum protein (CHERP) as a regulator bound by the phospho-Ser2 heptad. Pol II association with CHERP recruits an accessory splicing complex whose loss results in broad changes in alternative splicing events. Our results shed light on the PTM-coded recruitment process that coordinates transcription.

16.
Nat Commun ; 14(1): 5741, 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37714832

ABSTRACT

Cilia are hairlike protrusions that project from the surface of eukaryotic cells and play key roles in cell signaling and motility. Ciliary motility is regulated by the conserved nexin-dynein regulatory complex (N-DRC), which links adjacent doublet microtubules and regulates and coordinates the activity of outer doublet complexes. Despite its critical role in cilia motility, the assembly and molecular basis of the regulatory mechanism are poorly understood. Here, using cryo-electron microscopy in conjunction with biochemical cross-linking and integrative modeling, we localize 12 DRC subunits in the N-DRC structure of Tetrahymena thermophila. We also find that the CCDC96/113 complex is in close contact with the DRC9/10 in the linker region. In addition, we reveal that the N-DRC is associated with a network of coiled-coil proteins that most likely mediates N-DRC regulatory activity.


Subject(s)
Dyneins; Microtubule-Associated Proteins; Cryoelectron Microscopy; Cytoskeleton; Axoneme; Amyloidogenic Proteins
17.
bioRxiv ; 2023 Sep 16.
Article in English | MEDLINE | ID: mdl-37745461

ABSTRACT

The need to accurately survey proteins and their modifications with ever higher sensitivities, particularly in clinical settings with limited samples, is spurring development of new single molecule proteomics technologies. Fluorosequencing is one such highly parallelized single molecule peptide sequencing platform, based on determining the sequence positions of select amino acid types within peptides to enable their identification and quantification from a reference database. Here, we describe substantial improvements to fluorosequencing, including identifying fluorophores compatible with the sequencing chemistry, mitigating dye-dye interactions through the use of extended polyproline linkers, and developing an end-to-end workflow for sample preparation and sequencing. We demonstrate the method by fluorosequencing peptides in mixtures and identifying a target neoantigen from a database of decoy MHC peptides, highlighting the potential of the technology for high sensitivity clinical applications.

18.
BMC Bioinformatics ; 24(1): 306, 2023 Aug 02.
Article in English | MEDLINE | ID: mdl-37532987

ABSTRACT

BACKGROUND: Proteins often assemble into higher-order complexes to perform their biological functions. Such protein-protein interactions (PPI) are often experimentally measured for pairs of proteins and summarized in a weighted PPI network, to which community detection algorithms can be applied to define the various higher-order protein complexes. Current methods include unsupervised and supervised approaches, often assuming that protein complexes manifest only as dense subgraphs. Supervised approaches focus only on learning which subgraphs correspond to complexes, not on how to find them in a network, a step currently solved using heuristics. However, learning to walk trajectories on a network to identify protein complexes leads naturally to a reinforcement learning (RL) approach, a strategy not extensively explored for community detection. Here, we develop and evaluate a reinforcement learning pipeline for community detection on weighted protein-protein interaction networks to detect new protein complexes. The algorithm is trained to calculate the value of different subgraphs encountered while walking on the network to reconstruct known complexes. A distributed prediction algorithm then scales the RL pipeline to search for novel protein complexes on large PPI networks. RESULTS: The reinforcement learning pipeline is applied to a human PPI network consisting of 8k proteins and 60k PPI, which results in 1,157 protein complexes. The method demonstrated competitive accuracy with improved speed compared to previous algorithms. We highlight protein complexes such as C4orf19, C18orf21, and KIAA1522 which are currently minimally characterized. Additionally, the results suggest that TMC04 is a putative additional subunit of the KICSTOR complex and confirm the involvement of C15orf41 in a higher-order complex with HIRA, CDAN1, ASF1A, and by 3D structural modeling.
CONCLUSIONS: Reinforcement learning offers several distinct advantages for community detection, including scalability and knowledge of the walk trajectories defining those communities. Applied to currently available human protein interaction networks, this method had comparable accuracy with other algorithms and notable savings in computational time, and in turn, led to clear predictions of protein function and interactions for several uncharacterized human proteins.


Subject(s)
Algorithms; Protein Interaction Maps; Humans; Transcription Factors; Protein Interaction Mapping/methods; Computational Biology/methods; Glycoproteins; Nuclear Proteins; Cell Cycle Proteins; Molecular Chaperones
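
Illustrative note on entry 18: the abstract describes walking on a weighted PPI network and using the value of each candidate subgraph to decide which protein to add next. The sketch below shows only that walk, under strong assumptions: the value function is a hand-written stand-in (sum over node pairs of edge weight minus a fixed `penalty`) rather than one learned by reinforcement learning, and the function names, the `penalty` parameter, and the toy network are all hypothetical.

```python
def subgraph_value(nodes, weights, penalty=0.5):
    """Stand-in for a learned value function: every pair of nodes contributes
    (edge weight - penalty), so weakly connected additions lower the value."""
    ordered = sorted(nodes)
    total = 0.0
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            total += weights.get(frozenset((u, v)), 0.0) - penalty
    return total


def grow_complex(seed, weights, penalty=0.5):
    """Greedy walk from a seed protein: at each step add the neighbor that
    gives the highest-valued subgraph; stop when no addition improves it."""
    neighbors = {}
    for edge in weights:
        u, v = tuple(edge)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    current = {seed}
    trajectory = [seed]  # order in which proteins were visited
    value = subgraph_value(current, weights, penalty)
    while True:
        frontier = set().union(*(neighbors.get(n, set()) for n in current)) - current
        best_node, best_value = None, value
        for candidate in sorted(frontier):
            v = subgraph_value(current | {candidate}, weights, penalty)
            if v > best_value:
                best_node, best_value = candidate, v
        if best_node is None:
            return current, trajectory, value
        current.add(best_node)
        trajectory.append(best_node)
        value = best_value


# Hypothetical weighted PPI network: A, B, C form a tight triangle,
# D is only weakly attached to C, and D-E is a separate strong pair.
weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.85,
    frozenset(("C", "D")): 0.2,
    frozenset(("D", "E")): 0.9,
}
members, trajectory, value = grow_complex("A", weights)
print(sorted(members), trajectory)
```

Returning the trajectory alongside the final member set mirrors the point made in the conclusions that the walk itself is informative; in a trained pipeline the value function would be fitted so that walks reconstruct known complexes.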
19.
bioRxiv ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37502879

ABSTRACT

The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. A Hidden Markov Model (HMM)-based approach extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique to estimate transition probabilities for an HMM using expectation maximization, but modified here to estimate a small number of parameter values directly rather than estimating every transition probability independently, which should help prevent overfitting. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second independent implementation using a hybrid of DIRECT and Powell's method to reduce the root mean squared error (RMSE) between simulations and the real dataset was also developed. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell's method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.

20.
bioRxiv ; 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-37398254

ABSTRACT

Cilia are hairlike protrusions that project from the surface of eukaryotic cells and play key roles in cell signaling and motility. Ciliary motility is regulated by the conserved nexin-dynein regulatory complex (N-DRC), which links adjacent doublet microtubules and regulates and coordinates the activity of outer doublet complexes. Despite its critical role in cilia motility, the assembly and molecular basis of the regulatory mechanism are poorly understood. Here, utilizing cryo-electron microscopy in conjunction with biochemical cross-linking and integrative modeling, we localized 12 DRC subunits in the N-DRC structure of Tetrahymena thermophila. We also found that the CCDC96/113 complex is in close contact with the N-DRC. In addition, we revealed that the N-DRC is associated with a network of coiled-coil proteins that most likely mediates N-DRC regulatory activity.
