Results 1 - 20 of 10,534
1.
Cell ; 186(10): 2256-2272.e23, 2023 05 11.
Article in English | MEDLINE | ID: mdl-37119812

ABSTRACT

Applications of prime editing are often limited due to insufficient efficiencies, and it can require substantial time and resources to determine the most efficient pegRNAs and prime editors (PEs) to generate a desired edit under various experimental conditions. Here, we evaluated prime editing efficiencies for a total of 338,996 pairs of pegRNAs including 3,979 epegRNAs and target sequences in an error-free manner. These datasets enabled a systematic determination of factors affecting prime editing efficiencies. Then, we developed computational models, named DeepPrime and DeepPrime-FT, that can predict prime editing efficiencies for eight prime editing systems in seven cell types for all possible types of editing of up to 3 base pairs. We also extensively profiled the prime editing efficiencies at mismatched targets and developed a computational model predicting editing efficiencies at such targets. These computational models, together with our improved knowledge about prime editing efficiency determinants, will greatly facilitate prime editing applications.


Subject(s)
Computer Simulation , Gene Editing , RNA, Guide, CRISPR-Cas Systems , CRISPR-Cas Systems , Gene Editing/methods , Knowledge , RNA, Guide, CRISPR-Cas Systems/chemistry , Organ Specificity , Datasets as Topic
2.
Cell ; 186(2): 363-381.e19, 2023 01 19.
Article in English | MEDLINE | ID: mdl-36669472

ABSTRACT

Advanced solid cancers are complex assemblies of tumor, immune, and stromal cells characterized by high intratumoral variation. We use highly multiplexed tissue imaging, 3D reconstruction, spatial statistics, and machine learning to identify cell types and states underlying morphological features of known diagnostic and prognostic significance in colorectal cancer. Quantitation of these features in high-plex marker space reveals recurrent transitions from one tumor morphology to the next, some of which are coincident with long-range gradients in the expression of oncogenes and epigenetic regulators. At the tumor invasive margin, where tumor, normal, and immune cells compete, T cell suppression involves multiple cell types and 3D imaging shows that seemingly localized 2D features such as tertiary lymphoid structures are commonly interconnected and have graded molecular properties. Thus, while cancer genetics emphasizes the importance of discrete changes in tumor state, whole-specimen imaging reveals large-scale morphological and molecular gradients analogous to those in developing tissues.


Subject(s)
Adenocarcinoma , Colorectal Neoplasms , Humans , Adenocarcinoma/pathology , Colorectal Neoplasms/genetics , Colorectal Neoplasms/immunology , Colorectal Neoplasms/pathology , Image Processing, Computer-Assisted , Oncogenes , Tumor Microenvironment
3.
J Cell Sci ; 137(20)2024 Oct 15.
Article in English | MEDLINE | ID: mdl-38738286

ABSTRACT

Plant protoplasts provide starting material for inducing pluripotent cell masses that are competent for tissue regeneration in vitro, analogous to animal induced pluripotent stem cells (iPSCs). Dedifferentiation is associated with large-scale chromatin reorganisation and massive transcriptome reprogramming, characterised by stochastic gene expression. How this cellular variability reflects on chromatin organisation in individual cells and what factors influence chromatin transitions during culturing are largely unknown. Here, we used high-throughput imaging and a custom supervised image analysis protocol extracting over 100 chromatin features of cultured protoplasts. The analysis revealed rapid, multiscale dynamics of chromatin patterns with a trajectory that strongly depended on nutrient availability. Decreased abundance in H1 (linker histones) is a hallmark of chromatin transitions. We measured a high heterogeneity of chromatin patterns indicating intrinsic entropy as a hallmark of the initial cultures. We further measured an entropy decline over time, and an antagonistic influence by external and intrinsic factors, such as phytohormones and epigenetic modifiers, respectively. Collectively, our study benchmarks an approach to understand the variability and evolution of chromatin patterns underlying plant cell reprogramming in vitro.


Subject(s)
Chromatin , Entropy , Induced Pluripotent Stem Cells , Chromatin/metabolism , Chromatin/genetics , Induced Pluripotent Stem Cells/metabolism , Induced Pluripotent Stem Cells/cytology , Protoplasts/metabolism , Cellular Reprogramming/genetics , Histones/metabolism , Histones/genetics , Plant Cells/metabolism , Epigenesis, Genetic
4.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935070

ABSTRACT

Inferring gene regulatory networks (GRNs) is one of the important challenges in systems biology, and many outstanding computational methods have been proposed; however, some challenges remain, especially in real datasets. In this study, we propose a Directed Graph Convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure data of GRNs, a directed graph convolutional neural network is constructed which retains the structural information of the directed graph while also making full use of neighbor node features. The local augmentation strategy is adopted in the graph neural network to solve the problem of poor prediction accuracy caused by a large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of gene sequences. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights to guide the subsequent training process of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than existing models. Furthermore, the case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.


Subject(s)
Computational Biology , Gene Regulatory Networks , Neural Networks, Computer , Humans , Computational Biology/methods , Algorithms , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/pathology , Escherichia coli/genetics
5.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38711371

ABSTRACT

T-cell receptor (TCR) recognition of antigens is fundamental to the adaptive immune response. With the expansion of experimental techniques, a substantial database of matched TCR-antigen pairs has emerged, presenting opportunities for computational prediction models. However, accurately forecasting the binding affinities of unseen antigen-TCR pairs remains a major challenge. Here, we present convolutional-self-attention TCR (CATCR), a novel framework tailored to enhance the prediction of epitope and TCR interactions. Our approach utilizes convolutional neural networks to extract peptide features from residue contact matrices, as generated by OpenFold, and a transformer to encode segment-based coded sequences. We introduce CATCR-D, a discriminator that can assess binding by analyzing the structural and sequence features of epitopes and CDR3-β regions. Additionally, the framework comprises CATCR-G, a generative module designed for CDR3-β sequences, which applies the pretrained encoder to deduce epitope characteristics and a transformer decoder for predicting matching CDR3-β sequences. CATCR-D achieved an AUROC of 0.89 on previously unseen epitope-TCR pairs and outperformed four benchmark models by a margin of 17.4%. CATCR-G has demonstrated high precision, recall and F1 scores, surpassing 95% in bidirectional encoder representations from transformers score assessments. Our results indicate that CATCR is an effective tool for predicting unseen epitope-TCR interactions. Incorporating structural insights enhances our understanding of the general rules governing TCR-epitope recognition significantly. The ability to predict TCRs for novel epitopes using structural and sequence information is promising, and broadening the repository of experimental TCR-epitope data could further improve the precision of epitope-TCR binding predictions.


Subject(s)
Receptors, Antigen, T-Cell , Receptors, Antigen, T-Cell/chemistry , Receptors, Antigen, T-Cell/immunology , Receptors, Antigen, T-Cell/metabolism , Receptors, Antigen, T-Cell/genetics , Humans , Epitopes/chemistry , Epitopes/immunology , Computational Biology/methods , Neural Networks, Computer , Epitopes, T-Lymphocyte/immunology , Epitopes, T-Lymphocyte/chemistry , Antigens/chemistry , Antigens/immunology , Amino Acid Sequence
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39038937

ABSTRACT

Peptide drugs are becoming star drug agents with high efficiency and selectivity which open up new therapeutic avenues for various diseases. However, the sensitivity to hydrolase and the relatively short half-life have severely hindered their development. In this study, a new generation artificial intelligence-based system for accurate prediction of peptide half-life was proposed, which realized the half-life prediction of both natural and modified peptides and successfully bridged the evaluation possibility between two important species (human, mouse) and two organs (blood, intestine). To achieve this, enzymatic cleavage descriptors were integrated with traditional peptide descriptors to construct a better representation. Then, robust models with accurate performance were established by comparing traditional machine learning and transfer learning, systematically. Results indicated that enzymatic cleavage features could certainly enhance model performance. The deep learning model integrating transfer learning significantly improved predictive accuracy, achieving remarkable R² values: 0.84 for natural peptides and 0.90 for modified peptides in human blood, 0.984 for natural peptides and 0.93 for modified peptides in mouse blood, and 0.94 for modified peptides in mouse intestine on the test set, respectively. These models not only successfully composed the above-mentioned system but also improved by approximately 15% in terms of correlation compared to related works. This study is expected to provide powerful solutions for peptide half-life evaluation and boost peptide drug development.


Subject(s)
Peptides , Animals , Half-Life , Humans , Mice , Peptides/metabolism , Peptides/chemistry , Deep Learning , Machine Learning
7.
EMBO Rep ; 25(5): 2306-2322, 2024 May.
Article in English | MEDLINE | ID: mdl-38528170

ABSTRACT

Plants rely on Nucleotide-binding, Leucine-rich repeat Receptors (NLRs) for pathogen recognition. Highly variable NLRs (hvNLRs) show remarkable intraspecies diversity, while their low-variability paralogs (non-hvNLRs) are conserved between ecotypes. At a population level, hvNLRs provide new pathogen-recognition specificities, but the association between allelic diversity and genomic and epigenomic features has not been established. Our investigation of NLRs in Arabidopsis Col-0 has revealed that hvNLRs show higher expression, less gene body cytosine methylation, and closer proximity to transposable elements than non-hvNLRs. hvNLRs show elevated synonymous and nonsynonymous nucleotide diversity and are in chromatin states associated with an increased probability of mutation. Diversifying selection maintains variability at a subset of codons of hvNLRs, while purifying selection maintains conservation at non-hvNLRs. How these features are established and maintained, and whether they contribute to the observed diversity of hvNLRs is key to understanding the evolution of plant innate immune receptors.


Subject(s)
Alleles , Arabidopsis Proteins , Arabidopsis , Genetic Variation , NLR Proteins , Arabidopsis/genetics , NLR Proteins/genetics , NLR Proteins/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Genome, Plant , Gene Expression Regulation, Plant , DNA Methylation/genetics , Genomics/methods , Evolution, Molecular
8.
J Neurosci ; 44(4)2024 01 24.
Article in English | MEDLINE | ID: mdl-38267235

ABSTRACT

Low-level features are typically continuous (e.g., the gamut between two colors), but semantic information is often categorical (there is no corresponding gradient between dog and turtle) and hierarchical (animals live in land, water, or air). To determine the impact of these differences on cognitive representations, we characterized the geometry of perceptual spaces of five domains: a domain dominated by semantic information (animal names presented as words), a domain dominated by low-level features (colored textures), and three intermediate domains (animal images, lightly texturized animal images that were easy to recognize, and heavily texturized animal images that were difficult to recognize). Each domain had 37 stimuli derived from the same animal names. From 13 participants (9F), we gathered similarity judgments in each domain via an efficient psychophysical ranking paradigm. We then built geometric models of each domain for each participant, in which distances between stimuli accounted for participants' similarity judgments and intrinsic uncertainty. Remarkably, the five domains had similar global properties: each required 5-7 dimensions, and a modest amount of spherical curvature provided the best fit. However, the arrangement of the stimuli within these embeddings depended on the level of semantic information: dendrograms derived from semantic domains (word, image, and lightly texturized images) were more "tree-like" than those from feature-dominated domains (heavily texturized images and textures). Thus, the perceptual spaces of domains along this feature-dominated to semantic-dominated gradient shift to a tree-like organization when semantic information dominates, while retaining a similar global geometry.


Subject(s)
Judgment , Turtles , Humans , Animals , Dogs , Semantics , Uncertainty , Water
9.
J Neurosci ; 44(10)2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38267259

ABSTRACT

Sound texture perception takes advantage of a hierarchy of time-averaged statistical features of acoustic stimuli, but much remains unclear about how these statistical features are processed along the auditory pathway. Here, we compared the neural representation of sound textures in the inferior colliculus (IC) and auditory cortex (AC) of anesthetized female rats. We recorded responses to texture morph stimuli that gradually add statistical features of increasingly higher complexity. For each texture, several different exemplars were synthesized using different random seeds. An analysis of transient and ongoing multiunit responses showed that the IC units were sensitive to every type of statistical feature, albeit to a varying extent. In contrast, only a small proportion of AC units were overtly sensitive to any statistical features. Differences in texture types explained more of the variance of IC neural responses than did differences in exemplars, indicating a degree of "texture type tuning" in the IC, but the same was, perhaps surprisingly, not the case for AC responses. We also evaluated the accuracy of texture type classification from single-trial population activity and found that IC responses became more informative as more summary statistics were included in the texture morphs, while for AC population responses, classification performance remained consistently very low. These results argue against the idea that AC neurons encode sound type via an overt sensitivity in neural firing rate to fine-grain spectral and temporal statistical features.


Subject(s)
Auditory Cortex , Inferior Colliculi , Female , Rats , Animals , Auditory Pathways/physiology , Inferior Colliculi/physiology , Mesencephalon/physiology , Sound , Auditory Cortex/physiology , Acoustic Stimulation/methods , Auditory Perception/physiology
10.
Genet Epidemiol ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38686586

ABSTRACT

Numerous studies over the past generation have identified germline variants that increase specific cancer risks. Simultaneously, a revolution in sequencing technology has permitted high-throughput annotations of somatic genomes characterizing individual tumors. However, examining the relationship between germline variants and somatic alteration patterns is hugely challenged by the large numbers of variants in a typical tumor, the rarity of most individual variants, and the heterogeneity of tumor somatic fingerprints. In this article, we propose statistical methodology that frames the investigation of germline-somatic relationships in an interpretable manner. The method uses meta-features embodying biological contexts of individual somatic alterations to implicitly group rare mutations. Our team has used this technique previously through a multilevel regression model to diagnose with high accuracy tumor site of origin. Herein, we further leverage topic models from computational linguistics to achieve interpretable lower-dimensional embeddings of the meta-features. We demonstrate how the method can identify distinctive somatic profiles linked to specific germline variants or environmental risk factors. We illustrate the method using The Cancer Genome Atlas whole-exome sequencing data to characterize somatic tumor fingerprints in breast cancer patients with germline BRCA1/2 mutations and in head and neck cancer patients exposed to human papillomavirus.

11.
Mol Biol Evol ; 41(8)2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39101471

ABSTRACT

Several mammalian genes have originated from the domestication of retrotransposons, selfish mobile elements related to retroviruses. Some of the proteins encoded by these genes have maintained virus-like features, including self-processing, capsid structure formation, and the generation of different isoforms through -1 programmed ribosomal frameshifting. Using quantitative approaches in molecular evolution and biophysical analyses, we studied 28 retrotransposon-derived genes, with a focus on the evolution of virus-like features. By analyzing the rate of synonymous substitutions, we show that the -1 programmed ribosomal frameshifting mechanism in three of these genes (PEG10, PNMA3, and PNMA5) is conserved across mammals and originates alternative proteins. These genes were targets of positive selection in primates, and one of the positively selected sites affects a B-cell epitope on the spike domain of the PNMA5 capsid, a finding reminiscent of observations in infectious viruses. More generally, we found that retrotransposon-derived proteins vary in their intrinsically disordered region content and this is directly associated with their evolutionary rates. Most positively selected sites in these proteins are located in intrinsically disordered regions and some of them impact protein posttranslational modifications, such as autocleavage and phosphorylation. Detailed analyses of the biophysical properties of intrinsically disordered regions showed that positive selection preferentially targeted regions with lower conformational entropy. Furthermore, positive selection introduces variation in binary sequence patterns across orthologues, as well as in chain compaction. Our results shed light on the evolutionary trajectories of a unique class of mammalian genes and suggest a novel approach to study how intrinsically disordered region biophysical characteristics are affected by evolution.


Subject(s)
Evolution, Molecular , Retroelements , Animals , Selection, Genetic , Mammals/genetics , Mammals/virology , Intrinsically Disordered Proteins/genetics , Frameshifting, Ribosomal , Humans
12.
Front Neuroendocrinol ; 72: 101115, 2024 01.
Article in English | MEDLINE | ID: mdl-37993020

ABSTRACT

Bipolar disorder (BD) is a prevalent mental illness worldwide and a leading risk factor for suicide. Over the past three decades, it has been discovered that sex differences exist throughout the entire panorama of BD, but the etiologic regions and mechanisms that generate such differences remain poorly characterized. Available evidence indicates that the dorsolateral prefrontal cortex (DLPFC), a critical region that controls higher-order cognitive processing and mood, exhibits biological disparities between male and female patients with psychiatric disorders, which are highly correlated with the co-occurrence of psychotic symptoms. This review addresses the sex differences in BD concerning epidemiology, cognitive impairments, clinical manifestations, neuroimaging, and laboratory abnormalities. It also provides strong evidence linking DLPFC to the etiopathogenesis of these sex differences. We emphasize the importance of identifying gene signatures using human brain transcriptomics, which can depict sexually different variations, explain sex-biased symptomatic features, and provide novel targets for sex-specific therapeutics.


Subject(s)
Bipolar Disorder , Humans , Male , Female , Bipolar Disorder/etiology , Dorsolateral Prefrontal Cortex , Prefrontal Cortex , Sex Characteristics , Brain/pathology
13.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36440949

ABSTRACT

Protein-protein interactions play an important role in many biological processes. However, although structure prediction for monomer proteins has achieved great progress with the advent of advanced deep learning algorithms like AlphaFold, the structure prediction for protein-protein complexes remains an open question. Taking advantage of the Transformer model of ESM-MSA, we have developed a deep learning-based model, named DeepHomo2.0, to predict protein-protein interactions of homodimeric complexes by leveraging the direct-coupling analysis (DCA) and Transformer features of sequences and the structure features of monomers. DeepHomo2.0 was extensively evaluated on diverse test sets and compared with eight state-of-the-art methods including protein language model-based, DCA-based and machine learning-based methods. It was shown that DeepHomo2.0 achieved a high precision of >70% with experimental monomer structures and >60% with predicted monomer structures for the top 10 predicted contacts on the test sets and outperformed the other eight methods. Moreover, even the version without using structure information, named DeepHomoSeq, still achieved a good precision of >55% for the top 10 predicted contacts. Integrating the predicted contacts into protein docking significantly improved the structure prediction of realistic Critical Assessment of Protein Structure Prediction homodimeric complexes. DeepHomo2.0 and DeepHomoSeq are available at http://huanglab.phys.hust.edu.cn/DeepHomo2/.


Subject(s)
Deep Learning , Computational Biology/methods , Proteins/chemistry , Algorithms , Machine Learning
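
The "top 10 predicted contacts" precision quoted in the DeepHomo2.0 abstract above is a standard ranking metric, and a minimal sketch may help readers unfamiliar with it. The function name, the data layout (a dictionary from residue pairs to scores) and the toy values are assumptions made for illustration; nothing here is taken from the DeepHomo2.0 code.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (residue index in chain A, residue index in chain B)


def top_k_precision(scores: Dict[Pair, float],
                    true_contacts: Set[Pair],
                    k: int = 10) -> float:
    """Fraction of the k highest-scoring residue pairs that are true contacts."""
    # Sort candidate pairs from highest to lowest predicted score, keep the top k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Hypothetical toy data: four scored pairs, two of which are real contacts.
toy_scores = {(1, 5): 0.9, (2, 7): 0.8, (3, 9): 0.4, (4, 4): 0.2}
toy_true = {(1, 5), (3, 9)}
toy_precision = top_k_precision(toy_scores, toy_true, k=2)
```

With k=2 the two top-ranked pairs are (1, 5) and (2, 7); only the first is a true contact, so the toy precision is one half.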
14.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36403184

ABSTRACT

The prediction of peptide and protein function is important for research and industrial applications, and many machine learning methods have been developed for this purpose. The existing models have encountered many challenges, including the lack of effective and comprehensive features and the limited applicability of each model. Here, we introduce an Integrated Peptide and Protein function prediction Framework based on Fused features and Ensemble models (IPPF-FE), which can accurately capture the relationship between features and labels. The results indicated that IPPF-FE outperformed existing state-of-the-art (SOTA) models on more than 8 different categories of peptide and protein tasks. In addition, t-distributed Stochastic Neighbour Embedding demonstrated the advantages of IPPF-FE. We anticipate that our method will become a versatile tool for peptide and protein prediction tasks and shed light on the future development of related models. The model is open source and available in the GitHub repository https://github.com/Luo-SynBioLab/IPPF-FE.


Subject(s)
International Planned Parenthood Federation , Proteins , Peptides , Machine Learning
15.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37643374

ABSTRACT

Silencers are noncoding DNA sequence fragments located on the genome that suppress gene expression. The variation of silencers in specific cells is closely related to gene expression and cancer development. Computational approaches that exclusively rely on DNA sequence information for silencer identification fail to account for the cell specificity of silencers, resulting in diminished accuracy. Despite the discovery of several transcription factors and epigenetic modifications associated with silencers on the genome, there is still no definitive biological signal or combination thereof to fully characterize silencers, posing challenges in selecting suitable biological signals for their identification. Therefore, we propose a sophisticated deep learning framework called DeepICSH, which is based on multiple biological data sources. Specifically, DeepICSH leverages a deep convolutional neural network to automatically capture biologically relevant signal combinations strongly associated with silencers, originating from a diverse array of biological signals. Furthermore, the utilization of attention mechanisms facilitates the scoring and visualization of these signal combinations, whereas the employment of skip connections facilitates the fusion of multilevel sequence features and signal combinations, thereby empowering the accurate identification of silencers within specific cells. Extensive experiments on HepG2 and K562 cell line data sets demonstrate that DeepICSH outperforms state-of-the-art methods in silencer identification. Notably, we introduce for the first time a deep learning framework based on multi-omics data for classifying strong and weak silencers, achieving favorable performance. In conclusion, DeepICSH shows great promise for advancing the study and analysis of silencers in complex diseases. The source code is available at https://github.com/lyli1013/DeepICSH.


Subject(s)
Deep Learning , Genome, Human , Humans , Cell Line , Epigenesis, Genetic , Multiomics
16.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36528803

ABSTRACT

The advent of single-cell RNA-sequencing (scRNA-seq) provides an unprecedented opportunity to explore gene expression profiles at the single-cell level. However, gene expression values vary over time and under different conditions even within the same cell. There is an urgent need for more stable and reliable feature variables at the single-cell level to depict cell heterogeneity. Thus, we construct a new feature matrix called the delta rank matrix (DRM) from scRNA-seq data by integrating an a priori gene interaction network, which transforms the unreliable gene expression value into a stable gene interaction/edge value on a single-cell basis. This is the first time that a gene-level feature has been transformed into an interaction/edge-level for scRNA-seq data analysis based on relative expression orderings. Experiments on various scRNA-seq datasets have demonstrated that DRM performs better than the original gene expression matrix in cell clustering, cell identification and pseudo-trajectory reconstruction. More importantly, the DRM really achieves the fusion of gene expressions and gene interactions and provides a method of measuring gene interactions at the single-cell level. Thus, the DRM can be used to find changes in gene interactions among different cell types, which may open up a new way to analyze scRNA-seq data from an interaction perspective. In addition, DRM provides a new method to construct a cell-specific network for each single cell instead of a group of cells as in traditional network construction methods. DRM's exceptional performance is due to its extraction of rich gene-association information on biological systems and stable characterization of cells.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Transcriptome , Cluster Analysis
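
The delta rank matrix abstract above describes turning per-gene expression values into per-edge values based on relative expression orderings, but it does not give the formula. The sketch below shows one plausible reading, assumed here purely for illustration: rank the genes within a single cell, then take the rank difference across each edge of a prior interaction network. The function names, gene names and edge list are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def rank_genes(expression: Dict[str, float]) -> Dict[str, int]:
    """Rank genes within one cell; the lowest expression gets rank 1.

    Ties are broken by gene name so the result is deterministic.
    """
    ordered = sorted(expression, key=lambda gene: (expression[gene], gene))
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def delta_rank_features(expression: Dict[str, float],
                        edges: List[Edge]) -> Dict[Edge, int]:
    """Assumed edge feature: rank(gene_a) - rank(gene_b) for each network edge."""
    ranks = rank_genes(expression)
    return {(a, b): ranks[a] - ranks[b]
            for a, b in edges if a in ranks and b in ranks}


# Hypothetical single cell and prior network.
toy_cell = {"GENE_A": 5.0, "GENE_B": 0.0, "GENE_C": 2.5}
toy_edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
toy_features = delta_rank_features(toy_cell, toy_edges)
```

Because only the ordering of genes within a cell enters the feature, rescaling every expression value by the same positive factor leaves the edge values unchanged, which is one way to read the abstract's claim of stability.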
17.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36458437

ABSTRACT

One of the key features of intrinsically disordered regions (IDRs) is facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.


Subject(s)
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Computational Biology/methods , Amino Acid Sequence , Peptides/metabolism , Protein Domains , Databases, Protein , Protein Binding
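
Several abstracts in this listing, including the CLIP entry above, report an area under the receiver operating characteristic curve (AUC). As a rough aid, the sketch below computes AUC from its pairwise definition: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores are assumptions for illustration and are unrelated to any of the cited tools.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical per-residue labels (1 = binding residue) and predictor scores.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.1, 0.4, 0.6]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The double loop is quadratic in the number of examples, which is fine for a demonstration; a rank-based formulation would be preferable for large datasets.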
18.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36754843

ABSTRACT

Scaffold proteins drive liquid-liquid phase separation (LLPS) to form biomolecular condensates and organize various biochemical reactions in cells. Dysregulation of scaffolds can lead to aberrant condensate assembly and various complex diseases. However, bioinformatics predictors dedicated to scaffolds are still lacking and their development suffers from an extreme imbalance between limited experimentally identified scaffolds and unlabeled candidates. Here, using the joint distribution of hybrid multimodal features, we implemented a positive unlabeled (PU) learning-based framework named PULPS that combined ProbTagging and penalty logistic regression (PLR) to profile the propensity of scaffolds. PULPS achieved the best AUC of 0.8353 and showed an area under the lift curve (AUL) of 0.8339 as an estimation of true performance. Upon reviewing recent experimentally verified scaffolds, we performed a partial recovery with a 2.85% increase in AUL from 0.8339 to 0.8577. In comparison, PULPS showed a 45.7% improvement in AUL compared with PLR, and an 8.2% superiority over other existing tools. Our study first proved that PU learning is more suitable for scaffold prediction and demonstrated the widespread existence of phase separation states. This profile also uncovered potential scaffolds that co-drive LLPS in the human proteome and generated candidates for further experiments. PULPS is free for academic research at http://pulps.zbiolab.cn.


Subject(s)
Cell Physiological Phenomena , Proteome , Humans
19.
Plant Physiol ; 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38917225

ABSTRACT

Single-stranded DNA (ssDNA) is essential for various DNA-templated processes in both eukaryotes and prokaryotes. However, comprehensive characterizations of ssDNA still lag in plants compared to non-plant systems. Here, we conducted in situ S1-seq (ISS1-seq), with starting gDNA ranging from 5 µg to 250 ng, followed by comprehensive characterizations of ssDNA in rice (Oryza sativa L.). We found that ssDNA loci were substantially associated with a subset of non-B DNA structures and functional genomic loci. Subtypes of ssDNA loci had distinct epigenetic features. Importantly, ssDNA may act alone or partly coordinate with non-B DNA structures, functional genomic loci, or epigenetic marks to actively or repressively modulate gene transcription, which is genomic-region-dependent and associated with the distinct accumulation of RNA Pol II. Moreover, distinct types of ssDNA had differential impacts on the activities and evolution of TEs (especially common or conserved TEs) in the rice genome. Our study showcases an antibody-independent technique for characterizing non-B DNA structures or functional genomic loci in plants. It lays the groundwork and fills a crucial gap for further exploration of ssDNA, non-B DNA structures, or functional genomic loci, thereby advancing our understanding of their biology in plants.

20.
J Pathol ; 264(1): 55-67, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39022845

ABSTRACT

Esophageal spindle-cell squamous cell carcinoma (ESS) is a rare biphasic neoplasm composed of a carcinomatous component (CaC) and a sarcomatous component (SaC). However, the genomic origin and gene signature of ESS remain unclear. Using whole-exome sequencing of laser-capture microdissection (LCM) tumor samples, we determined that CaC and SaC showed high mutational commonality, with the same top high-frequency mutant genes, mutation signatures, and tumor mutation burden; paired samples shared a median of 25.5% mutation sites. Focal gains were found on chromosomes 3q29, 5p15.33, and 11q13.3. Altered genes were mainly enriched in the RTK-RAS signaling pathway. Phylogenetic trees showed a monoclonal origin of ESS. The most frequently mutated oncogene in the trunk was TP53, followed by NFE2L2, KMT2D, and MUC16. Prognostic associations were found for CDC27, LRP2, APC, and SNAPC4. Our data highlight the monoclonal origin of ESS with TP53 as a potent driver oncogene, suggesting new targeted therapies and immunotherapies as treatment options. © 2024 The Pathological Society of Great Britain and Ireland.


Subject(s)
Esophageal Neoplasms , Esophageal Squamous Cell Carcinoma , Exome Sequencing , Mutation , Humans , Esophageal Squamous Cell Carcinoma/genetics , Esophageal Squamous Cell Carcinoma/pathology , Male , Female , Middle Aged , Esophageal Neoplasms/genetics , Esophageal Neoplasms/pathology , Aged , Biomarkers, Tumor/genetics , Laser Capture Microdissection