Results 1-20 of 433
1.
Nat Immunol ; 25(1): 66-76, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38168955

ABSTRACT

CD4+ T cells are central to various immune responses, but the molecular programs that drive and maintain CD4+ T cell immunity are not entirely clear. Here we identify a stem-like program that governs the CD4+ T cell response in transplantation models. Single-cell-transcriptomic analysis revealed that naive alloantigen-specific CD4+ T cells develop into TCF1hi effector precursor (TEP) cells and TCF1-CXCR6+ effectors in transplant recipients. The TCF1-CXCR6+CD4+ effectors lose proliferation capacity and do not reject allografts upon adoptive transfer into secondary hosts. By contrast, the TCF1hiCD4+ TEP cells have dual features of self-renewal and effector differentiation potential, and allograft rejection depends on continuous replenishment of TCF1-CXCR6+ effectors from TCF1hiCD4+ TEP cells. Mechanistically, TCF1 sustains the CD4+ TEP cell population, whereas the transcription factor IRF4 and the glycolytic enzyme LDHA govern the effector differentiation potential of CD4+ TEP cells. Deletion of IRF4 or LDHA in T cells induces transplant acceptance. These findings unravel a stem-like program that controls the self-renewal capacity and effector differentiation potential of CD4+ TEP cells and have implications for T cell-related immunotherapies.


Subjects
Gene Expression Regulation; T-Lymphocytes, Regulatory; Cell Differentiation
2.
Proc Natl Acad Sci U S A ; 121(28): e2321193121, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38954549

ABSTRACT

Iron antimonide (FeSb2) has been investigated for decades due to its puzzling electronic properties. It undergoes a temperature-controlled transition from an insulator to an ill-defined metal, with a crossover from diamagnetism to paramagnetism. Extensive efforts have been made to uncover the underlying mechanism, but a consensus has yet to be reached. While macroscopic transport and magnetic measurements can be explained by different theoretical proposals, the essential spectroscopic evidence required to distinguish the physical origin is missing. In this paper, using X-ray absorption spectroscopy and atomic multiplet simulations, we have observed mixed spin states of the 3d⁶ configuration in FeSb2. Furthermore, we reveal that the enhancement of the conductivity, whether induced by temperature or doping, is characterized by population of the high-spin state from the low-spin state. Our work constitutes vital spectroscopic evidence that the electrical/magnetic transition in FeSb2 is directly associated with spin-state excitation.

3.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38920341

ABSTRACT

Drug-target interactions (DTIs) are a key part of the drug development process, and their accurate and efficient prediction can significantly boost development efficiency and reduce development time. Recent years have witnessed the rapid advancement of deep learning, resulting in an abundance of deep learning-based models for DTI prediction. However, most of these models use a single representation of drugs and proteins, making it difficult to comprehensively represent their characteristics. Multimodal data fusion can effectively compensate for the limitations of single-modal data. However, existing multimodal models for DTI prediction do not take into account both intra- and inter-modal interactions simultaneously, resulting in limited representation capabilities of fused features and a reduction in DTI prediction accuracy. A hierarchical multimodal self-attention-based graph neural network for DTI prediction, called HMSA-DTI, is proposed to address multimodal feature fusion. HMSA-DTI takes drug SMILES, drug molecular graphs, protein sequences and protein 2-mer sequences as inputs, and uses a hierarchical multimodal self-attention mechanism to achieve deep fusion of multimodal features of drugs and proteins, enabling the capture of intra- and inter-modal interactions between drugs and proteins. HMSA-DTI is shown to have significant advantages over other baseline methods on multiple evaluation metrics across five benchmark datasets.


Subjects
Deep Learning; Neural Networks, Computer; Proteins/chemistry; Proteins/metabolism; Humans; Algorithms; Computational Biology/methods
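HMSA-DTI (entry 3) lists protein 2-mer sequences among its inputs. The sketch below shows one plausible way to turn a protein sequence into 2-mer token ids for an embedding layer. The overlapping stride-1 split, the 400-token vocabulary with id 0 reserved for unknown tokens, and the names `protein_to_kmers` and `encode_2mers` are assumptions for illustration, not the authors' preprocessing.

```python
from itertools import product

# The 20 standard amino acids (assumption: non-standard residues map to "unknown").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: all 400 dipeptides get ids 1..400; id 0 means unknown.
VOCAB = {"".join(p): i + 1 for i, p in enumerate(product(AMINO_ACIDS, repeat=2))}


def protein_to_kmers(seq, k=2):
    """Split a protein sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode_2mers(seq):
    """Map each overlapping 2-mer to an integer id (0 for unknown residues)."""
    return [VOCAB.get(kmer, 0) for kmer in protein_to_kmers(seq, k=2)]
```

For example, `protein_to_kmers("MKVL")` gives `["MK", "KV", "VL"]`; an overlapping split keeps every adjacent residue pair, at the cost of a token sequence almost as long as the protein.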
4.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38343326

ABSTRACT

Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome comprises the genetic material of all microorganisms in an environmental sample. Correctly identifying viruses from these mixed sequences is critical in viral analyses. It is common to identify long viral sequences that have already passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This ignores the relationships between the subsequences, leading to poor performance in identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification of long viral sequences by constructing relationships among the short subsequences derived from them. VirGrapher treats a long sequence as a graph and uses a Graph Convolutional Network (GCN) model, applied after a GCN-based node embedding model, to learn multilayer connections between nodes from sequences. VirGrapher achieves better AUC and accuracy on the validation set than three benchmark methods.


Subjects
Metagenome; Microbiota; Microbiota/genetics; Benchmarking
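VirGrapher (entry 4) treats a long sequence as a graph of short subsequences and applies a GCN. A minimal sketch of that idea follows, under assumptions the abstract does not state: fixed-length chunks as nodes, 2-mer frequencies as node features, edges only between adjacent chunks, and a single normalized propagation step D^-1/2 (A + I) D^-1/2 X with no learned weights or nonlinearity. All function names are hypothetical.

```python
import math
from itertools import product


def split_into_subsequences(seq, size):
    """Cut a long sequence into consecutive chunks of `size` bases (last may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def kmer_features(sub, k=2):
    """Normalized k-mer frequency vector over ACGT (assumed node features)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(sub) - k + 1):
        kmer = sub[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]


def chain_adjacency(n):
    """Assumed graph: each chunk is linked to its neighbours in the original sequence."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = A[i + 1][i] = 1.0
    return A


def gcn_propagate(A, X):
    """One GCN aggregation step: D^-1/2 (A + I) D^-1/2 X (weights omitted)."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    dim = len(X[0])
    return [[sum(norm[i][m] * X[m][c] for m in range(n)) for c in range(dim)] for i in range(n)]


def sequence_to_node_embeddings(seq, size=4, k=2):
    """Chunk a sequence, featurize the chunks and mix features along the chain graph."""
    subs = split_into_subsequences(seq.upper(), size)
    X = [kmer_features(s, k) for s in subs]
    return gcn_propagate(chain_adjacency(len(subs)), X)
```

In a real GCN layer each step would also multiply by a trainable weight matrix and apply an activation; stacking several such layers is what lets information flow between distant subsequences.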
5.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678387

ABSTRACT

Cell communication plays a crucial role in the growth and development of multicellular organisms, in the immune processes of the immune system and in the maintenance of the organism's internal environment. It exerts a significant influence on regulating internal cellular states such as gene expression and cell functionality. Currently, the mainstream methods for studying intercellular communication focus on the ligand-receptor-transcription factor and ligand-receptor-subunit scales. However, there is relatively limited research on the association between intercellular communication and highly variable genes (HVGs). As some HVGs are closely related to cell communication, accurately identifying these HVGs can enhance the accuracy of constructing cell communication networks. The rapid development of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics technologies provides a data foundation for exploring the relationship between intercellular communication and HVGs. Therefore, we propose CPPLS-MLP, which can identify HVGs closely related to intercellular communication and further analyze the impact of Multiple Input Multiple Output cellular communication on the differential expression of these HVGs. By comparison with CCPLS, a commonly used method for constructing intercellular communication networks, we validated the superior performance of our method in identifying cell-type-specific HVGs and in analyzing the influence of neighboring cell types on HVG expression regulation. Source code for the CPPLS_MLP R and Python packages and the related scripts is available at 'CPPLS_MLP Github [https://github.com/wuzhenao/CPPLS-MLP]'.


Subjects
Cell Communication; Single-Cell Analysis; Single-Cell Analysis/methods; Transcriptome; Gene Expression Profiling/methods; Humans; Computational Biology/methods; Gene Regulatory Networks; Animals; Software; Algorithms
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38920343

ABSTRACT

While significant strides have been made in predicting neoepitopes that trigger autologous CD4+ T cell responses, accurately identifying antigen presentation by human leukocyte antigen (HLA) class II molecules remains a challenge. This identification is critical for developing vaccines and cancer immunotherapies. Current prediction methods are limited, primarily due to a lack of high-quality training epitope datasets and algorithmic constraints. To predict exogenous HLA class II-restricted peptides across most of the human population, we used mass spectrometry data to profile >223 000 eluted ligands over HLA-DR, -DQ, and -DP alleles. Here, by integrating these data with peptide processing and gene expression, we introduce HLAIImaster, an attention-based deep learning framework with adaptive domain knowledge for predicting neoepitope immunogenicity. Leveraging diverse biological characteristics and our enhanced deep learning framework, HLAIImaster significantly outperforms existing tools in terms of positive predictive value across various neoantigen studies. Robust domain knowledge learning accurately identifies neoepitope immunogenicity, bridging the gap between neoantigen biology and the clinical setting and paving the way for future neoantigen-based therapies to provide greater clinical benefit. In summary, we present a comprehensive exploitation of the immunogenic neoepitope repertoire of cancers, facilitating the effective development of "just-in-time" personalized vaccines.


Subjects
Deep Learning; Histocompatibility Antigens Class II; Humans; Histocompatibility Antigens Class II/immunology; Epitopes/immunology; Computational Biology/methods; Epitopes, T-Lymphocyte/immunology
7.
Nucleic Acids Res ; 52(D1): D562-D571, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953313

ABSTRACT

Single-cell proteomics enables the direct quantification of protein abundance at single-cell resolution, providing valuable insights into cellular phenotypes beyond what can be inferred from transcriptome analysis alone. However, the lack of large-scale integrated databases hinders researchers from accessing and exploring single-cell proteomics, impeding the advancement of this field. To fill this gap, we present a comprehensive database, the Single-cell Proteomic DataBase (SPDB, https://scproteomicsdb.com/), for general single-cell proteomic data, including antibody-based and mass spectrometry-based single-cell proteomics. Equipped with standardized data processing and a user-friendly web interface, SPDB provides unified data formats for convenient interaction with downstream analysis, and offers both dataset-level and protein-level data search and exploration capabilities. To enable detailed exhibition of single-cell proteomic data, SPDB also provides a module for visualizing data from the perspectives of cell metadata or protein features. The current version of SPDB encompasses 133 antibody-based single-cell proteomic datasets involving more than 300 million cells and over 800 marker/surface proteins, and 10 mass spectrometry-based single-cell proteomic datasets involving more than 4000 cells and over 7000 proteins. Overall, SPDB is envisioned as a useful resource that will benefit the wider research community by providing detailed insights into proteomics from the single-cell perspective.


Subjects
Proteins; Proteomics; Antibodies; Knowledge Bases; Mass Spectrometry; Humans; Animals; Single-Cell Analysis
8.
PLoS Genet ; 19(10): e1010905, 2023 10.
Article in English | MEDLINE | ID: mdl-37819938

ABSTRACT

Retinal Müller glia (MG) can act as stem-like cells to generate new neurons in both zebrafish and mice. In zebrafish, retinal regeneration is innate and robust, resulting in the replacement of lost neurons and restoration of visual function. In mice, exogenous stimulation of MG is required to reveal a dormant and, to date, limited regenerative capacity. Zebrafish studies have been key in revealing factors that promote regenerative responses in the mammalian eye. Increased understanding of how the regenerative potential of MG is regulated in zebrafish may therefore aid efforts to promote retinal repair therapeutically. Developmental signaling pathways are known to coordinate regeneration following widespread retinal cell loss. In contrast, less is known about how regeneration is regulated in the context of retinal degenerative disease, i.e., following the loss of specific retinal cell types. To address this knowledge gap, we compared transcriptomic responses underlying regeneration following targeted loss of rod photoreceptors or bipolar cells. In total, 2,531 differentially expressed genes (DEGs) were identified, with the majority being paradigm specific, including during early MG activation phases, suggesting the nature of the injury/cell loss informs the regenerative process from initiation onward. For example, early modulation of Notch signaling was implicated in the rod but not bipolar cell ablation paradigm and components of JAK/STAT signaling were implicated in both paradigms. To examine candidate gene roles in rod cell regeneration, including several immune-related factors, CRISPR/Cas9 was used to create G0 mutant larvae (i.e., "crispants"). Rod cell regeneration was inhibited in stat3 crispants, while mutating stat5a/b, c7b and txn accelerated rod regeneration kinetics. 
These data support emerging evidence that discrete responses follow from selective retinal cell loss and that the immune system plays a key role in regulating "fate-biased" regenerative processes.


Subjects
Transcriptome; Zebrafish; Animals; Mice; Zebrafish/genetics; Animals, Genetically Modified; Transcriptome/genetics; Retina/metabolism; Neurons; Cell Proliferation; Mammals
9.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36929862

ABSTRACT

Accurate prediction of promoter regions driving miRNA gene expression has become a major challenge due to the lack of annotation information for pri-miRNA transcripts. This gap hinders our understanding of miRNA-mediated regulatory networks. Some algorithms have been designed during the past decade to detect miRNA promoters. However, these methods rely on biosignal data such as CpG islands and still need to be improved. Here, we propose miProBERT, a BERT-based model for predicting promoters directly from gene sequences without using any structural or biological signals. To our knowledge, this is the first time a BERT-based model has been employed to identify miRNA promoters. We take the pre-trained model DNABERT, fine-tune it on the gene promoter dataset so that its representation captures the richer biological properties of promoter sequences, and then systematically scan the upstream regions of each intergenic miRNA using the fine-tuned model. About 665 miRNA promoters are found. The innovative use of a random substitution strategy to construct a negative dataset improves the discriminative ability of the model and further reduces the false positive rate (FPR) to as low as 0.0421. On independent datasets, miProBERT outperformed other gene promoter prediction methods. In a comparison on 33 experimentally validated miRNA promoter datasets, miProBERT significantly outperformed previously developed miRNA promoter prediction programs, with 78.13% precision and 75.76% recall. We further verify the predicted promoter regions by analyzing conservation, CpG content and histone marks. These results highlight the effectiveness and robustness of miProBERT.


Subjects
MicroRNAs; MicroRNAs/metabolism; Promoter Regions, Genetic; Algorithms; CpG Islands
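miProBERT (entry 9) builds its negative dataset with a random substitution strategy and reports the false positive rate. The sketch below shows one such strategy, assuming negatives are made by replacing a fixed fraction of randomly chosen positions of a positive sequence with a different base; the fraction and the helper names are hypothetical, not the authors' settings. It is paired with the standard definition FPR = FP / (FP + TN).

```python
import random
from typing import Optional

BASES = "ACGT"


def random_substitution_negative(seq: str, fraction: float = 0.6,
                                 rng: Optional[random.Random] = None) -> str:
    """Create a hypothetical negative example from a positive promoter sequence.

    A `fraction` of positions is sampled without replacement and each sampled
    base is replaced by a different nucleotide, so the length is preserved
    while the original sequence signal is disrupted.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = rng if rng is not None else random.Random()
    chars = list(seq.upper())
    n_sub = int(round(len(chars) * fraction))
    for i in rng.sample(range(len(chars)), n_sub):
        chars[i] = rng.choice([b for b in BASES if b != chars[i]])
    return "".join(chars)


def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): share of true negatives wrongly called positive."""
    return fp / (fp + tn)
```

Because such negatives share length and much of their composition with real promoters, a classifier cannot separate them using trivial cues alone, which is one plausible reason a substitution-based negative set would lower the FPR.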
10.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37643374

ABSTRACT

Silencers are noncoding DNA sequence fragments located on the genome that suppress gene expression. The variation of silencers in specific cells is closely related to gene expression and cancer development. Computational approaches that exclusively rely on DNA sequence information for silencer identification fail to account for the cell specificity of silencers, resulting in diminished accuracy. Despite the discovery of several transcription factors and epigenetic modifications associated with silencers on the genome, there is still no definitive biological signal or combination thereof to fully characterize silencers, posing challenges in selecting suitable biological signals for their identification. Therefore, we propose a sophisticated deep learning framework called DeepICSH, which is based on multiple biological data sources. Specifically, DeepICSH leverages a deep convolutional neural network to automatically capture biologically relevant signal combinations strongly associated with silencers, originating from a diverse array of biological signals. Furthermore, the utilization of attention mechanisms facilitates the scoring and visualization of these signal combinations, whereas the employment of skip connections facilitates the fusion of multilevel sequence features and signal combinations, thereby empowering the accurate identification of silencers within specific cells. Extensive experiments on HepG2 and K562 cell line data sets demonstrate that DeepICSH outperforms state-of-the-art methods in silencer identification. Notably, we introduce for the first time a deep learning framework based on multi-omics data for classifying strong and weak silencers, achieving favorable performance. In conclusion, DeepICSH shows great promise for advancing the study and analysis of silencers in complex diseases. The source code is available at https://github.com/lyli1013/DeepICSH.


Subjects
Deep Learning; Genome, Human; Humans; Cell Line; Epigenesis, Genetic; Multiomics
11.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36987781

ABSTRACT

Identifying disease-gene associations is a fundamental and critical biomedical task for understanding molecular mechanisms and for the diagnosis and treatment of diseases. It is time-consuming and expensive to experimentally verify causal links between diseases and genes. Recently, deep learning methods have achieved tremendous success in identifying candidate genes for genetic diseases. The gene prediction problem can be modeled as a link prediction problem based on the features of nodes and edges of the gene-disease graph. However, most existing studies either build homogeneous networks based on a single data source, or build heterogeneous networks based on multi-source data and artificially define meta-paths, in order to learn network representations of diseases and genes. The former cannot make use of abundant multi-source heterogeneous information, while the latter requires domain knowledge and experience to define meta-paths, and the accuracy of the model largely depends on how the meta-paths are defined. To address these bottlenecks, we propose an end-to-end disease-gene association prediction model with a parallel graph transformer network (DGP-PGTN), which deeply integrates the heterogeneous information of diseases, genes, ontologies and phenotypes. DGP-PGTN can automatically and comprehensively capture the multiple latent interactions between diseases and genes, discover the causal relationship between them and is fully interpretable at the same time. We conduct comprehensive experiments and show that DGP-PGTN significantly outperforms state-of-the-art methods on the task of disease-gene association prediction. Furthermore, DGP-PGTN can automatically learn the implicit relationship between diseases and genes without manually defining meta-paths.


Subjects
Algorithms; Neural Networks, Computer; Phenotype
12.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36723605

ABSTRACT

Identifying gene regulatory networks (GRNs) at single-cell resolution has long been a great challenge, and the advent of single-cell multi-omics data provides unprecedented opportunities to construct GRNs. Here, we propose a novel strategy that integrates single-cell RNA sequencing and single-cell Assay for Transposase-Accessible Chromatin using sequencing datasets, and uses an unsupervised learning neural network to divide the samples with high copy number variation scores, which are then used to infer the GRN in each gene block. Accuracy validation of the proposed strategy shows that approximately 80% of transcription factors are directly associated with cancer, colorectal cancer, malignancy and disease according to TRRUST, and that most transcription factors are prone to produce multiple transcript variants and lead to tumorigenesis according to the RegNetwork database. The source code is available at: https://github.com/Cuily-v/Colorectal_cancer.


Subjects
Colorectal Neoplasms; Gene Regulatory Networks; Humans; Multiomics; DNA Copy Number Variations; Algorithms; Transcription Factors/genetics; Colorectal Neoplasms/genetics
13.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36892153

ABSTRACT

Accurate and effective drug-target interaction (DTI) prediction can greatly shorten the drug development lifecycle and reduce the cost of drug development. In the deep-learning-based paradigm for predicting DTI, robust drug and protein feature representations and their interaction features play a key role in improving the accuracy of DTI prediction. Additionally, the class imbalance problem and the overfitting problem in the drug-target dataset can also affect prediction accuracy, and reducing the consumption of computational resources and speeding up the training process are also critical considerations. In this paper, we propose shared-weight-based MultiheadCrossAttention, a precise and concise attention mechanism that can establish the association between target and drug, making our models more accurate and faster. We then use the cross-attention mechanism to construct two models: MCANet and MCANet-B. In MCANet, the cross-attention mechanism is used to extract the interaction features between drugs and proteins to improve the feature representation ability of drugs and proteins, and the PolyLoss loss function is applied to alleviate the overfitting problem and the class imbalance problem in the drug-target dataset. In MCANet-B, combining multiple MCANet models improves the robustness of the model and further increases prediction accuracy. We train and evaluate our proposed methods on six public drug-target datasets and achieve state-of-the-art results. Compared with other baselines, MCANet saves considerable computational resources while keeping its accuracy in the leading position, whereas MCANet-B greatly improves prediction accuracy by combining multiple models while balancing computational resource consumption and prediction accuracy.


Subjects
Drug Development; Drug Discovery; Drug Discovery/methods; Proteins/metabolism; Drug Delivery Systems; Protein Domains
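MCANet (entry 13) applies the PolyLoss loss function to alleviate overfitting and class imbalance. The sketch below assumes the Poly-1 form of PolyLoss, cross-entropy plus ε(1 − p_t), where p_t is the predicted probability of the true class, applied to binary interaction labels. The abstract does not give MCANet's ε, so the default of 1.0 and the function names are assumptions.

```python
import math


def poly1_loss(prob_positive: float, label: int, epsilon: float = 1.0,
               min_prob: float = 1e-12) -> float:
    """Poly-1 loss for one binary prediction: -log(p_t) + epsilon * (1 - p_t).

    prob_positive: predicted probability that the drug-target pair interacts.
    label: 1 for an interacting pair, 0 otherwise.
    """
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    # p_t is the probability the model assigns to the true class.
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    p_t = min(max(p_t, min_prob), 1.0)  # clip to avoid log(0)
    cross_entropy = -math.log(p_t)
    return cross_entropy + epsilon * (1.0 - p_t)


def mean_poly1_loss(probs, labels, epsilon: float = 1.0) -> float:
    """Average Poly-1 loss over a batch."""
    losses = [poly1_loss(p, y, epsilon) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With `epsilon=0` the function reduces to ordinary binary cross-entropy; a positive ε adds an extra penalty that grows as confidence in the true class falls.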
14.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38127088

ABSTRACT

With the emergence of spatial transcriptome sequencing (ST-seq), research now heavily relies on the joint analysis of ST-seq and single-cell RNA sequencing (scRNA-seq) data to precisely identify cell spatial composition in tissues. However, common methods for combining these datasets often merge data from multiple cells to generate pseudo-ST data, overlooking topological relationships and failing to represent spatial arrangements accurately. We introduce GTAD, a method that uses a Graph Attention Network for deconvolution of integrated scRNA-seq and ST-seq data. GTAD effectively captures cell spatial relationships and topological structures within tissues using a graph-based approach, enhancing cell-type identification and our understanding of complex tissue cellular landscapes. By integrating scRNA-seq and ST data into a unified graph structure, GTAD outperforms traditional 'pseudo-ST' methods, providing robust and information-rich results. GTAD performs exceptionally well with synthesized spatial data and accurately identifies cell spatial composition in tissues such as the mouse cerebral cortex, cerebellum, developing human heart and pancreatic ductal carcinoma. GTAD holds the potential to enhance our understanding of tissue microenvironments and cellular diversity in complex biological systems. The source code is available at https://github.com/zzhjs/GTAD.


Subjects
Single-Cell Gene Expression Analysis; Software; Humans; Animals; Mice
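GTAD (entry 14) relies on a Graph Attention Network. For orientation, here is the attention-coefficient computation of a standard single-head GAT layer: project node features with W, score each neighbor with LeakyReLU(aᵀ[Wh_i ‖ Wh_j]), then normalize the scores with a softmax over the neighborhood. This is the generic GAT formulation, not GTAD's code; the function names and the example weights are hypothetical.

```python
import math


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (rows x cols) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_attention(h, neighbors, i, W, a):
    """Attention weights alpha_ij of node i over its neighborhood (single head).

    h: list of node feature vectors; neighbors: adjacency lists (include i
    itself for a self-loop); W: projection matrix (out_dim x in_dim);
    a: attention vector of length 2 * out_dim.
    """
    Wh = [matvec(W, hv) for hv in h]
    scores = []
    for j in neighbors[i]:
        concat = Wh[i] + Wh[j]  # list concatenation [Wh_i || Wh_j]
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbors[i], exps)}
```

The updated representation of node i would then be the α-weighted sum of the projected neighbor features Wh_j, usually followed by a nonlinearity; unlike a plain GCN, the weights depend on the features themselves rather than only on node degrees.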
15.
Methods ; 223: 136-145, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38360082

ABSTRACT

MOTIVATION: Drug-target interaction prediction is an important area of research that aims to predict whether there is an interaction between a drug molecule and its target protein. It plays a critical role in drug discovery and development by facilitating the identification of potential drug candidates and expediting the overall process. Given the time-consuming, expensive, and high-risk nature of traditional drug discovery methods, the prediction of drug-target interactions has become an indispensable tool. Using machine learning and deep learning to tackle this class of problems has become a mainstream approach, and graph-based models have recently received much attention in this field. However, many current graph-based Drug-Target Interaction (DTI) prediction methods rely on manually defined rules to construct the Drug-Protein Pair (DPP) network during the DPP representation learning process. Such methods fail to capture the true underlying relationships between drug molecules and target proteins. RESULTS: We propose GSL-DTI, an automatic graph structure learning model for predicting drug-target interactions (DTIs). First, we integrate large-scale heterogeneous networks using a meta-path-based graph convolution network, effectively learning the representations of drugs and target proteins. We then construct drug-protein pairs based on these representations. In contrast to previous studies that construct DPP networks based on manual rules, our method introduces an automatic graph structure learning approach. This approach applies a filter gate to the affinity scores of DPPs and relies on the classification loss of the downstream task to guide the learning of the underlying DPP network structure. Based on the learned DPP network, we transform the prediction of drug-target interactions into a node classification problem.
The comprehensive experiments conducted on three public datasets have shown the superiority of GSL-DTI in the tasks of DTI prediction. Additionally, GSL-DTI provides a fresh perspective for advancing research in graph structure learning for DTI prediction.


Subjects
Drug Delivery Systems; Drug Discovery; Machine Learning
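GSL-DTI (entry 15) applies a filter gate to DPP affinity scores to decide which edges enter the DPP network. The sketch below is a simplified version that assumes cosine similarity as the affinity score and a hard threshold as the gate. In the actual model the structure is learned jointly with the downstream classification loss, which this sketch omits. The names and the 0.5 threshold are hypothetical.

```python
import math


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def gated_dpp_edges(embeddings, threshold: float = 0.5):
    """Return (i, j, score) edges between drug-protein pair embeddings whose
    affinity passes the gate; pairs below the threshold stay unconnected."""
    edges = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            score = cosine_similarity(embeddings[i], embeddings[j])
            if score > threshold:
                edges.append((i, j, score))
    return edges
```

The resulting edge list defines a graph on which each DPP becomes a node to classify, matching the abstract's framing of DTI prediction as node classification; a learnable gate would replace the fixed threshold with a differentiable function trained end to end.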
16.
J Cell Mol Med ; 28(8): e18275, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38568058

ABSTRACT

Breast cancer (BC) remains a significant health concern worldwide, with metastasis being a primary contributor to patient mortality. While advances in understanding the disease's progression continue, the underlying mechanisms, particularly the roles of long non-coding RNAs (lncRNAs), are not fully deciphered. In this study, we examined the influence of the lncRNA LINC00524 on BC invasion and metastasis. Through meticulous analyses of TCGA and GEO data sets, we observed a conspicuous elevation of LINC00524 expression in BC tissues. This increased expression correlated strongly with a poorer prognosis for BC patients. A detailed Gene Ontology analysis suggested that LINC00524 likely exerts its effects through RNA-binding proteins (RBPs) mechanisms. Experimentally, LINC00524 was demonstrated to amplify BC cell migration, invasion and proliferation in vitro. Additionally, in vivo tests showed its potent role in promoting BC cell growth and metastasis. A pivotal discovery was LINC00524's interaction with TDP43, which leads to the stabilization of TDP43 protein expression, an element associated with unfavourable BC outcomes. In essence, our comprehensive study illuminates how LINC00524 accelerates BC invasion and metastasis by binding to TDP43, presenting potential avenues for therapeutic interventions.


Subjects
Breast Neoplasms; RNA, Long Noncoding; Female; Humans; Biological Assay; Breast Neoplasms/genetics; Cell Transformation, Neoplastic; Gene Ontology; RNA, Long Noncoding/genetics
17.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35605226

ABSTRACT

Molecular signatures have been excessively reported for the diagnosis of many cancers over the last 20 years. However, false-positive signatures are always found using statistical methods or machine learning approaches, causing subsequent biological experiments to fail. As a result, signature discovery has gradually become a non-mainstream topic in bioinformatics. In fact, there are three critical weaknesses that make identified signatures unreliable. First, a signature is wrongly thought to be a gene set, each component of which keeps differential expression between or among sample groups. Second, many of the differentially expressed genes found may be false positives, even if samples derived from the cancer and normal groups can be separated in one-dimensional space. Third, cross-platform validation results of a discovered signature are always poor. To solve these problems, we propose a new feature selection framework based on ensemble classification to discover signatures for cancer diagnosis. In addition, a procedure for transforming data among expression profiles from different platforms is designed. Signatures are found on simulated and real data representing different carcinomas across different platforms, and false positives are suppressed. The experimental results demonstrate the effectiveness of our method.


Subjects
Gene Expression Profiling; Neoplasms; Computational Biology/methods; Gene Expression Profiling/methods; Humans; Machine Learning; Neoplasms/diagnosis; Neoplasms/genetics; RNA
18.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34623382

ABSTRACT

The outbreak of acute respiratory disease in 2019, namely Coronavirus Disease-2019 (COVID-19), has become an unprecedented healthcare crisis. To mitigate the pandemic, there have been many collective and multidisciplinary efforts to facilitate the rapid discovery of protein inhibitors or drugs against COVID-19. Although many computational methods to predict protein inhibitors have been developed [1-5], few systematic reviews of these methods have been published. Here, we provide a comprehensive overview of the existing methods to discover potential inhibitors of the COVID-19 virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). First, we briefly categorize and describe computational approaches by the basic algorithms involved. Then we review the related biological datasets used in such predictions. Furthermore, we discuss in depth current knowledge on SARS-CoV-2 inhibitors, together with the latest findings and developments in computational methods for uncovering protein inhibitors against COVID-19.


Subjects
Antiviral Agents/chemistry; COVID-19 Drug Treatment; COVID-19; Computational Biology; Molecular Docking Simulation; Pandemics; SARS-CoV-2/metabolism; Antiviral Agents/therapeutic use; COVID-19/epidemiology; Databases, Factual; Humans
19.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35512331

ABSTRACT

The ubiquitous dropout problem in single-cell RNA sequencing technology causes a large amount of noise in gene expression profiles. To address this, we propose an evolutionary sparse imputation algorithm for single-cell transcriptomes (scESI), which constructs a sparse representation model based on gene regulation relationships between cells. To solve this model, we design an optimization framework based on non-dominated sorting genetic algorithms. This framework takes into account the topological relationship between cells and the variety of gene expression to iteratively search for the global optimal solution, thereby learning the Pareto-optimal cell-cell affinity matrix. Finally, we use the learned sparse relationship model between cells to improve data quality and reduce noise. On simulated datasets, scESI performed significantly better than benchmark methods on various metrics. By applying scESI to real scRNA-seq datasets, we found that scESI not only further classifies cell types and successfully separates cells in visualization, but also improves performance in reconstructing differentiation trajectories and identifying differentially expressed genes. In addition, scESI successfully recovered the expression trends of marker genes in stem cell differentiation and can discover new cell types and putative pathways regulating biological processes.


Subjects
Single-Cell Analysis; Transcriptome; Cluster Analysis; Gene Expression Profiling; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Exome Sequencing
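scESI (entry 19) optimizes its model with non-dominated sorting. The core step, partitioning candidate solutions into successive Pareto fronts, can be sketched as below, assuming every objective is minimized. The simple loop favors clarity over the faster bookkeeping used in NSGA-II, and the function names and example objective values are hypothetical, not scESI's.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Group point indices into Pareto fronts, best (non-dominated) front first."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A point is in the current front if no other remaining point dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

For example, `non_dominated_sort([(1, 2), (2, 1), (2, 2), (3, 3)])` returns `[[0, 1], [2], [3]]`: the first two trade off against each other, while `(2, 2)` is dominated by `(1, 2)`. A genetic algorithm would then prefer candidates from earlier fronts when selecting parents.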
20.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36239380

ABSTRACT

To identify plant pentatricopeptide repeat (PPR) proteins, a variable selection framework has been proposed. In fact, it is an effective feature selection strategy that focuses on classification performance. Random forest has been used as the classifier, with certain variables automatically selected for discrimination between PPR functional and non-functional proteins. However, samples regarded as PPR functional proteins are misclassified at a high rate. In this paper, we improve the framework to achieve better classification results. Modifications are made to the framework to better identify PPR functional proteins. Instead of random forest, a hybrid ensemble classifier is built with base classifiers derived from six different classification methods. In addition, an incremental strategy and clustering by search in descending order are alternatively used for feature selection, which can effectively select the most representative variables for identifying PPR proteins. Moreover, different base classifiers alternately play an important role in the ensemble classifier as the feature dimension increases. The experimental results demonstrate the effectiveness of our improvements.


Subjects
Algorithms; Plant Proteins; Plant Proteins/genetics; Cluster Analysis
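The improved PPR framework (entry 20) builds a hybrid ensemble from six base classifiers. The abstract does not say how the base outputs are combined, so the sketch below assumes plain majority voting, with ties resolved toward the label that appears first in classifier order (`collections.Counter.most_common` keeps first-encountered order for equal counts). The function name and the 1/0 label coding are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by majority vote, sample by sample.

    predictions: one list of predicted labels per base classifier, all of the
    same length (e.g. 1 = PPR functional, 0 = non-functional, a hypothetical coding).
    """
    if not predictions:
        raise ValueError("need at least one base classifier")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")
    combined = []
    for i in range(n):
        votes = Counter(p[i] for p in predictions)
        # most_common(1) returns the highest count; ties keep first-seen order.
        combined.append(votes.most_common(1)[0][0])
    return combined
```

With an even number of base classifiers, such as six, 3-3 ties are possible, so a real ensemble might instead use weighted voting or averaged class probabilities; the simple rule here is only for illustration.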