1.
Comput Biol Med ; 171: 108225, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38442556

ABSTRACT

BACKGROUND AND OBJECTIVES: Single-cell RNA sequencing (scRNA-seq) provides a powerful tool for exploring cellular heterogeneity, discovering novel or rare cell types, distinguishing tissue-specific cellular composition, and understanding cell differentiation during development. However, due to technological limitations, dropout events in scRNA-seq can mistakenly convert some entries in the real data to zero, which is equivalent to introducing noise into the gene expression matrix. The contaminated data degrades the performance of downstream analyses, including clustering, cell annotation, and differential gene expression analysis. It is therefore crucial to determine accurately which zeros are due to dropout events and to impute them. METHODS: Considering that different zeros in the gene expression matrix carry different confidence levels, this paper proposes SinCWIm, an imputation method for dropout events in scRNA-seq based on weighted alternating least squares (WALS). The method uses the Pearson correlation coefficient and hierarchical clustering to quantify the confidence of zero entries, and then combines these confidences with WALS for matrix decomposition. Outlier removal and data correction then bring the imputation result close to the actual values. RESULTS: Eight single-cell sequencing datasets were used in comparative experiments to demonstrate the overall superiority of SinCWIm over state-of-the-art models. The SinCWIm-imputed data were clustered and evaluated with the adjusted Rand index, and the Usoskin, Pollen and Bladder datasets scored 94.46%, 96.48% and 76.74%, respectively. In addition, significant improvements were made in the retention of differentially expressed genes and in visualization. CONCLUSIONS: SinCWIm provides a valuable imputation method for handling dropout events in single-cell sequencing data. Compared with advanced methods, SinCWIm demonstrates excellent performance in clustering, visualization and other aspects, and it is applicable to various single-cell sequencing datasets.
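
The abstract names weighted alternating least squares (WALS) but gives no implementation details. The following is a minimal sketch of generic WALS matrix factorization, not the SinCWIm method itself. It assumes a cell-by-gene matrix `X` and a weight matrix `W` in which low weights mark zeros suspected to be dropouts; the function name `wals`, the weighting scheme, and all parameter values are hypothetical.

```python
import numpy as np

def wals(X, W, rank=2, reg=0.1, n_iter=20, seed=0):
    """Factor X ~ U @ V.T by minimizing
    sum_ij W_ij * (X_ij - u_i . v_j)^2 + reg * (||U||^2 + ||V||^2).

    W is an assumption of this sketch: a low weight means low confidence
    that the observed value (e.g. a zero) is real.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        # Fix V, solve one weighted ridge regression per row of X.
        for i in range(n):
            Vw = V * W[i][:, None]
            U[i] = np.linalg.solve(Vw.T @ V + ridge, Vw.T @ X[i])
        # Fix U, solve one weighted ridge regression per column of X.
        for j in range(m):
            Uw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(Uw.T @ U + ridge, Uw.T @ X[:, j])
    return U, V

# Toy usage: two entries are zeroed out and given weight 0, so the
# factorization is free to impute them from the remaining entries.
rng = np.random.default_rng(1)
X_true = rng.uniform(0.5, 2.0, size=(6, 2)) @ rng.uniform(0.5, 2.0, size=(5, 2)).T
X, W = X_true.copy(), np.ones_like(X_true)
for i, j in [(0, 1), (3, 4)]:
    X[i, j], W[i, j] = 0.0, 0.0
U, V = wals(X, W, rank=2, reg=0.01, n_iter=30)
X_imputed = U @ V.T
```

In this sketch the imputed values are read from `X_imputed` at the low-weight positions. How SinCWIm derives its weights from Pearson correlation and hierarchical clustering is not described in the abstract and is not reproduced here.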


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , Base Sequence , Least-Squares Analysis , Single-Cell Analysis/methods , Cluster Analysis , Software
2.
Front Genet ; 14: 1272016, 2023.
Article in English | MEDLINE | ID: mdl-37854059

ABSTRACT

Syndrome differentiation and treatment is the basic principle by which traditional Chinese medicine (TCM) recognizes and treats diseases. Accurate syndrome differentiation provides a reliable basis for treatment; establishing a scientific, intelligent syndrome differentiation method is therefore of great significance to the modernization of TCM. With the development of biomedical text mining technology, TCM has entered a data-driven era of intelligence, and model training increasingly relies on large-scale labeled data. However, it is difficult to form a large standard data set in the field of TCM because TCM data collection is poorly standardized and patients' medical records are subject to privacy protection. To solve this problem, a multi-label deep forest model based on an improved multi-label ReliefF feature selection algorithm, ML-PRDF, is proposed. It aims to enhance the representativeness of features within the model, express the original information with fewer features, and achieve optimal classification accuracy, while alleviating the high data processing cost of deep forest models and enabling effective TCM discriminative analysis with small samples. The results show that the proposed model outperforms other multi-label classification models on multi-label evaluation criteria and is more accurate on the TCM syndrome differentiation problem than the traditional multi-label deep forest. The comparative study also shows that feature selection with the PCC-MLRF algorithm better selects representative features.

3.
Comput Biol Med ; 165: 107366, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37633089

ABSTRACT

LncRNA-protein interaction plays an important regulatory role in biological processes. In this paper, the proposed RPIPCM, based on a novel deep network model, uses sequence feature encoding of both RNA and protein to predict lncRNA-protein interactions (LPIs). A sliding-window negative sampling method is proposed to address the imbalance between positive and negative samples. Comparative experiments show that the proposed negative sampling method is effective in addressing the data imbalance in existing LPI research. Experimental results also show that the proposed sequence feature encoding method performs well in predicting LPIs on datasets of different sizes and types. On the animal-related RPI488 dataset, compared with the direct original sequence encoding model, the sequence feature encoding model increased accuracy by 1.02%, recall by 4.08%, and MCC by 1.67%. On the plant dataset ATH948, the sequence feature-based encoding demonstrated a 1.58% higher accuracy, a 1.53% higher recall, a 1.62% higher specificity, a 1.62% higher precision, and a 3.16% higher MCC than the direct original sequence-based encoding. Compared with the latest prediction work on the ZEA22133 dataset, RPIPCM is more effective, with accuracy increased by 2.23%, recall by 1.78%, specificity by 2.67%, precision by 2.52%, and MCC by 4.43%, which also demonstrates the effectiveness and robustness of RPIPCM. In conclusion, RPIPCM, a deep network model based on sequence feature encoding, can automatically mine the hidden feature information of the sequences in lncRNA-protein interactions without relying on external features or prior biomedical knowledge, and its low cost and high efficiency can provide a reference for biomedical researchers.


Subject(s)
RNA, Long Noncoding , Animals , RNA, Long Noncoding/genetics , Computational Biology/methods
4.
Comput Biol Med ; 158: 106868, 2023 May.
Article in English | MEDLINE | ID: mdl-37037149

ABSTRACT

Pancreatitis is a relatively serious disease caused by the self-digestion of trypsin in the pancreas. The development of diseases is closely related to gene and phenotype information. Gene-phenotype relations are generally obtained through clinical experiments, but the cost is huge. The published biomedical literature is growing exponentially and carries a wealth of disease-related gene and phenotype information. This study provides an effective way to obtain that information. To the best of our knowledge, this work is the first attempt to explore relationships between genotype and phenotype in pancreatitis from a computational perspective. Using text mining, it extracted 6152 genes and 76,753 genotype-phenotype pairs from the biomedical literature about pancreatitis. Based on these 76,753 pairs, the study proposed an improved normalized point-wise mutual information (REL-NPMI) model to optimize gene-phenotype relations related to pancreatitis, and obtained 12,562 gene-phenotype pairs that may be related to pancreatitis. The top 20 extracted results were validated and evaluated. The experimental results show that the method is promising for exploring the molecular mechanism of pancreatitis and thus provides a computational way to study its pathogenesis. Data resources and the Pancreatitis Gene-Phenotype Association Database are available at http://114.116.4.45:8081/ and resources are also available at https://github.com/polipoptbe8023/REL-NPMI.git.
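
For orientation, here is a minimal sketch of standard normalized point-wise mutual information (NPMI) computed from co-occurrence counts. It is the textbook measure, not the improved REL-NPMI model, whose modifications the abstract does not describe. The function name `npmi` and the count-based interface are assumptions made for illustration.

```python
import math

def npmi(n_xy, n_x, n_y, n_total):
    """Standard NPMI from counts: pmi(x, y) / -log p(x, y).

    n_xy: documents mentioning both the gene and the phenotype
    n_x, n_y: documents mentioning each one
    n_total: all documents (hypothetical counting unit)
    Ranges from -1 (never co-occur) through 0 (independent) to 1 (always together).
    """
    if n_xy == 0:
        return -1.0
    p_xy = n_xy / n_total
    if p_xy == 1.0:
        return 1.0  # both occur in every document; avoids dividing by log(1) = 0
    p_x, p_y = n_x / n_total, n_y / n_total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)

# Independent terms score 0; terms that never co-occur score -1.
independent = npmi(25, 50, 50, 100)
never_together = npmi(0, 10, 10, 100)
```

Ranking candidate gene-phenotype pairs by such a score and keeping those above a threshold is one plausible way a filtered set of pairs could be obtained from a larger co-occurrence set; the threshold and any relation-specific weighting used in the paper are not stated in the abstract.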


Subject(s)
Pancreatitis , Humans , Genotype , Phenotype , Databases, Factual , Pancreatitis/genetics , Data Mining/methods
5.
J Healthc Eng ; 2020: 8829219, 2020.
Article in English | MEDLINE | ID: mdl-33299537

ABSTRACT

Background: Clinical named entity recognition is the basic task in mining electronic medical record text. The task is challenging for Chinese electronic medical records because the text contains many compound entities, sentences often lack components, and entity boundaries are unclear. Moreover, corpora of Chinese electronic medical records are difficult to obtain. Methods: To address these characteristics, this study proposed a Chinese clinical entity recognition model based on deep learning pretraining. The model used word embeddings from a domain corpus and fine-tuned an entity recognition model pretrained on a relevant corpus. BiLSTM and Transformer were then each used as feature extractors to identify four types of clinical entities (diseases, symptoms, drugs, and operations) in the text of Chinese electronic medical records. Results: The model achieved 75.06% Macro-P, 76.40% Macro-R, and 75.72% Macro-F1 on the test dataset. Conclusions: These experiments show that the proposed Chinese clinical entity recognition model based on deep learning pretraining can effectively improve recognition performance.
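
As a small arithmetic check on the reported figures: the Macro-F1 of 75.72% is consistent with the harmonic mean of the reported Macro-P and Macro-R. Whether the authors computed Macro-F1 this way, or as the mean of per-class F1 scores, is not stated in the abstract. The helper name `harmonic_f1` is hypothetical.

```python
def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures taken from the abstract, in percent.
macro_p, macro_r = 75.06, 76.40
print(round(harmonic_f1(macro_p, macro_r), 2))  # prints 75.72
```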


Subject(s)
Deep Learning , Electronic Health Records , China , Humans , Language , Natural Language Processing
6.
IEEE Trans Biomed Eng ; 60(12): 3410-7, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23591466

ABSTRACT

Understanding the role of genetics in diseases is one of the most important tasks in the postgenome era. It is generally too expensive and time consuming to perform experimental validation for all candidate genes related to a disease. Computational methods play important roles in prioritizing these candidates. Herein, we propose an approach to prioritize disease genes using latent semantic mapping based on singular value decomposition. Our hypothesis is that functionally similar genes are likely to cause similar diseases. New disease susceptibility genes are predicted by measuring the functional similarity between known disease susceptibility genes and unknown genes. Taking autism as an example, analysis of the top ten prioritized genes suggests that they might be autism susceptibility genes, which also indicates that our approach could discover new disease susceptibility genes. The novel approach of disease gene prioritization could discover new disease susceptibility genes and latent disease-gene relations. The prioritized results could also support the interpretive diversity and experimental views as computational evidence for disease researchers.
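
The idea of ranking unknown genes by similarity to known ones in an SVD-derived latent space can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the gene-by-term matrix, the choice of cosine similarity to the centroid of the known genes, and the names `latent_gene_vectors` and `rank_candidates` are all hypothetical.

```python
import numpy as np

def latent_gene_vectors(gene_term, k):
    """Project genes into a k-dimensional latent space with a truncated SVD."""
    U, s, _ = np.linalg.svd(gene_term, full_matrices=False)
    return U[:, :k] * s[:k]

def rank_candidates(gene_term, known_idx, k=2):
    """Rank genes not in known_idx by cosine similarity to the known-gene centroid."""
    Z = latent_gene_vectors(np.asarray(gene_term, dtype=float), k)
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    centroid = Z[list(known_idx)].mean(axis=0)
    scores = Z @ centroid
    known = set(known_idx)
    candidates = [i for i in range(len(Z)) if i not in known]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)

# Toy gene-by-term matrix: genes 0-2 share one set of functional terms,
# genes 3-4 share another. Gene 0 plays the role of a known disease gene.
gene_term = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 2],
]
ranking = rank_candidates(gene_term, known_idx=[0], k=2)
```

In this toy example genes 1 and 2 rank ahead of genes 3 and 4, because they share functional terms with the known gene. How the paper builds its matrix and chooses the number of latent dimensions is not given in the abstract.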


Subject(s)
Computational Biology/methods , Genetic Predisposition to Disease/classification , Models, Genetic , Algorithms , Autistic Disorder/genetics , Biomedical Research , Genes/genetics , Humans
7.
J Neurosci Res ; 90(6): 1119-25, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22345019

ABSTRACT

Autism is a complex neuropsychiatric disorder with high heritability and an unclear etiology. The identification of key genes related to autism may elucidate its etiology. The current study provides an approach to predicting autism susceptibility genes. Genes are first extracted from the biomedical literature, and some autism susceptibility genes are then recognized as seeds by the prior knowledge. As candidates, the remaining genes are predicted by creating association rules between the seeds and candidates. In an evaluated data set, 27 autism susceptibility genes (type "Y") are extracted and 43 possible autism susceptibility genes (type "P") are predicted. The sum of "Y" and "P" genes accounts for 93.3% of the data set that are not contained in the typical database of autism susceptibility genes. Our approach can effectively extract and predict autism susceptibility genes from the biomedical literature. These predicted results complement the typical database of autism susceptibility genes. The web portal for the predicted results, which is freely available at http://biolab.hyit.edu.cn/ar, can be a valuable resource in studies of diseases related to genes.
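
The abstract does not say how its association rules between seed and candidate genes are scored. As a generic illustration only, the usual support and confidence of a rule "seed implies candidate" over per-document gene mentions can be computed as below. The function name `rule_stats`, the document-level co-occurrence unit, and the placeholder gene labels are hypothetical.

```python
def rule_stats(docs, seed, candidate):
    """Support and confidence of the rule seed -> candidate.

    docs: iterable of sets, each holding the gene names mentioned in one document.
    support    = fraction of documents mentioning both genes
    confidence = fraction of seed-mentioning documents that also mention the candidate
    """
    docs = list(docs)
    n_seed = sum(1 for d in docs if seed in d)
    n_both = sum(1 for d in docs if seed in d and candidate in d)
    support = n_both / len(docs) if docs else 0.0
    confidence = n_both / n_seed if n_seed else 0.0
    return support, confidence

# Placeholder labels, not real gene symbols.
docs = [{"SEED1", "CAND1"}, {"SEED1"}, {"SEED1", "CAND1", "CAND2"}, {"CAND2"}]
support, confidence = rule_stats(docs, "SEED1", "CAND1")
```

A candidate could then be flagged as a possible susceptibility gene when both values exceed chosen thresholds; the thresholds and the exact rule form used in the study are not stated in the abstract.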


Subject(s)
Autistic Disorder/genetics , Genetic Predisposition to Disease/genetics , Genetic Testing/methods , Algorithms , Genetic Association Studies , Humans
8.
BMC Evol Biol ; 10: 346, 2010 Nov 10.
Article in English | MEDLINE | ID: mdl-21067568

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are a class of short regulatory RNAs encoded in the genome of DNA viruses, some single cell organisms, plants and animals. With the rapid development of technology, more and more miRNAs are being discovered. However, the origin and evolution of most miRNAs remain obscure. Here we report the origin and evolution dynamics of a human miRNA family. RESULTS: We have shown that all members of the miR-1302 family are derived from MER53 elements. Although the conservation scores of the MER53-derived pre-miRNA sequences are low, we have identified 36 potential paralogs of MER53-derived miR-1302 genes in the human genome and 58 potential orthologs of the human miR-1302 family in placental mammals. We suggest that in placental species, this miRNA family has evolved following the birth-and-death model of evolution. Three possible mechanisms that can mediate miRNA duplication in evolutionary history have been proposed: the transposition of the MER53 element, segmental duplications and Alu-mediated recombination. Finally, we have found that the target genes of miR-1302 are over-represented in transportation, localization, and system development processes and in the positive regulation of cellular processes. Many of them are predicted to function in binding and transcription regulation. CONCLUSIONS: The members of miR-1302 family that are derived from MER53 elements are placental-specific miRNAs. They emerged at the early stage of the recent 180 million years since eutherian mammals diverged from marsupials. Under the birth-and-death model, the miR-1302 genes have experienced a complex expansion with some members evolving by segmental duplications and some by Alu-mediated recombination events.


Subject(s)
Evolution, Molecular , Genome, Human , MicroRNAs/genetics , Multigene Family , Placenta/metabolism , Base Sequence , Conserved Sequence , Female , Humans , Molecular Sequence Data , Phylogeny , Pregnancy , Segmental Duplications, Genomic , Sequence Alignment , Sequence Analysis, DNA