1.
Front Oncol ; 14: 1337631, 2024.
Article in English | MEDLINE | ID: mdl-38476360

ABSTRACT

Background: Pleomorphic adenoma (PA) often shows benign-looking imaging features similar to those of Warthin tumor (WT); however, it is a potentially malignant tumor with a high recurrence rate. Worse, inexperienced pathologists find it difficult to distinguish PA from WT with fine-needle aspiration cytology (FNAC). This study applied deep learning (DL) to ultrasound images to provide a reliable approach for discriminating PA from WT. Methods: A total of 488 surgically confirmed patients, 266 with PA and 222 with WT, were enrolled. Two experienced ultrasound physicians independently evaluated all images to differentiate PA from WT, and the diagnostic performance of preoperative FNAC was also evaluated. For the DL study, all ultrasound images were randomly divided into training (70%), validation (20%), and test (10%) sets. In addition, the ultrasound images of cases that FNAC could not diagnose were randomly allocated to training (60%), validation (20%), and test (20%) sets. Five DL models were developed to classify ultrasound images as PA or WT, and their robustness was assessed with five-fold cross-validation. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize the regions of interest identified by the DL models. Results: In the Grad-CAM analysis, the DL models correctly identified the mass as the region of interest. The areas under the receiver operating characteristic curve (AUROC) for the two ultrasound physicians were 0.351 and 0.598, and FNAC achieved an AUROC of only 0.721. In contrast, the DL models achieved AUROC values ranging from 0.828 to 0.908 for discriminating PA from WT in the test set. ResNet50 performed best, with an AUROC of 0.908, an accuracy of 0.833, a sensitivity of 0.736, and a specificity of 0.904. In the test set of cases for which FNAC failed to provide a diagnosis, DenseNet121 performed best, with an AUROC of 0.897, an accuracy of 0.806, a sensitivity of 0.789, and a specificity of 0.824. Conclusion: For discriminating PA from WT, DL models outperform ultrasound assessment and FNAC, and can help surgeons make informed decisions about the most appropriate surgical approach.
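
The accuracy, sensitivity, specificity and AUROC figures above follow standard definitions. The Python sketch below shows how they could be computed from test-set labels and model scores; it is an illustration, not the authors' evaluation code. It assumes PA is coded as the positive class (1), which the abstract does not state, and the toy labels and scores are made up.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up data): threshold the scores at 0.5 to get hard predictions.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_metrics(y_true, y_pred), auroc(y_true, scores))
```

With a 0.5 threshold on these toy scores, accuracy, sensitivity and specificity all equal 2/3 while AUROC is 8/9, because AUROC summarizes the ranking over all thresholds rather than one operating point.
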

2.
J Imaging Inform Med ; 37(3): 965-975, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38347394

ABSTRACT

Transthoracic echocardiography (TTE) provides sufficient information on cardiac structure, allows evaluation of hemodynamics and cardiac function, and is an effective method for examining atrial septal defect (ASD). This paper studies a deep learning method based on cardiac ultrasound video to assist in ASD diagnosis. We chose four standard views in pediatric cardiac ultrasound to identify ASDs: the subcostal sagittal view of the atrial septum (subSAS), the apical four-chamber view (A4C), the low parasternal four-chamber view (LPS4C), and the parasternal short-axis view of the great arteries (PSAX). Data from 300 pediatric patients were enrolled in a double-blind experiment, and five-fold cross-validation was used to verify the performance of our model. In addition, data from 30 pediatric patients (15 positive and 15 negative) were collected for clinician testing and compared with our model's test results; these 30 samples were not used in model training. Our model uses block random selection, maximal agreement decision, and frame sampling strategies for training and testing; ResNet18 and R3D networks extract frame features, which are aggregated into a rich video-level representation. We validated the model on our private dataset with five-fold cross-validation. For ASD detection, we achieved an AUC of 89.33 ± 3.13, accuracy of 84.95 ± 3.88, sensitivity of 85.70 ± 4.91, specificity of 81.51 ± 8.15, and F1 score of 81.99 ± 5.30. The proposed model is a multiple instance learning-based deep learning model for detecting atrial septal defects in video, and it effectively improves ASD detection accuracy compared with previous networks and clinicians.


Subjects
Deep Learning , Echocardiography , Heart Septal Defects, Atrial , Humans , Heart Septal Defects, Atrial/diagnostic imaging , Child , Echocardiography/methods , Female , Male , Child, Preschool , Double-Blind Method , Infant , Image Interpretation, Computer-Assisted/methods , Video Recording , Adolescent
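
The results above are reported as mean ± standard deviation over five-fold cross-validation. A minimal Python sketch of that protocol follows. It assumes splitting is done at the patient level (each index is one patient, so all of a patient's videos stay in one fold) and that the ± value is the sample standard deviation; the abstract specifies neither, and the helper names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 once, then return k (train, test) splits with disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal test folds
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def mean_sd(values: Sequence[float]) -> str:
    """Summarize per-fold scores as 'mean ± sample standard deviation'."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"


# Hypothetical usage: one index per patient, 300 patients, five folds.
for train, test in k_fold_indices(300, k=5):
    assert not set(train) & set(test)  # no patient appears in both sets
```

Splitting by patient rather than by video clip avoids leaking near-identical frames from the same child into both training and test folds.
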
3.
Comput Struct Biotechnol J ; 23: 589-600, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38274993

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is currently an important technology for identifying cell types and studying diseases at the genetic level. Identifying rare cell types, one of the downstream analyses of scRNA-seq data, is biologically important. Although rare cell identification methods have been developed, most of them mine intercellular similarities insufficiently, scale poorly, and are time-consuming. In this paper, we propose a single-cell similarity division algorithm (scSID) for identifying rare cells. It takes cell-to-cell similarity into account by analyzing both inter-cluster and intra-cluster similarities, and discovers rare cell types based on differences in similarity. Benchmarking on different experimental datasets shows that scSID outperforms other existing methods. Applying scSID to multiple datasets, including the 68K PBMC and intestine datasets, highlights its exceptional scalability and remarkable ability to identify rare cell populations.

4.
Comput Struct Biotechnol J ; 21: 4110-4117, 2023.
Article in English | MEDLINE | ID: mdl-37671241

ABSTRACT

Colocalization analysis of genomic region sets has been widely adopted to reveal potential functional interactions between the corresponding biological attributes, and it often serves as the basis for further investigation. A number of methods have been developed for colocalization analysis of genomic elements. However, none of them explicitly considers transcriptome heterogeneity and isoform ambiguity, which makes them less appropriate for analyzing transcriptome elements. Here, we developed RgnTX, an R/Bioconductor tool for colocalization analysis of transcriptome elements with permutation tests. Unlike existing approaches, RgnTX directly uses transcriptome annotation and offers high flexibility in the null model to simulate a realistic transcriptome-wide background, including complex alternative splicing patterns. Importantly, it supports testing transcriptome elements without a clear isoform association, which is often the real scenario because of technical limitations. The package offers a wide selection of predefined functions that make it easy for users to visualize permutation results, calculate shifted z-scores, and conduct multiple hypothesis testing with Benjamini-Hochberg correction. Moreover, using synthetic and real datasets, we show that RgnTX's novel testing modes return distinct and more significant results than existing genome-based methods. We believe RgnTX will be a useful tool for characterizing the randomness of the transcriptome and for conducting statistical association analysis of genomic region sets within the heterogeneous transcriptome. The package has been accepted by Bioconductor and is freely available at: https://bioconductor.org/packages/RgnTX.
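
RgnTX itself is an R/Bioconductor package, and its functions are not reproduced here. As a language-neutral illustration of two statistical ingredients named above, the following Python sketch computes a plain permutation z-score with an empirical p-value and applies the Benjamini-Hochberg adjustment. The `null_draw` callback is a hypothetical stand-in for drawing one randomized region set under the null model and returning its overlap statistic; RgnTX's shifted z-score is a different quantity and is not implemented.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def permutation_z(observed: float, null_draw: Callable[[random.Random], float],
                  n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Compare an observed overlap statistic with n_perm statistics drawn under the null.

    Returns (z_score, empirical one-sided p-value for 'observed is large').
    """
    rng = random.Random(seed)
    null = [null_draw(rng) for _ in range(n_perm)]
    mu, sd = statistics.mean(null), statistics.stdev(null)
    # The +1 terms count the observed value itself, so p is never exactly 0.
    p = (1 + sum(1 for x in null if x >= observed)) / (n_perm + 1)
    return (observed - mu) / sd, p


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, the raw p-values 0.01, 0.04, 0.03 and 0.5 adjust to 0.04, 0.0533, 0.0533 and 0.5; the running minimum keeps the adjusted values monotone in rank.
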

5.
Comput Biol Med ; 163: 107152, 2023 09.
Article in English | MEDLINE | ID: mdl-37364529

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is now a successful technique for identifying cellular heterogeneity, revealing novel cell subpopulations, and forecasting developmental trajectories. Precise identification of cell subpopulations is a crucial component of scRNA-seq data processing. Although many unsupervised clustering methods have been developed to cluster cell subpopulations, their performance is vulnerable to dropouts and high dimensionality. In addition, most existing methods are time-consuming and fail to adequately account for potential associations between cells. In this manuscript, we present scASGC, an unsupervised clustering method based on an adaptive simplified graph convolution model. The method builds plausible cell graphs, aggregates neighbor information using a simplified graph convolution model, and adaptively determines the optimal number of convolution layers for different graphs. Experiments on 12 public datasets show that scASGC outperforms both classical and state-of-the-art clustering methods. In addition, in a study of mouse intestinal muscle containing 15,983 cells, we identified distinct marker genes based on the clustering results of scASGC. The source code of scASGC is available at https://github.com/ZzzOctopus/scASGC.


Subjects
Algorithms , Gene Expression Profiling , Animals , Mice , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Cluster Analysis
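
The simplified graph convolution step mentioned above can be sketched as repeated multiplication of the cell-feature matrix by a normalized adjacency matrix, S^K X with S = D^{-1/2}(A + I)D^{-1/2}, and no nonlinearity between layers (the usual simplified graph convolution form). This pure-Python sketch assumes the cell graph is already built (for example as a k-nearest-neighbour graph) and takes K as a fixed argument; scASGC's adaptive rule for choosing K is not described in the abstract and is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """S = D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sgc_propagate(adj: Matrix, features: Matrix, k: int) -> Matrix:
    """Apply K rounds of normalized neighbour aggregation, S^K X, with no nonlinearity."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(k):
        out = matmul(s, out)
    return out
```

On a 4-cycle, propagating the indicator feature [1, 0, 0, 0] once gives [1/3, 1/3, 0, 1/3]: the signal spreads to the node's neighbours while the opposite node stays at 0.
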
6.
Appl Intell (Dordr) ; : 1-16, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37363384

ABSTRACT

In machine learning, multiple instance learning (MIL) is a method that evolved from supervised learning; it defines a "bag" as a collection of multiple instances and has a wide range of applications. In this paper, we propose a novel deep multiple instance learning model for medical image analysis, called triple-kernel gated attention-based multiple instance learning with contrastive learning, which overcomes limitations of existing multiple instance learning approaches to medical image analysis. Our model consists of four steps: (i) extracting representations with a simple convolutional neural network trained with contrastive learning; (ii) using three different kernel functions to obtain the importance of each instance in the entire image and form an attention map; (iii) aggregating the entire image based on the attention map through attention-based MIL pooling; and (iv) feeding the result into the classifier for prediction. Results on different datasets demonstrate that the proposed model outperforms state-of-the-art methods on binary and weakly supervised classification tasks. It can provide more efficient classification results for various disease models, along with additional explanatory information.
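
Step (iii) relies on attention-based MIL pooling. The Python sketch below shows the common single-kernel gated attention form, in which each instance embedding h_k receives a weight proportional to exp(w · (tanh(V h_k) elementwise-times sigmoid(U h_k))) and the bag embedding is the weighted sum of instances. It is an illustration under stated assumptions: the paper's triple-kernel scoring and contrastive encoder are not reproduced, and V, U and w, which would normally be learned, are passed in directly.

```python
import math
from typing import List, Sequence, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_attention_pool(instances: Sequence[Sequence[float]],
                         V: Sequence[Sequence[float]],
                         U: Sequence[Sequence[float]],
                         w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Gated attention MIL pooling.

    score_k = w . (tanh(V h_k) * sigmoid(U h_k))   (elementwise product)
    a_k     = softmax(score)_k,  bag = sum_k a_k h_k
    """
    scores = []
    for h in instances:
        # tanh(x) * sigmoid(y) written as tanh(x) / (1 + exp(-y))
        gated = [math.tanh(_dot(v, h)) / (1.0 + math.exp(-_dot(u, h))) for v, u in zip(V, U)]
        scores.append(_dot(w, gated))
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(len(instances[0]))]
    return weights, bag
```

Because the weights are a softmax, the bag embedding is always a convex combination of the instance embeddings, and the weights themselves double as the per-instance explanation the abstract refers to.
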

7.
Comput Intell Neurosci ; 2023: 5960764, 2023.
Article in English | MEDLINE | ID: mdl-36926186

ABSTRACT

Computational models of emotions can not only improve the effectiveness and efficiency of human-robot interaction but also help a robot adapt better to its environment. When designing computational models of emotions for socially interactive robots, especially robots for people with special needs such as autistic children, one should take into account the social and communicative characteristics of these groups. This article presents AppraisalCloudPCT, a novel computational model of emotions for socially interactive robots that can be used in autism rehabilitation. To the best of our knowledge, it is the first computational model of emotions built for robots that can satisfy the needs of a special group of people such as autistic children. We first revisit some fundamental and notable computational models of emotions (e.g., OCC, Scherer's appraisal theory, PAD) that have profoundly influenced significant models (e.g., PRESENCE, iGrace, xEmotion) for socially interactive robots. We then conduct a comparative assessment of AppraisalCloudPCT and five other significant models for socially interactive robots. The proposed model was designed to meet all six comparison criteria by adopting appraisal theories of emotion, perceptual control theory of emotion, a component model view of appraisal models, and cloud robotics. This article also details how to implement our model in a socially interactive robot we developed for autism rehabilitation. Future studies should examine how our model performs on different robots and in more interactive scenarios.


Subjects
Autistic Disorder , Robotics , Child , Humans , Emotions , Communication , Computer Simulation
8.
Appl Intell (Dordr) ; 53(12): 15188-15203, 2023.
Article in English | MEDLINE | ID: mdl-36405345

ABSTRACT

As a fundamental problem in algorithmic trading, portfolio optimization aims to maximize the cumulative return by continuously investing in various financial derivatives within a given time period. Recent years have witnessed a shift from traditional machine learning trading algorithms to reinforcement learning algorithms, owing to the latter's strength in sequential decision making. However, the exponential growth of imperfect and noisy financial data, which deterministic reinforcement learning strategies are supposed to leverage, makes it increasingly challenging to obtain a consistently profitable portfolio. In this work, we therefore first reconstruct several deterministic and stochastic reinforcement learning algorithms as benchmarks. On this basis, we introduce a risk-aware reward function to balance risk and return. Importantly, we propose a novel interpretable stochastic reinforcement learning framework for portfolio optimization, which combines a stochastic policy parameterized by Gaussian mixtures with a distributional critic realized by quantiles. In our experiments, the proposed algorithm demonstrates superior performance on U.S. market stocks, with a 63.1% annual rate of return while reducing the maximum drawdown of market value by 10% when back-tested during the stock market crash around March 2020.
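
The reported maximum drawdown and annual return can be computed from a back-tested portfolio value series. This Python sketch uses the standard peak-to-trough definition of drawdown and a compound annual growth rate. The 252-trading-day year is an assumption, since the abstract does not say how the annual rate was annualized or whether the 10% reduction is absolute or relative.

```python
from typing import Sequence


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough decline of a portfolio value series, as a fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)  # highest value seen so far
        worst = max(worst, (peak - v) / peak)
    return worst


def annualized_return(start: float, end: float, trading_days: int,
                      days_per_year: int = 252) -> float:
    """Compound annual growth rate implied by growing from `start` to `end` over `trading_days`."""
    return (end / start) ** (days_per_year / trading_days) - 1.0
```

For the series 100, 120, 90, 130, 65 the maximum drawdown is 0.5 (from the peak of 130 down to 65), even though the earlier fall from 120 to 90 came first.
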

9.
Article in English | MEDLINE | ID: mdl-36096444

ABSTRACT

As the most pervasive epigenetic mark on mRNA and lncRNA, N6-methyladenosine (m6A) RNA methylation has been shown to participate in essential biological processes. Recent studies have revealed distinct patterns of the m6A methylome across human tissues, and a major challenge remains in elucidating the tissue-specific presence and circuitry of m6A methylation. Here we present m6A-TSHub, a comprehensive online platform for unveiling context-specific m6A methylation and the genetic mutations that potentially regulate this epigenetic mark. m6A-TSHub consists of four core components: (1) m6A-TSDB, a comprehensive database of 184,554 functionally annotated m6A sites derived from 23 human tissues and 499,369 m6A sites from 25 tumor conditions; (2) m6A-TSFinder, a web server for high-accuracy prediction of m6A methylation sites within a specific tissue from RNA sequences, built with multi-instance deep neural networks with gated attention; (3) m6A-TSVar, a web server for assessing the impact of genetic variants on tissue-specific m6A RNA modifications; and (4) m6A-CAVar, a database of 587,983 The Cancer Genome Atlas (TCGA) cancer mutations (from 27 cancer types) predicted to affect m6A modifications in the primary tissue of the cancers. The database should be a useful resource for studying the m6A methylome and the genetic factors of epitranscriptome disturbance in a specific tissue (or cancer type). m6A-TSHub is accessible at www.xjtlu.edu.cn/biologicalsciences/m6ats.

10.
Nucleic Acids Res ; 50(18): 10290-10310, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36155798

ABSTRACT

As the most pervasive epigenetic mark on mRNA and lncRNA, N6-methyladenosine (m6A) RNA methylation regulates all stages of the RNA life cycle in various biological processes and disease mechanisms. Computational methods for deciphering RNA modification have achieved great success in recent years; nevertheless, their potential remains underexploited. One reason is that existing models usually consider only the sequence of transcripts, ignoring the various regions (or geography) of transcripts, such as the 3'UTR and introns, where the epigenetic mark forms and functions. Here, we developed three simple yet powerful encoding schemes for transcripts that capture the submolecular geographic information of RNA, which is largely independent of sequence. We show that m6A prediction models based on geographic information alone can achieve performance comparable to classic sequence-based methods. Importantly, geographic information substantially enhances the accuracy of sequence-based models, enables isoform- and tissue-specific prediction of m6A sites, and improves m6A signal detection from direct RNA sequencing data. The geographic encoding schemes we developed are highly interpretable, apply not only to m6A but also to N1-methyladenosine (m1A), and can serve as a general and effective complement to the widely used sequence encoding schemes in deep learning applications concerning RNA transcripts.


Subjects
Deep Learning , RNA, Long Noncoding , 3' Untranslated Regions , Methylation , Protein Isoforms/genetics , RNA/genetics , RNA/metabolism , RNA, Messenger/genetics
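
To make the idea of transcript "geography" concrete, the Python sketch below encodes a site's position on a mature mRNA as a one-hot region label (5'UTR, CDS, 3'UTR) plus its relative position within that region. This is a hypothetical scheme written for illustration only; it is not one of the three encodings developed in the paper, and it ignores introns and isoform choice.

```python
from typing import List

REGIONS = ("5'UTR", "CDS", "3'UTR")


def region_encoding(pos: int, cds_start: int, cds_end: int, tx_len: int) -> List[float]:
    """Hypothetical 4-value geography feature for a site at 0-based transcript coordinate `pos`.

    Returns a one-hot vector over (5'UTR, CDS, 3'UTR) followed by the relative position
    (0 = region start) inside that region. The CDS is the half-open interval [cds_start, cds_end).
    """
    if not 0 <= pos < tx_len:
        raise ValueError("position outside transcript")
    if pos < cds_start:
        region, lo, hi = 0, 0, cds_start
    elif pos < cds_end:
        region, lo, hi = 1, cds_start, cds_end
    else:
        region, lo, hi = 2, cds_end, tx_len
    one_hot = [1.0 if r == region else 0.0 for r in range(len(REGIONS))]
    return one_hot + [(pos - lo) / (hi - lo)]
```

A feature like this is independent of the nucleotides around the site, which is why it can complement, rather than duplicate, a one-hot sequence encoding.
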
11.
Comput Intell Neurosci ; 2022: 9213526, 2022.
Article in English | MEDLINE | ID: mdl-35528364

ABSTRACT

Traditional training methods such as card teaching, assistive technologies (e.g., augmented reality/virtual reality games and smartphone apps), DVDs, human-computer interaction, and human-robot interaction have been widely applied in autism rehabilitation training in recent years. In this article, we propose a novel framework for human-computer/robot interaction and introduce a preliminary intervention study for improving emotion recognition in Chinese children with autism spectrum disorder (ASD). The core of the framework is the Facial Emotion Cognition and Training System (FECTS), based on Simon Baron-Cohen's empathizing-systemizing (E-S) theory, which includes six tasks that train children with ASD to match, infer, and imitate the facial expressions of happiness, sadness, fear, and anger. Our system can be implemented on PCs, smartphones, mobile devices such as tablets (PADs), and robots. The training records (e.g., tracked records of emotion imitation) of Chinese autistic children interacting with a device running FECTS are uploaded to and stored in the database of a cloud-based evaluation system, through which therapists and parents can access analyses of the children's emotion learning progress. Deep learning algorithms for facial expression recognition and attention analysis are deployed in the back end (e.g., a PC, a robotic system, or a cloud system) implementing FECTS, and can track the imitation quality and attention of the autistic children in real time during the expression imitation phase. In this preliminary clinical study, 10 Chinese autistic children aged 3-8 years were recruited, and each received a single 20-minute training session every day for four consecutive days. Our preliminary results validated the feasibility of the developed FECTS and the effectiveness of our algorithms for Chinese children with autism spectrum disorder. To verify that FECTS can be further adapted to children from other countries, children from different cultural, sociological, and linguistic contexts should be recruited in future studies.


Subjects
Autism Spectrum Disorder , Child , Child, Preschool , China , Cognition , Emotions , Facial Expression , Humans
12.
Nucleic Acids Res ; 50(D1): D196-D203, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34986603

ABSTRACT

5-Methylcytosine (m5C) is one of the most prevalent covalent modifications on RNA. It is known to regulate a broad variety of RNA functions, including nuclear export, RNA stability and translation. Here, we present m5C-Atlas, a database for the comprehensive collection and annotation of RNA 5-methylcytosine. The database contains 166 540 m5C sites in 13 species identified by 5 base-resolution epitranscriptome profiling technologies. Moreover, condition-specific methylation levels are quantified from 351 RNA bisulfite sequencing samples gathered from 22 different studies via an integrative pipeline. The database also offers several novel features, such as the evolutionary conservation of an m5C locus, its association with SNPs, and its relevance to RNA secondary structure. All m5C-Atlas data are accessible through a user-friendly interface, in which the m5C epitranscriptomes can be freely explored, shared, and annotated with putative post-transcriptional mechanisms (e.g. RBP intermolecular interaction with RNA, microRNA interaction and splicing sites). Together, these resources offer unprecedented opportunities for exploring m5C epitranscriptomes. The m5C-Atlas database is freely accessible at https://www.xjtlu.edu.cn/biologicalsciences/m5c-atlas.


Subjects
Databases, Genetic , Epigenome/genetics , Software , Transcriptome/genetics , 5-Methylcytosine/chemistry , 5-Methylcytosine/metabolism , Humans , MicroRNAs/genetics , Polymorphism, Single Nucleotide/genetics , RNA Processing, Post-Transcriptional/genetics , Sequence Analysis, RNA
13.
Bioinformatics ; 37(Suppl_1): i222-i230, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252943

ABSTRACT

MOTIVATION: Increasing evidence suggests that post-transcriptional ribonucleic acid (RNA) modifications regulate essential biomolecular functions and are related to the pathogenesis of various diseases. Precise identification of RNA modification sites is essential for understanding the regulatory mechanisms of RNAs. To date, many computational approaches for predicting RNA modifications have been developed, most of which rely on strong supervision enabled by base-resolution epitranscriptome data. However, such high-resolution data may not be available. RESULTS: We propose WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated by acRIP-seq and hMeRIP-seq. Evaluations on three independent datasets (corresponding to three different RNA modification types and their respective sequencing technologies) demonstrated the effectiveness of our approach in predicting RNA modifications from low-resolution data. WeakRM outperformed state-of-the-art multi-instance learning methods for genomic sequences, such as WSCNN, which was originally designed for transcription factor binding site prediction. Additionally, our approach captured motifs consistent with existing knowledge, and visualization of the predicted modification-containing regions revealed the potential for detecting RNA modifications at improved resolution. AVAILABILITY AND IMPLEMENTATION: The source code for the WeakRM algorithm, along with the datasets used, is freely accessible at: https://github.com/daiyun02211/WeakRM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA , Software , Algorithms , Protein Binding , RNA/genetics , RNA/metabolism , Sequence Analysis, RNA , Supervised Machine Learning
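
A minimal illustration of the weakly supervised setting described above: a low-resolution peak region is treated as a bag of overlapping sequence windows, and the bag is scored from its instances. The Python sketch uses max pooling, the simplest multiple instance learning assumption (a bag is positive if at least one instance is). WeakRM's own network and aggregation are not reproduced, and the window size, stride and `instance_score` callback are hypothetical placeholders for a trained model.

```python
from typing import Callable, List, Sequence

BASES = "ACGU"


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode an RNA sequence over A, C, G, U; any other symbol becomes an all-zero row."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def make_bag(seq: str, window: int, stride: int) -> List[str]:
    """Split a low-resolution peak region into overlapping instance windows."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def bag_score(bag: Sequence[str], instance_score: Callable[[str], float]) -> float:
    """Standard MIL assumption: a bag is as positive as its most positive instance."""
    return max(instance_score(inst) for inst in bag)
```

Only the bag (peak) label is needed for training under this assumption, which is what allows low-resolution data such as acRIP-seq peaks to be used without base-resolution site labels.
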
14.
Nat Commun ; 12(1): 4011, 2021 06 29.
Article in English | MEDLINE | ID: mdl-34188054

ABSTRACT

Recent studies suggest that epitranscriptome regulation via post-transcriptional RNA modifications is vital for all RNA types. Precise identification of RNA modification sites is essential for understanding the functions and regulatory mechanisms of RNAs. Here, we present MultiRM, a method for the integrated prediction and interpretation of post-transcriptional RNA modifications from RNA sequences. Built on an attention-based multi-label deep learning framework, MultiRM not only simultaneously predicts the putative sites of twelve widely occurring transcriptome modifications (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) but also returns the key sequence contents that contribute most to the positive predictions. Importantly, our model revealed a strong association among different types of RNA modifications from the perspective of their associated sequence contexts. Our work provides a solution for detecting multiple RNA modifications, enabling an integrated analysis of these modifications and a better understanding of sequence-based RNA modification mechanisms.


Subjects
Computational Biology/methods , Neural Networks, Computer , RNA Processing, Post-Transcriptional/genetics , RNA/chemistry , RNA/genetics , Base Sequence , DNA Methylation/genetics , Humans
15.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33993206

ABSTRACT

Motivation: N6-methyladenosine (m6A) is the most prevalent RNA modification on mRNAs and lncRNAs. Evidence increasingly demonstrates its crucial importance in essential molecular mechanisms and various diseases. With recent advances in sequencing techniques, tens of thousands of m6A sites are identified in a typical high-throughput experiment, posing a key challenge: distinguishing the functional m6A sites from the remaining 'passenger' (or 'silent') sites. Results: We performed a comparative conservation analysis of the human and mouse m6A epitranscriptomes at single-site resolution. A novel scoring framework, ConsRM, was devised to quantitatively measure the degree of conservation of individual m6A sites. ConsRM integrates multiple information sources within a positive-unlabeled learning framework that combines genomic and sequence features to trace subtle hints of conservation at the epitranscriptome layer. Through a series of validation experiments in mouse, fly and zebrafish, we showed that ConsRM outperformed well-adopted conservation scores (phastCons and phyloP) in distinguishing conserved from unconserved m6A sites. Additionally, m6A sites with higher ConsRM scores are more likely to be functionally important. An online database containing the conservation metrics of 177 998 distinct human m6A sites was developed to support conservation analysis and functional prioritization of individual m6A sites; it is freely accessible at: https://www.xjtlu.edu.cn/biologicalsciences/con.


Subjects
RNA Processing, Post-Transcriptional , RNA, Messenger/genetics , Sequence Analysis, RNA , Software , Transcriptome , Animals , Humans , Mice , RNA, Messenger/biosynthesis , Zebrafish
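
The abstract does not describe ConsRM's exact positive-unlabeled formulation. As a point of reference, one classic PU technique, the Elkan and Noto correction, is sketched below in Python: a classifier trained to separate labeled from unlabeled sites estimates P(labeled | x), and dividing by the label frequency c = P(labeled | positive) recovers P(positive | x), assuming positives are labeled completely at random. Whether ConsRM uses this correction is an assumption, not something the abstract states.

```python
from typing import Sequence


def estimate_label_frequency(scores_on_labeled_positives: Sequence[float]) -> float:
    """c = P(labeled | positive), estimated as the mean classifier score on held-out labeled positives."""
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def pu_corrected_probability(score: float, c: float) -> float:
    """Turn P(labeled | x) from a labeled-vs-unlabeled classifier into P(positive | x) = score / c."""
    return min(1.0, score / c)  # clip: finite-sample estimates of c can push the ratio above 1
```

For example, if labeled positives score 0.5 on average, an unlabeled site scoring 0.25 would be assigned a corrected positive probability of 0.5.
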
16.
Nucleic Acids Res ; 49(D1): D134-D143, 2021 01 08.
Article in English | MEDLINE | ID: mdl-32821938

ABSTRACT

N6-Methyladenosine (m6A) is the most prevalent RNA modification on mRNAs and lncRNAs. It plays a pivotal role in various biological processes and disease pathogenesis. Here we present m6A-Atlas, a comprehensive knowledgebase for unraveling the m6A epitranscriptome. Compared with existing databases, m6A-Atlas features a high-confidence collection of 442 162 reliable m6A sites identified by seven base-resolution technologies, together with quantitative (rather than binary) epitranscriptome profiles estimated from 1363 high-throughput sequencing samples. It also offers novel features, such as the conservation of m6A sites among seven vertebrate species (including human, mouse and chimp), the m6A epitranscriptomes of 10 virus species (including HIV, KSHV and DENV), the putative biological functions of individual m6A sites predicted from epitranscriptome data, and the potential pathogenesis of m6A sites inferred from disease-associated genetic mutations that can directly destroy m6A-directing sequence motifs. A user-friendly graphical user interface supports querying, visualizing and sharing the m6A epitranscriptomes annotated with sites of interaction with the post-transcriptional machinery (RBP binding, microRNA interaction and splicing sites), and interactively displays the landscape of multiple RNA modifications. These resources provide fresh opportunities for unraveling the m6A epitranscriptomes. m6A-Atlas is freely accessible at: www.xjtlu.edu.cn/biologicalsciences/atlas.


Subjects
Adenosine/analogs & derivatives , Knowledge Bases , MicroRNAs/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , Transcriptome , Adenosine/metabolism , Animals , Arabidopsis/genetics , Arabidopsis/metabolism , Atlases as Topic , Datasets as Topic , Dengue Virus/genetics , Dengue Virus/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , HIV/genetics , HIV/metabolism , Herpesvirus 8, Human/genetics , Herpesvirus 8, Human/metabolism , Humans , Mice , MicroRNAs/metabolism , Pan troglodytes/genetics , Pan troglodytes/metabolism , RNA, Long Noncoding/metabolism , RNA, Messenger/metabolism , Rats , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Swine , Zebrafish
17.
Nucleic Acids Res ; 49(D1): D1396-D1404, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33010174

ABSTRACT

Deciphering the biological impacts of millions of single nucleotide variants remains a major challenge. Recent studies suggest that RNA modifications play versatile roles in essential biological mechanisms and are closely related to the progression of various diseases, including multiple cancers. To comprehensively unveil the association between disease-associated variants and their epitranscriptome disturbance, we built RMDisease, a database of genetic variants that can affect RNA modifications. By integrating the prediction results of 18 different RNA modification prediction tools and 303,426 experimentally validated RNA modification sites, RMDisease identified a total of 202,307 human SNPs that may affect (add or remove) sites of eight types of RNA modifications (m6A, m5C, m1A, m5U, Ψ, m6Am, m7G and Nm). These include 4,289 disease-associated variants that may contribute to disease pathogenesis at the epitranscriptome layer. These SNPs were further annotated with essential information such as post-transcriptional regulation (sites of miRNA binding, interaction with RNA-binding proteins and alternative splicing), revealing putative regulatory circuits. A convenient graphical user interface supports querying, exploring and downloading the relevant information. RMDisease should be a useful resource for studying the epitranscriptome impact of genetic variants via multiple RNA modifications, with emphasis on their potential disease relevance. RMDisease is freely accessible at: www.xjtlu.edu.cn/biologicalsciences/rmd.


Subjects
Databases, Genetic , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , RNA Processing, Post-Transcriptional , RNA, Neoplasm/genetics , Alternative Splicing , Humans , Internet , MicroRNAs/genetics , MicroRNAs/metabolism , Molecular Sequence Annotation , Neoplasms/metabolism , Neoplasms/pathology , Polymorphism, Single Nucleotide , RNA, Neoplasm/classification , RNA, Neoplasm/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Software , Transcriptome
18.
Bioinformatics ; 37(9): 1285-1291, 2021 06 09.
Artigo em Inglês | MEDLINE | ID: mdl-33135046

RESUMO

MOTIVATION: The distribution of biological features strongly indicates their functional relevance. Compared to DNA-related features, deciphering the distribution of mRNA-related features is non-trivial due to the existence of isoform ambiguity and compositional diversity of mRNAs. RESULTS: We propose here a rigorous statistical framework, MetaTX, for deciphering the distribution of mRNA-related features. Through a standardized mRNA model, MetaTX firstly unifies various mRNA transcripts of diverse compositions, and then corrects the isoform ambiguity by incorporating the overall distribution pattern of the features through an EM algorithm. MetaTX was tested on both simulated and real data. Results suggested that MetaTX substantially outperformed existing direct methods on simulated datasets, and that a more informative distribution pattern was produced for all the three datasets tested, which contain N6-Methyladenosine sites generated by different technologies. MetaTX should make a useful tool for studying the distribution and functions of mRNA-related biological features, especially for mRNA modifications such as N6-Methyladenosine. AVAILABILITY AND IMPLEMENTATION: The MetaTX R package is freely available at GitHub: https://github.com/yue-wang-biomath/MetaTX.1.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Protein Isoforms/genetics , RNA, Messenger/genetics
19.
Front Genet ; 11: 585029, 2020.
Article in English | MEDLINE | ID: mdl-33329723

ABSTRACT

Determining the tissue of origin of cancer of unknown primary (CUP) is of great significance for designing more effective treatments and improving diagnostic efficiency in cancer patients. In this study, we developed a machine learning model that traces the tissue of origin of CUP with high accuracy after feature engineering and model evaluation. Using copy number variation data comprising 4,566 training cases and 1,262 independent validation cases, we applied an XGBoost classifier to 10 cancer types. An extremely randomized trees (Extra-Trees) model was used for dimension reduction, so that fewer variables replace the original high-dimensional variables. The 300 features with the highest weights were selected, and principal component analysis was applied to eliminate noise. The XGBoost classifier achieved the highest overall accuracy for predicting tumor tissue of origin: 0.8913 in 10-fold cross-validation on the training samples and 0.7421 on the independent validation datasets. Furthermore, a comparison of performance indices such as precision and recall showed that the XGBoost classifier significantly improved classification performance across tumor types, with less prediction error, compared with other classifiers such as K-nearest neighbors (KNN), Bayes, support vector machine (SVM), and AdaBoost. Our method can infer the tissue of origin for the 10 cancer types with acceptable accuracy in both cross-validation and independent validation data. It may be used as an auxiliary diagnostic method to determine the actual clinicopathological status of specific cancers.
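The feature-engineering steps described above (keep the 300 highest-weighted features, then apply principal component analysis) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code. The per-feature importance weights are taken as given (for example, from an Extra-Trees model fitted elsewhere). The number of principal components (`n_components=50`) is a hypothetical choice, because the abstract does not state it. The XGBoost classification step is omitted, and the function names are illustrative. Fitting the selection and the PCA projection on the training cases only, then reusing them unchanged on the validation cases, keeps the independent validation set out of feature engineering.

```python
import numpy as np


def fit_topk_pca(X_train, importances, k=300, n_components=50):
    """Select the k features with the largest importance weights, then fit PCA.

    X_train: (n_samples, n_features) matrix, e.g. copy number values per region.
    importances: per-feature weights, assumed to come from a tree ensemble
    such as Extra-Trees (not computed here).
    n_components=50 is a hypothetical default; the abstract does not give it.
    """
    X_train = np.asarray(X_train, dtype=float)
    importances = np.asarray(importances, dtype=float)
    k = min(k, X_train.shape[1])
    # Indices of the k largest weights, highest first.
    top = np.argsort(importances)[::-1][:k]
    X_top = X_train[:, top]
    # PCA via SVD of the column-centered matrix.
    mean = X_top.mean(axis=0)
    _, _, vt = np.linalg.svd(X_top - mean, full_matrices=False)
    # Rows of vt are orthonormal principal axes, ordered by explained variance.
    components = vt[: min(n_components, vt.shape[0])]
    return top, mean, components


def transform_topk_pca(X, top, mean, components):
    """Apply the selection and projection fitted on the training data."""
    X = np.asarray(X, dtype=float)
    return (X[:, top] - mean) @ components.T


# Usage with synthetic data (shapes only; not the study's data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 1000))
X_valid = rng.normal(size=(20, 1000))
weights = rng.random(1000)
top, mean, comps = fit_topk_pca(X_train, weights)
Z_train = transform_topk_pca(X_train, top, mean, comps)
Z_valid = transform_topk_pca(X_valid, top, mean, comps)
print(Z_train.shape, Z_valid.shape)  # (100, 50) (20, 50)
```

The reduced matrices `Z_train` and `Z_valid` would then be passed to a classifier. In the study this was XGBoost, which is not reproduced here.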

20.
Evol Bioinform Online ; 16: 1176934320915707, 2020.
Article in English | MEDLINE | ID: mdl-32733123

ABSTRACT

RNA N6-methyladenosine (m6A) has emerged as an important epigenetic modification because of its role in regulating RNA stability, structure, processing, and translation. Instability of m6A homeostasis may result in defects in stem cell regulation, decreased fertility, and an increased risk of cancer. To date, experimental detection and quantification of RNA m6A modification remain time-consuming and labor-intensive. Existing databases contain only a limited number of epitranscriptome samples, and a matched RNA methylation profile is rarely available for a given biological problem of interest. Because gene expression data are readily available for most biological problems, it would be appealing to estimate RNA methylation status from gene expression data using in silico methods. In this study, we explored the computational prediction of RNA methylation status from gene expression data using classification and regression methods, based on mouse RNA methylation data collected under 73 experimental conditions. Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forests (RF) models were constructed for classification. SVM and RF both achieved the best performance, with a mean area under the curve (AUC) of 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on the sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were statistically significant: phosphoprotein, SRC Homology 3 (SH3) domain, and endoplasmic reticulum. All three terms were found to be closely related to the m6A pathway. For regression analysis, Elastic Net was implemented and yielded a mean Pearson correlation coefficient of 0.68 and a mean Spearman correlation coefficient of 0.64. Our exploratory study suggests that gene expression data can be used to construct predictors of m6A methylation status with adequate accuracy.
Our work showed for the first time that RNA methylation status may be predicted from matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially at the very early stage of a study.
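The evaluation metrics reported above are AUC for the classifiers and Pearson and Spearman correlation for the Elastic Net regression. A short stdlib-only Python sketch of these metrics follows. The AUC uses the rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen positive example (for example, a methylated site) receives a higher score than a randomly chosen negative one, with ties counted as one half. Spearman's coefficient is computed as the Pearson correlation of average ranks. The function names are illustrative and are not taken from the study.

```python
import math


def auc_rank(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(auc_rank([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y))  # 1.0, because y is a monotonic function of x
print(round(pearson(x, y), 3))  # 0.981
```

The last two lines show why the two correlation coefficients can differ. Spearman's coefficient measures only monotonic agreement, whereas Pearson's coefficient also penalizes departures from linearity.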
