Results 1 - 20 of 188
1.
J Comput Biol ; 29(1): 27-44, 2022 01.
Article in English | MEDLINE | ID: mdl-35050715

ABSTRACT

We propose GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single-cell RNA-Sequencing (scRNA-Seq) data. Our framework incorporates two intertwined models. First, we leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresponding genes they regulate, by developing a multitask learning framework. Second, to capture sparsity of GRNs observed in the real world, we design an unrolled algorithm technique for our framework. Our deep architecture requires supervision for training, for which we repurpose existing synthetic data simulators that generate scRNA-Seq data guided by an underlying GRN. Experimental results demonstrate that GRNUlar outperforms state-of-the-art methods on both synthetic and real data sets. Our study also demonstrates the novel and successful use of expression data simulators for supervised learning of GRN inference.


Subjects
Deep Learning , Gene Regulatory Networks , Single-Cell Analysis/statistics & numerical data , Algorithms , Animals , Bias , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Escherichia coli/genetics , Humans , Mice , Neural Networks, Computer , RNA-Seq/statistics & numerical data , Saccharomyces cerevisiae/genetics , Supervised Machine Learning
2.
J Comput Biol ; 29(1): 23-26, 2022 01.
Article in English | MEDLINE | ID: mdl-35020490

ABSTRACT

scDesign2 is a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. This article shows how to download and install the scDesign2 R package, how to fit probabilistic models (one per cell type) to real data and simulate synthetic data from the fitted models, and how to use scDesign2 to guide experimental design and benchmark computational methods. Finally, a note is given about cell clustering as a preprocessing step before model fitting and data simulation.


Subjects
Gene Expression Profiling/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Software , Algorithms , Animals , Cluster Analysis , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Gene Expression , Mice , Models, Statistical , RNA-Seq/statistics & numerical data
3.
J Comput Biol ; 29(2): 121-139, 2022 02.
Article in English | MEDLINE | ID: mdl-35041494

ABSTRACT

Current expression quantification methods suffer from a fundamental but undercharacterized type of error: the most likely estimates for transcript abundances are not unique. This means multiple estimates of transcript abundances generate the observed RNA-seq reads with equal likelihood, and the underlying true expression cannot be determined. This is called nonidentifiability in probabilistic modeling. It is further exacerbated by incomplete reference transcriptomes where reads may be sequenced from unannotated transcripts. Graph quantification is a generalization to transcript quantification, accounting for the reference incompleteness by allowing exponentially many unannotated transcripts to express reads. We propose methods to calculate a "confidence range of expression" for each transcript, representing its possible abundance across equally optimal estimates for both quantification models. This range informs both whether a transcript has potential estimation error due to nonidentifiability and the extent of the error. Applying our methods to the Human Body Map data, we observe that 35%-50% of transcripts potentially suffer from inaccurate quantification caused by nonidentifiability. When comparing the expression between isoforms in one sample, we find that the degree of inaccuracy of 20%-47% transcripts can be so large that the ranking of expression between the transcript and other isoforms from the same gene cannot be determined. When comparing the expression of a transcript between two groups of RNA-seq samples in differential expression analysis, we observe that the majority of detected differentially expressed transcripts are reliable with a few exceptions after considering the ranges of the optimal expression estimates.
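
To make the notion of nonidentifiability concrete, here is a minimal Python sketch. It is a toy, not the method of the article: the gene, the three transcripts, and the coverage model (reads fall only within two equal-length segments A and B, and junction-spanning reads are ignored) are all assumptions chosen for illustration.

```python
# Toy illustration of nonidentifiability (hypothetical gene, not from the article).
# Each transcript is described by the set of segments it contains.
TRANSCRIPTS = {
    "t_AB": {"A", "B"},  # isoform containing both segments
    "t_A": {"A"},        # isoform containing only segment A
    "t_B": {"B"},        # isoform containing only segment B
}


def segment_coverage(abundance):
    """Expected coverage per segment under the simplified model:
    the coverage of a segment is the summed abundance of the
    transcripts that contain it."""
    coverage = {"A": 0.0, "B": 0.0}
    for name, segments in TRANSCRIPTS.items():
        for seg in segments:
            coverage[seg] += abundance[name]
    return coverage


# Three different abundance estimates for the same gene.
estimate_1 = {"t_AB": 10.0, "t_A": 0.0, "t_B": 0.0}
estimate_2 = {"t_AB": 0.0, "t_A": 10.0, "t_B": 10.0}
estimate_3 = {"t_AB": 4.0, "t_A": 6.0, "t_B": 6.0}

estimates = (estimate_1, estimate_2, estimate_3)
coverages = [segment_coverage(e) for e in estimates]

# Range of t_AB abundance across these equally well-fitting estimates.
t_ab_range = (min(e["t_AB"] for e in estimates),
              max(e["t_AB"] for e in estimates))

print(coverages)
print(t_ab_range)
```

All three estimates predict the same coverage on both segments, so data generated under this simplified model cannot distinguish them. The spread of `t_AB` across them is a toy analogue of the "confidence range of expression" described in the abstract.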


Subjects
Algorithms , Gene Expression Profiling/statistics & numerical data , Transcriptome , Alternative Splicing , Computational Biology , Confidence Intervals , Databases, Nucleic Acid/statistics & numerical data , Humans , Models, Statistical , RNA-Seq/statistics & numerical data
4.
Comput Math Methods Med ; 2021: 7029130, 2021.
Article in English | MEDLINE | ID: mdl-34737790

ABSTRACT

Tumor recurrence and metastasis often occur in HCC patients after surgery, and the prognosis is poor. Hence, searching for effective biomarkers for HCC prognosis is of great importance. First, HCC-related data were acquired from the TCGA and GEO databases. Based on the GEO data, 256 differentially expressed genes (DEGs) were obtained. Subsequently, to clarify the function of the DEGs, the clusterProfiler package was used to conduct functional enrichment analyses. Protein-protein interaction (PPI) network analysis screened 20 key genes. The key genes were filtered via the GEPIA database, by which 11 hub genes (F9, CYP3A4, ASPM, AURKA, CDC20, CDCA5, NCAP, PRC1, PTTG1, TOP2A, and KIFC1) were screened out. Then, univariate Cox analysis was applied to construct a prognostic model, followed by validation of its prediction performance. With the risk score calculated by the model and common clinical features, univariate and multivariate analyses were carried out to assess whether the prognostic model could be used independently for prognostic prediction. In conclusion, the current study screened an HCC prognostic gene signature based on public databases.


Subjects
Biomarkers, Tumor/genetics , Carcinoma, Hepatocellular/genetics , Gene Regulatory Networks , Liver Neoplasms/genetics , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Gene Expression Regulation, Neoplastic , Gene Ontology , Humans , Multivariate Analysis , Prognosis , Proportional Hazards Models , Protein Interaction Maps/genetics
5.
Int J Mol Sci ; 22(21)2021 Oct 26.
Article in English | MEDLINE | ID: mdl-34768960

ABSTRACT

Deep learning has proven advantageous in solving cancer diagnostic or classification problems. However, it cannot explain the rationale behind human decisions. Biological pathway databases provide well-studied relationships between genes and their pathways. As pathways comprise knowledge frameworks widely used by human researchers, representing gene-to-pathway relationships in deep learning structures may aid in their comprehension. Here, we propose a deep neural network (PathDeep), which implements gene-to-pathway relationships in its structure. We also provide an application framework measuring the contribution of pathways and genes in deep neural networks in a classification problem. We applied PathDeep to classify cancer and normal tissues based on the publicly available, large gene expression dataset. PathDeep showed higher accuracy than fully connected neural networks in distinguishing cancer from normal tissues (accuracy = 0.994) in 32 tissue samples. We identified 42 pathways related to 32 cancer tissues and 57 associated genes contributing highly to the biological functions of cancer. The most significant pathway was G-protein-coupled receptor signaling, and the most enriched function was the G1/S transition of the mitotic cell cycle, suggesting that these biological functions were the most common cancer characteristics in the 32 tissues.


Subjects
Deep Learning , Neoplasms/classification , Neoplasms/genetics , RNA-Seq/statistics & numerical data , Databases, Nucleic Acid/statistics & numerical data , Diagnosis, Computer-Assisted , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Neoplasms/diagnosis , Neural Networks, Computer
6.
Comput Math Methods Med ; 2021: 6015473, 2021.
Article in English | MEDLINE | ID: mdl-34603484

ABSTRACT

Hypoxic ischemic encephalopathy (HIE) is a serious nervous system syndrome that occurs in early life. Noncoding RNAs have been confirmed to play crucial roles in human diseases. So far, there have been few systematic and comprehensive studies of the expression profile of RNAs in the brain after hypoxia ischemia. In this study, 31 upregulated differentially expressed microRNAs (miRNAs) were identified. In addition, 5512 differentially expressed mRNAs, long noncoding RNAs (lncRNAs), and circular RNAs (circRNAs) were identified in HIE groups. Bioinformatics analysis showed that these circRNAs and mRNAs were significantly enriched in regulation of leukocyte activation, response to virus, and neutrophil degranulation. Pathway and related gene network analysis indicated that HLA-DPA1, HLA-DQA2, HLA-DQB1, and HLA-DRB4 have a more crucial role in HIE. Finally, miRNA-circRNA-mRNA interaction network analysis was performed to identify hub miRNAs and circRNAs. We found that miR-592 potentially targets 5 circRNAs, thus affecting the expression of 15 mRNAs in HIE. hsa_circ_0068397 and hsa_circ_0045698 were identified as hub circRNAs in HIE. Collectively, using RNA-seq, bioinformatics analysis, and circRNA/miRNA interaction prediction, we systematically investigated the differentially expressed RNAs in HIE, which could offer new insight into the pathogenesis of HIE.


Subjects
Gene Regulatory Networks , Hypoxia-Ischemia, Brain/genetics , MicroRNAs/genetics , RNA, Circular/genetics , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Down-Regulation , Gene Expression Profiling/statistics & numerical data , HLA-D Antigens/genetics , Humans , Hypoxia-Ischemia, Brain/immunology , Immunogenetic Phenomena , RNA, Messenger/genetics , RNA-Seq , Up-Regulation
7.
Comput Math Methods Med ; 2021: 8020879, 2021.
Article in English | MEDLINE | ID: mdl-34603485

ABSTRACT

BACKGROUND: The competitive endogenous RNA (ceRNA) mechanism was discovered recently and regulates cancer-related gene expression. The ceRNA network participates in multiple processes, such as cell proliferation and metastasis, and potentially drives the progression of cancer. In this study, we focused on the ceRNA networks of esophageal squamous cell carcinoma and discovered a novel biomarker panel for cancer prognosis. METHODS: RNA expression data of esophageal carcinoma were retrieved from the TCGA database, and a ceRNA network in esophageal carcinoma was constructed using R packages. RESULTS: Four miRNAs were identified as the core of the ceRNA model: miR-93, miR-191, miR-99b, and miR-3615. Moreover, we constructed a ceRNA network in esophageal carcinoma, which included 4 miRNAs and 6 lncRNAs. After ceRNA network modeling, we investigated six lncRNAs that could be taken together as a panel for prognosis prediction of esophageal cancer: LINC02575, LINC01087, LINC01816, AL136162.1, AC012073.1, and AC117402.1. Finally, we tested the predictive power of the panel in all TCGA samples. CONCLUSIONS: Our study discovered a new biomarker panel that may have potential value in predicting the prognosis of esophageal carcinoma.


Subjects
Esophageal Neoplasms/genetics , Esophageal Squamous Cell Carcinoma/genetics , RNA, Long Noncoding/genetics , Biomarkers, Tumor/genetics , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , MicroRNAs/genetics , Models, Genetic , Prognosis , RNA, Messenger/genetics , RNA-Seq
8.
Comput Math Methods Med ; 2021: 7764764, 2021.
Article in English | MEDLINE | ID: mdl-34484416

ABSTRACT

As one of the most prevalent posttranscriptional modifications of RNA, N7-methylguanosine (m7G) plays an essential role in the regulation of gene expression. Accurate identification of m7G sites in the transcriptome is invaluable for better revealing their potential functional mechanisms. Although high-throughput experimental methods can locate m7G sites precisely, they are overpriced and time-consuming. Hence, it is imperative to design an efficient computational method that can accurately identify the m7G sites. In this study, we propose a novel method via incorporating BERT-based multilingual model in bioinformatics to represent the information of RNA sequences. Firstly, we treat RNA sequences as natural sentences and then employ bidirectional encoder representations from transformers (BERT) model to transform them into fixed-length numerical matrices. Secondly, a feature selection scheme based on the elastic net method is constructed to eliminate redundant features and retain important features. Finally, the selected feature subset is input into a stacking ensemble classifier to predict m7G sites, and the hyperparameters of the classifier are tuned with tree-structured Parzen estimator (TPE) approach. By 10-fold cross-validation, the performance of BERT-m7G is measured with an ACC of 95.48% and an MCC of 0.9100. The experimental results indicate that the proposed method significantly outperforms state-of-the-art prediction methods in the identification of m7G modifications.
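
For readers unfamiliar with the two metrics reported above, the following Python sketch computes accuracy (ACC) and the Matthews correlation coefficient (MCC) from a binary confusion matrix using the standard formulas. The counts in the example are invented for illustration and are not taken from the article.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 100 true sites and 100 non-sites.
example = dict(tp=90, tn=95, fp=5, fn=10)
print(accuracy(**example))
print(mcc(**example))
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are unbalanced.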


Subjects
Algorithms , Guanosine/analogs & derivatives , RNA Processing, Post-Transcriptional/genetics , Base Sequence , Binding Sites/genetics , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Deep Learning , Guanosine/genetics , Guanosine/metabolism , Humans , Linear Models
9.
PLoS Comput Biol ; 17(8): e1008904, 2021 08.
Article in English | MEDLINE | ID: mdl-34339413

ABSTRACT

The killer-cell immunoglobulin-like receptor (KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as having bearing on pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes make short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING provides KIR gene copy number classification functionality for all KIR genes through use of a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To determine copy number and genotype determination accuracy, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding in interpretation of short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering high-resolution KIR genotypes that are highly accurate, enabling high-quality, high-throughput KIR genotyping for disease and population studies.


Subjects
Immunogenetics/statistics & numerical data , Receptors, KIR/genetics , Africa, Southern , Alleles , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Europe , Gene Dosage , Genetics, Population/statistics & numerical data , Genotype , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Polymorphism, Genetic , Receptors, KIR/classification , Sequence Alignment/statistics & numerical data , Software Design
10.
Comput Math Methods Med ; 2021: 1835056, 2021.
Article in English | MEDLINE | ID: mdl-34306171

ABSTRACT

In a general computational context for biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully to complete this task in recent years. Identification and classification of viruses are essential to avoid an outbreak like COVID-19; they also help in detecting the effect of viruses and in drug design. Regardless, the feature selection process remains the most challenging aspect of the issue. The most commonly used representations worsen the case of high dimensionality, and sequences lack explicit features. Deep learning (DL) models, by contrast, can automatically extract features from the input. In this work, we employed CNN, CNN-LSTM, and CNN-Bidirectional LSTM architectures using Label and K-mer encoding for DNA sequence classification. The models are evaluated on different classification metrics. In the experimental results, the CNN and CNN-Bidirectional LSTM with K-mer encoding offer high accuracy of 93.16% and 93.13%, respectively, on testing data.
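
The two sequence encodings named in the abstract can be sketched in a few lines of Python. This is an illustrative reading of "Label" and "K-mer" encoding, not the authors' preprocessing code: the function names, the choice of k = 3, the stride of 1, and the convention of reserving id 0 for padding are all assumptions.

```python
def kmers(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def label_encode(sequence, alphabet="ACGT"):
    """Map each nucleotide to an integer index.
    Symbols outside the alphabet (e.g. N) map to len(alphabet)."""
    index = {base: i for i, base in enumerate(alphabet)}
    return [index.get(base, len(alphabet)) for base in sequence.upper()]


def kmer_encode(sequence, vocabulary, k=3):
    """Map each k-mer to an integer id, adding unseen k-mers to the
    vocabulary. Ids start at 1 so that 0 can be used for padding."""
    ids = []
    for token in kmers(sequence, k):
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary) + 1
        ids.append(vocabulary[token])
    return ids


vocab = {}
print(kmers("ATGCGA"))               # ['ATG', 'TGC', 'GCG', 'CGA']
print(label_encode("ATGCGA"))        # [0, 3, 2, 1, 2, 0]
print(kmer_encode("ATGCGA", vocab))  # [1, 2, 3, 4]
```

Either integer sequence could then be padded to a fixed length and passed to an embedding layer in front of a CNN or LSTM.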


Subjects
COVID-19/virology , High-Throughput Nucleotide Sequencing/statistics & numerical data , Neural Networks, Computer , SARS-CoV-2/genetics , Sequence Analysis, DNA/statistics & numerical data , Base Sequence , Computational Biology , DNA, Viral/classification , DNA, Viral/genetics , Databases, Nucleic Acid/statistics & numerical data , Deep Learning , Humans , Pandemics , SARS-CoV-2/classification
11.
PLoS Comput Biol ; 17(6): e1009078, 2021 06.
Article in English | MEDLINE | ID: mdl-34153026

ABSTRACT

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and github (https://github.com/ChaissonLab/LRA).
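
The difference between a linear and a concave gap cost can be seen in a small chaining example. The Python sketch below is not lra's algorithm: it uses a plain quadratic-time dynamic program instead of sparse dynamic programming, and the anchor format, the logarithmic cost function, and the parameter values are assumptions chosen only to illustrate the idea.

```python
import math


def linear_gap(gap, per_base=1.0):
    """Gap cost proportional to gap length."""
    return per_base * gap


def concave_gap(gap, scale=2.0):
    """Concave gap cost: grows with gap length, but sublinearly."""
    return scale * math.log1p(gap)


def best_chain_score(anchors, gap_cost):
    """Score of the best chain of exact-match anchors.

    anchors: list of (read_pos, genome_pos, length) tuples.
    A chain scores the summed anchor lengths minus gap_cost of the
    diagonal shift between consecutive anchors. Simple O(n^2) DP,
    used here for clarity rather than efficiency.
    """
    anchors = sorted(anchors)
    best = []
    for j, (xj, yj, lj) in enumerate(anchors):
        score = float(lj)  # chain consisting of anchor j alone
        for i in range(j):
            xi, yi, li = anchors[i]
            # Anchor i must end before anchor j starts on both sequences.
            if xi + li <= xj and yi + li <= yj:
                gap = abs((xj - xi) - (yj - yi))
                score = max(score, best[i] + lj - gap_cost(gap))
        best.append(score)
    return max(best) if best else 0.0


# Two 20-base matches separated by a 500-base shift in the genome,
# as a large deletion relative to the reference would produce.
anchors = [(0, 0, 20), (25, 525, 20)]
print(best_chain_score(anchors, linear_gap))   # 20.0: the chain is broken
print(best_chain_score(anchors, concave_gap))  # about 27.6: the chain spans the gap
```

With the linear cost, joining the two anchors is worse than keeping either one alone, so the event splits the alignment. With the concave cost, the long gap is cheap enough for one chain to span it, which is the behavior the abstract associates with better discovery of large variants.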


Subjects
Contig Mapping/statistics & numerical data , Sequence Alignment/statistics & numerical data , Software , Cluster Analysis , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Genetic Variation , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Programming, Linear , Sequence Analysis, DNA
12.
PLoS Comput Biol ; 17(6): e1009118, 2021 06.
Article in English | MEDLINE | ID: mdl-34138847

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because of the low number of mRNA copies extracted per cell, scRNA-seq data exhibit a large number of dropouts, which hinder downstream analysis of the data. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In the experiments, the results on simulated and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analyses including clustering, visualization, and differential expression analysis.


Subjects
RNA-Seq/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Software , Animals , Cluster Analysis , Computational Biology , Computer Simulation , Data Interpretation, Statistical , Data Visualization , Databases, Nucleic Acid/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Genetic Techniques/statistics & numerical data , Humans , RNA, Messenger/genetics , RNA, Messenger/isolation & purification
13.
PLoS Comput Biol ; 17(6): e1009064, 2021 06.
Article in English | MEDLINE | ID: mdl-34077420

ABSTRACT

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually have different powers in identifying the unknown cell types through clustering. So, methods that integrate multiple datasets can potentially lead to a better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are unlinked with the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cells scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ has fast convergence and it is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.


Subjects
Genomics/statistics & numerical data , Machine Learning , Software , Animals , Cerebral Cortex/metabolism , Cluster Analysis , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Dendritic Cells/metabolism , Humans , Information Theory , Mice , RNA, Small Cytoplasmic/genetics , RNA-Seq , Single-Cell Analysis/statistics & numerical data
14.
Commun Biol ; 4(1): 660, 2021 06 02.
Article in English | MEDLINE | ID: mdl-34079055

ABSTRACT

The female mammary epithelium undergoes reorganization during development, pregnancy, and menopause, linking higher risk with breast cancer development. To characterize these periods of complex remodeling, here we report integrated 50 K mouse and 24 K human mammary epithelial cell atlases obtained by single-cell RNA sequencing, which covers most lifetime stages. Our results indicate a putative trajectory that originates from embryonic mammary stem cells which differentiates into three epithelial lineages (basal, luminal hormone-sensing, and luminal alveolar), presumably arising from unipotent progenitors in postnatal glands. The lineage-specific genes infer cells of origin of breast cancer using The Cancer Genome Atlas data and single-cell RNA sequencing of human breast cancer, as well as the association of gland reorganization to different breast cancer subtypes. This comprehensive mammary cell gene expression atlas ( https://mouse-mammary-epithelium-integrated.cells.ucsc.edu ) presents insights into the impact of the internal and external stimuli on the mammary epithelium at an advanced resolution.


Subjects
Breast Neoplasms/etiology , Breast/cytology , Breast/metabolism , Mammary Glands, Animal/cytology , Mammary Glands, Animal/metabolism , Mammary Neoplasms, Experimental/etiology , Animals , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Carcinogenesis/genetics , Cell Lineage/genetics , Cell Transformation, Neoplastic/genetics , Databases, Nucleic Acid/statistics & numerical data , Epithelial Cells/cytology , Epithelial Cells/metabolism , Female , Gene Expression Regulation, Neoplastic , Humans , Mammary Neoplasms, Experimental/genetics , Mammary Neoplasms, Experimental/pathology , Mice , Mice, Inbred BALB C , Pregnancy , RNA-Seq/statistics & numerical data
15.
Comput Math Methods Med ; 2021: 6680211, 2021.
Article in English | MEDLINE | ID: mdl-33747117

ABSTRACT

Atrial fibrillation (AF) is one of the most common supraventricular arrhythmias worldwide. However, the specific molecular mechanism underlying AF remains unclear. Our study is aimed at identifying pivotal microRNAs (miRNAs) and targeting genes associated with persistent AF (pAF) using bioinformatics analysis. Three gene expression array datasets (GSE31821, GSE41177, and GSE79768) and an miRNA expression array dataset (GSE68475) associated with pAF were downloaded. Differentially expressed genes (DEGs) were identified using the LIMMA package, and differentially expressed miRNAs (DEMs) were screened from GSE68475. Target genes for DEMs were predicted using the miRTarBase database, and intersections between these target genes and DEGs were selected for further analysis, including the generation of protein-protein interaction (PPI) network, miRNA-transcription factor-target regulatory network, and drug-gene network. A total of 264 DEGs and 40 DEMs were identified between the pAF and control groups. Functional and pathway enrichment analyses of up- and downregulated DEGs were performed. The common genes (CGs) were primarily enriched in the phosphoinositide 3-kinase- (PI3K-) protein kinase B (Akt) signaling pathway, negative regulation of cell division, and response to hypoxia. The PPI network, miRNA-transcription factor-target regulatory network, and drug-gene network were constructed using Cytoscape. The present study revealed several novel miRNAs and genes involved in pAF. We speculated that miR-4298, miR-3125, miR-4306, and miR-671-5p could represent significant miRNAs that act on the target gene superoxide dismutase 2 (SOD2) during the development of pAF and may serve as essential biomarkers for pAF diagnosis and treatment. Moreover, MYC might function in pAF pathogenesis through the PI3K-Akt signaling pathway.


Subjects
Atrial Fibrillation/genetics , MicroRNAs/genetics , Atrial Fibrillation/metabolism , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genes, myc , Humans , MicroRNAs/metabolism , Myocardium/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Protein Interaction Maps/genetics , Proto-Oncogene Proteins c-akt/metabolism , Signal Transduction/genetics , Superoxide Dismutase/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
16.
Comput Math Methods Med ; 2021: 6691096, 2021.
Article in English | MEDLINE | ID: mdl-33680070

ABSTRACT

Preeclampsia (PE) is a maternal disease that causes maternal and child death. Treatment and preventive measures are not yet adequate, and the problem of PE screening has attracted much attention. The purpose of this study is to screen placental mRNA to obtain the best PE biomarkers for identifying patients with PE. We use Limma in the R language to screen out the 48 differentially expressed genes with the largest differences and use correlation-based feature selection algorithms to reduce the dimensionality and avoid attribute redundancy arising from too many mRNA samples participating in the classification. After reducing the mRNA attributes, the mRNA samples are sorted from large to small according to information gain. In this study, a classifier model is designed to identify whether samples have PE through mRNA in the placenta. To improve the accuracy of classification and avoid overfitting, three classifiers, including C4.5, AdaBoost, and multilayer perceptron, are used. We use the majority voting strategy integrated with the differentially expressed genes and the genes filtered by the best subset method as comparison methods to train the classifier. The results show that the classification accuracy rate increased from 79% to 82.2%, and the number of mRNA features decreased from 48 to 13. This study provides clues for the main PE biomarkers of mRNA in the placenta and provides ideas for the treatment and screening of PE.
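
The majority voting step mentioned above can be sketched as follows. This is a generic hard-voting combiner, not the authors' implementation: the function name, the label strings, and the example predictions are hypothetical, and the three lists simply stand in for the outputs of C4.5, AdaBoost, and a multilayer perceptron.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier predictions by hard majority voting.

    predictions: list of label lists, one per classifier, all the
    same length. Returns the most frequent label for each sample.
    """
    voted = []
    for sample_labels in zip(*predictions):
        label, _count = Counter(sample_labels).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical predictions from three classifiers on four samples.
c45 = ["PE", "control", "PE", "control"]
adaboost = ["PE", "PE", "control", "control"]
mlp = ["control", "PE", "PE", "control"]

print(majority_vote([c45, adaboost, mlp]))
# ['PE', 'PE', 'PE', 'control']
```

With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of classifiers.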


Subjects
Machine Learning , Placenta/metabolism , Pre-Eclampsia/diagnosis , Pre-Eclampsia/genetics , RNA, Messenger/genetics , Algorithms , Biomarkers/metabolism , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Decision Trees , Diagnosis, Computer-Assisted , Female , Genetic Markers , Genetic Testing , Humans , Neural Networks, Computer , Pregnancy , RNA, Messenger/metabolism , Transcriptome
17.
Comput Math Methods Med ; 2021: 6636350, 2021.
Article in English | MEDLINE | ID: mdl-33488763

ABSTRACT

A promoter is a short DNA sequence near the start codon, responsible for initiating transcription of a specific gene in the genome. The accurate recognition of promoters has great significance for a better understanding of transcriptional regulation. Because of their importance in the process of biological transcriptional regulation, there is an urgent need to develop in silico tools to identify promoters and their types in a timely and accurate manner. A number of prediction methods have been developed in this regard; however, almost all of them were merely used for identifying promoters and their strength or sigma types. Because the TATA box region in TATA promoters influences posttranscriptional processes, in the current study, we developed a two-layer predictor called iPTT(2L)-CNN by using the convolutional neural network (CNN) for identifying TATA and TATA-less promoters. The first layer can be used to identify a given DNA sequence as a promoter or nonpromoter. The second layer is used to identify whether the recognized promoter is a TATA promoter or not. The 5-fold cross-validation and independent testing results demonstrate that the constructed predictor is promising for identifying promoters and classifying TATA and TATA-less promoters. Furthermore, to make it easier for most experimental scientists to get the results they need, a user-friendly web server has been established at http://www.jci-bioinfo.cn/iPPT(2L)-CNN.


Subjects
Genome, Plant , Neural Networks, Computer , Promoter Regions, Genetic , Computational Biology , DNA, Plant/genetics , Databases, Nucleic Acid/statistics & numerical data , Sequence Analysis, DNA , Species Specificity , TATA Box , Zea mays/genetics
18.
Nucleic Acids Res ; 49(D1): D29-D37, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33245775

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform had integrated in excess of 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has strived to continue its support of the life science communities through the crisis, with updated training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.


Subjects
COVID-19/prevention & control , Computational Biology/statistics & numerical data , Databases, Nucleic Acid/statistics & numerical data , Information Storage and Retrieval/methods , SARS-CoV-2/genetics , Viral Proteins/genetics , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Computational Biology/organization & administration , Databases, Nucleic Acid/organization & administration , Global Health , Humans , Information Storage and Retrieval/statistics & numerical data , Internet , Pandemics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Viral Proteins/metabolism
19.
Nucleic Acids Res ; 49(D1): D82-D85, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33175160

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), provided by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), has for almost forty years continued in its mission to freely archive and present the world's public sequencing data for the benefit of the entire scientific community and for the acceleration of the global research effort. Here we highlight the major developments to ENA services and content in 2020, focussing in particular on the recently released updated ENA browser, modernisation of our release process and our data coordination collaborations with specific research communities.


Subjects
Computational Biology/methods , Databases, Nucleic Acid/trends , Nucleic Acids/genetics , Nucleotides/genetics , Databases, Nucleic Acid/statistics & numerical data , Europe , High-Throughput Nucleotide Sequencing , Humans , Internet , Molecular Sequence Annotation , Nucleic Acids/chemistry , Nucleotides/chemistry , Sequence Analysis, DNA , Sequence Analysis, RNA
20.
Comput Math Methods Med ; 2020: 2953598, 2020.
Article in English | MEDLINE | ID: mdl-33204298

ABSTRACT

BACKGROUND: miR-139-5p is expressed at low levels in various human cancers and exerts its antitumor effect through different molecular mechanisms, yet its molecular mechanism in lung adenocarcinoma (LUAD) remains to be elucidated. This study aimed to investigate the role and regulatory mechanism of miR-139-5p in LUAD progression. METHODS: Differential analysis was performed on miRNA expression data in the TCGA-LUAD dataset. qRT-PCR was used to detect the transcription levels of miR-139-5p and MAD2L1 in LUAD cells, and western blot was used to detect MAD2L1 protein expression. CCK-8 and Transwell assays were used to assess LUAD cell proliferation, migration, and invasion. A dual-luciferase reporter gene assay was conducted to verify the direct targeting relationship between miR-139-5p and MAD2L1. RESULTS: miR-139-5p was significantly downregulated in LUAD cells compared with normal human bronchial epithelial cells. Overexpressing miR-139-5p inhibited LUAD cell proliferation, migration, and invasion, while the opposite results were observed when miR-139-5p was inhibited. MAD2L1 was identified as a direct target of miR-139-5p in LUAD. In addition, the inhibitory effect of miR-139-5p overexpression on LUAD cell proliferation, migration, and invasion was attenuated by overexpressing MAD2L1. CONCLUSION: Our study suggests that miR-139-5p is expressed at low levels in LUAD cells and inhibits LUAD cell proliferation, migration, and invasion by directly targeting and suppressing MAD2L1 expression. This finding is of potential significance for the prognosis and treatment of LUAD.


Subjects
Adenocarcinoma of Lung/genetics , Lung Neoplasms/genetics , Mad2 Proteins/genetics , MicroRNAs/genetics , A549 Cells , Adenocarcinoma of Lung/metabolism , Adenocarcinoma of Lung/pathology , Cell Line, Tumor , Cell Movement/genetics , Cell Proliferation/genetics , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Down-Regulation , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , Mad2 Proteins/metabolism , MicroRNAs/metabolism , Neoplasm Invasiveness/genetics , Neoplasm Invasiveness/pathology , Up-Regulation