Results 1 - 20 of 76,216
1.
Sci Rep ; 11(1): 3934, 2021 02 16.
Article in English | MEDLINE | ID: mdl-33594223

ABSTRACT

Accumulating evidence supports a high prevalence of co-infections among Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) patients and their potential to worsen the clinical outcome of COVID-19. However, there are few data on Southern Hemisphere populations, and most studies to date have investigated a narrow spectrum of viruses using targeted qRT-PCR. Here we assessed respiratory viral co-infections among SARS-CoV-2 patients in Australia through respiratory virome characterization. Nasopharyngeal swabs from 92 SARS-CoV-2-positive cases were sequenced using pan-viral hybrid capture and the Twist Respiratory Virus Panel. In total, 8% of cases were co-infected, with rhinovirus (6%) or influenza virus (2%). Twist capture also achieved near-complete sequencing (> 90% coverage, > 10-fold depth) of the SARS-CoV-2 genome in 95% of specimens with Ct < 30. Our results highlight the importance of assessing all pathogens in symptomatic patients, and the dual functionality of Twist hybrid capture, which enables both SARS-CoV-2 whole-genome sequencing without amplicon generation and straightforward, simultaneous identification of viral co-infections.


Subject(s)
/diagnosis , Coinfection/diagnosis , Coinfection/virology , Sequence Analysis, DNA , /genetics , Australia/epidemiology , Coinfection/epidemiology , Computational Biology , Genome, Viral , Humans , Open Reading Frames/genetics , Reproducibility of Results , Whole Genome Sequencing
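For readers checking their own data against the near-complete-genome criterion reported in the abstract above (> 90% coverage at > 10-fold depth), the sketch below computes breadth of coverage from a per-base depth profile. It is a minimal illustration, not the authors' pipeline: it assumes depths are already available as a list of integers (for example, parsed from `samtools depth -a` output), and the function names and the strict "greater than" reading of both thresholds are our own assumptions.

```python
def breadth_at_depth(depths, min_depth=10):
    """Fraction of genome positions whose read depth exceeds min_depth."""
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d > min_depth)
    return covered / len(depths)


def is_near_complete(depths, min_depth=10, min_breadth=0.90):
    """Hypothetical check: > 90% of positions covered at > 10-fold depth."""
    return breadth_at_depth(depths, min_depth) > min_breadth


# Toy 20-position "genome": 19 positions at depth 25, one dropout at depth 0.
toy_depths = [25] * 19 + [0]
print(breadth_at_depth(toy_depths))   # 0.95
print(is_near_complete(toy_depths))   # True
```

In practice the depth list would span the full ~30 kb SARS-CoV-2 reference, including zero-depth positions, so that dropouts count against breadth.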
2.
Cells ; 10(2)2021 02 04.
Article in English | MEDLINE | ID: mdl-33557205

ABSTRACT

Our knowledge of the evolution and role of the untranslated regions (UTRs) in SARS-CoV-2 pathogenicity is very limited. The leader sequence, which originates from the UTR, is found at the 5' ends of all encoded SARS-CoV-2 transcripts, highlighting its importance. Here, the evolution of the leader sequence was compared between human pathogenic and non-pathogenic coronaviruses. Then, microRNAs that can inactivate the key UTRs of coronaviruses were profiled. A distinct pattern of evolution was found in the SARS-CoV-2 leader sequence. Screening all available microRNA families against coronavirus leader sequences identified 39 microRNAs with stable thermodynamic binding energy. Notably, SARS-CoV-2 showed lower binding stability with microRNAs. hsa-MIR-5004-3p was the only human microRNA able to target the leader sequence of SARS and, to a lesser extent, SARS-CoV-2, although its binding stability was markedly lower for SARS-CoV-2. We also found some plant microRNAs with low, stable binding energy against SARS-CoV-2. Meta-analysis documented a significant (p < 0.01) decline in MIR-5004-3p expression after SARS-CoV-2 infection in trachea, lung biopsy and bronchial organoids, as well as in lung-derived Calu-3 and A549 cells. The scarcity of innate human inhibitory microRNAs able to bind the SARS-CoV-2 leader sequence can contribute to its high replication in infected human cells.


Subject(s)
5' Untranslated Regions , MicroRNAs/genetics , Virus Replication , Animals , Computational Biology , Evolution, Molecular , Genome, Viral , Humans , MicroRNAs/pharmacology , Nucleic Acid Conformation , RNA, Plant/pharmacology , /physiology
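The study above ranks microRNA-leader interactions by thermodynamic binding energy. A much simpler first-pass screen that is common in miRNA target analysis (and is not the authors' method) looks for canonical seed matches: occurrences of the reverse complement of miRNA nucleotides 2-8 in the target RNA. The sketch below assumes plain A/C/G/U strings; the miRNA and target sequences in the tests are made-up toy strings, not real hsa-MIR-5004-3p or SARS-CoV-2 sequences.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna, target, start=1, end=8):
    """0-based positions in `target` that match the miRNA seed.

    The Python slice mirna[1:8] covers nucleotides 2..8 (1-based), i.e. the
    7-nt seed; the target site is its reverse complement.
    """
    site = reverse_complement(mirna[start:end])
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits


toy_mirna = "AUCGGAUCCAGUAAGCUUGACA"      # hypothetical miRNA
toy_leader = "AAAAGAUCCGAUUUUGAUCCGAC"    # hypothetical target RNA
print(seed_sites(toy_mirna, toy_leader))
```

Seed matching only flags candidate sites; an energy-based tool would then be needed to compare binding stability across viruses, as the authors did.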
3.
BMC Bioinformatics ; 22(1): 38, 2021 Feb 01.
Article in English | MEDLINE | ID: mdl-33522898

ABSTRACT

BACKGROUND: Because of the complexity of biological systems, predicting potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide information about the affinity and accessibility of the genome, and they are commonly used features in binding site prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data such as text and speech. Until now, no study has applied this approach to binding site inference from massively parallel sequencing data. The successful application of attention mechanisms in similar input contexts motivated us to build and test new methods that can accurately determine transcription factor binding sites. RESULTS: In this study, we propose a novel tool (named DeepGRN) for transcription factor binding site prediction that combines two components: a single attention module and a pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model correlate with potentially informative inputs, such as DNase-Seq coverage and motifs, which provides possible explanations for the predictive improvements of DeepGRN. CONCLUSIONS: DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how our model recognizes critical patterns from different types of input features.


Subject(s)
Chromatin , Neural Networks, Computer , Protein Binding , Transcription Factors , Binding Sites , Computational Biology , Transcription Factors/genetics , Transcription Factors/metabolism
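The DeepGRN abstract above refers to a "single attention module" whose learned weights correlate with DNase-Seq coverage and motifs. The paper defines the actual architecture; as a loose, generic illustration of what attention pooling over sequence positions computes, the sketch below scores each position against a query vector (learned in a real model, hand-set here), normalizes the scores with a softmax, and returns the weighted sum. The two-channel toy features (a coverage-like value and a motif-like score) and the query are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Generic scaled dot-product attention pooling (not DeepGRN's exact module).

    features: list of per-position feature vectors (length L, each of size d)
    query:    vector of size d (learned in a real model; hand-set here)
    Returns (pooled vector of size d, attention weights of length L).
    """
    d = len(query)
    scale = math.sqrt(d)
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) / scale for f in features]
    weights = softmax(scores)
    pooled = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)]
    return pooled, weights


# Hypothetical positions: [coverage-like value, motif-like score]
toy_features = [[0.1, 0.0], [3.0, 1.0], [0.2, 0.1]]
pooled, weights = attention_pool(toy_features, query=[1.0, 1.0])
print([round(w, 3) for w in weights])
```

Inspecting `weights` is what makes such a module interpretable: positions with large weights are the ones the pooled representation relies on, which is the kind of signal the authors compare against DNase-Seq coverage and motifs.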
4.
BMC Bioinformatics ; 22(1): 37, 2021 Feb 01.
Article in English | MEDLINE | ID: mdl-33522913

ABSTRACT

BACKGROUND: IsomiRs are miRNA variants that differ in length and/or sequence from their canonical forms, including additions or deletions of one or more nucleotides (nts) at the 5' and/or 3' end, internal editing events, or untemplated 3'-end additions. Most available tools for small RNA-seq data analysis do not allow the identification of isomiRs and often require advanced bioinformatics knowledge. To overcome this, we have developed IsomiR Window, a platform that supports the systematic identification, quantification and functional exploration of isomiR expression in small RNA-seq datasets and is accessible to users with no computational skills. METHODS: IsomiR Window enables the discovery of isomiRs and the identification of all annotated non-coding RNAs in RNA-seq datasets from animals and plants. It comprises two main components: the IsomiR Window pipeline for data processing and the IsomiR Window Browser interface. It integrates over ten third-party software tools for the analysis of small RNA-seq data and implements a new algorithm that detects all possible types of isomiRs. These include 3'- and 5'-end isomiRs, 3'-end tailings, isomiRs with single nucleotide polymorphisms (SNPs) or potential RNA editing events, as well as all possible fuzzy combinations. IsomiR Window includes all databases required for analysis and annotation and is freely distributed as a Linux virtual machine with all required software. RESULTS: IsomiR Window processes several datasets in an automated manner, without restrictions on input file size. It generates high-quality interactive figures and tables that can be exported into different formats. The performance of isomiR detection and quantification was assessed using simulated small RNA-seq data. For correctly mapped reads, it identified different types of isomiRs with high confidence and 100% accuracy. Analysis of small RNA-seq data from basal cell carcinomas (BCCs) using IsomiR Window confirmed that miR-183-5p is up-regulated in nodular BCCs, but revealed that this effect was predominantly due to a novel 5'-end variant. This variant displays a different seed region motif and 1756 isoform-exclusive mRNA targets that are significantly associated with disease pathways, underscoring the biological relevance of isomiR-focused analysis. IsomiR Window is available at https://isomir.fc.ul.pt/ .


Subject(s)
Computational Biology , MicroRNAs , RNA-Seq , Animals , RNA, Messenger , Sequence Analysis, RNA , Software
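The isomiR categories listed in the abstract above (5'- and 3'-end variants, 3'-end tailing and internal substitutions) can be illustrated with a simplified classifier that compares a read against a precursor sequence and the canonical mature-miRNA coordinates. This is a hypothetical sketch, not IsomiR Window's algorithm. It assumes reads come from the precursor strand, allows at most one internal mismatch and up to three untemplated 3' nucleotides, and prefers exact templated matches when several placements fit. The toy precursor is made up.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_isomir(read, precursor, mature_start, mature_end,
                    max_tail=3, max_mismatch=1):
    """Label a small-RNA read relative to a canonical mature miRNA.

    Coordinates are 0-based and end-exclusive on `precursor`. The rules are
    simplified and hypothetical (see the lead-in).
    """
    candidates = []
    for tail in range(max_tail + 1):          # untemplated 3' nucleotides
        templated = read[: len(read) - tail]
        if not templated:
            break
        for start in range(len(precursor) - len(templated) + 1):
            mm = hamming(templated, precursor[start:start + len(templated)])
            if mm <= max_mismatch:
                candidates.append((mm, tail, start, len(templated)))
    if not candidates:
        return ["unmapped"]
    # Prefer exact templated matches, then the shortest untemplated tail.
    mm, tail, start, length = min(candidates)
    labels = []
    if start != mature_start:
        labels.append("5'-end")
    if start + length != mature_end:
        labels.append("3'-end")
    if tail:
        labels.append("3'-tailing")
    if mm:
        labels.append("substitution")
    return labels or ["canonical"]


MATURE = "ACGUUAGCCAUGGAUCCAGU"           # hypothetical mature miRNA
PRECURSOR = "GGGGG" + MATURE + "CCCCC"    # hypothetical precursor
print(classify_isomir(MATURE + "AA", PRECURSOR, 5, 25))
```

The tie-breaking rule matters: a mismatch in the last few nucleotides can be read either as a substitution or as tailing, and real tools resolve that ambiguity in different ways.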
5.
Nat Commun ; 12(1): 764, 2021 02 03.
Article in English | MEDLINE | ID: mdl-33536417

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of genomic regions affecting complex diseases. The next challenge is to elucidate the causal genes and mechanisms involved. One approach is to use statistical colocalization to assess shared genetic aetiology across multiple related traits (e.g. molecular traits, metabolic pathways and complex diseases) to identify causal pathways, prioritize causal variants and evaluate pleiotropy. We propose HyPrColoc (Hypothesis Prioritisation for multi-trait Colocalization), an efficient deterministic Bayesian algorithm using GWAS summary statistics that can detect colocalization across vast numbers of traits simultaneously (e.g. 100 traits can be jointly analysed in around 1 s). We perform a genome-wide multi-trait colocalization analysis of coronary heart disease (CHD) and fourteen related traits, identifying 43 regions in which CHD colocalized with ≥1 trait, including 5 previously unknown CHD loci. Across the 43 loci, we further integrate gene and protein expression quantitative trait loci to identify candidate causal genes.


Subject(s)
Algorithms , Computational Biology/methods , Coronary Disease/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Quantitative Trait Loci/genetics , Coronary Disease/diagnosis , Genomics/methods , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Reproducibility of Results , Risk Factors
6.
Nat Commun ; 12(1): 775, 2021 02 03.
Article in English | MEDLINE | ID: mdl-33536437

ABSTRACT

Phenotypic plasticity, the ability to produce multiple phenotypes from a single genotype, represents an excellent model with which to examine the relationship between gene expression and phenotypes. Analyses of the molecular foundations of phenotypic plasticity are challenging, however, especially in the case of complex social phenotypes. Here we apply a machine learning approach to tackle this challenge by analyzing individual-level gene expression profiles of Polistes dominula paper wasps following the loss of a queen. We find that caste-associated gene expression profiles respond strongly to queen loss, and that this change is partly explained by attributes such as age but occurs even in individuals that appear phenotypically unaffected. These results demonstrate that large changes in gene expression may occur in the absence of outwardly detectable phenotypic changes, resulting here in a socially mediated de-differentiation of individuals at the transcriptomic level but not at the levels of ovarian development or behavior.


Subject(s)
Adaptation, Physiological/genetics , Computational Biology/methods , Gene Expression Profiling/methods , Social Behavior , Transcriptome/genetics , Wasps/genetics , Algorithms , Animals , Female , Gene Ontology , Gene Regulatory Networks , Humans , Machine Learning , Phenotype
7.
BMC Bioinformatics ; 22(1): 60, 2021 Feb 09.
Article in English | MEDLINE | ID: mdl-33563206

ABSTRACT

BACKGROUND: Current high-throughput technologies (e.g., whole-genome sequencing, RNA-Seq, ChIP-Seq) generate huge amounts of data, and their use becomes more widespread each year. Complex analysis pipelines involving several computationally intensive steps have to be applied to an increasing number of samples. Workflow management systems allow parallelization and more efficient use of computational power. Nevertheless, this is mostly achieved by assigning the available cores to the pipeline of one or a few samples at a time. We refer to this approach as the naive parallel strategy (NPS). Here, we discuss an alternative approach, which we refer to as the concurrent execution strategy (CES), that distributes the available processors equally across every sample's pipeline. RESULTS: We show theoretically that, under loose conditions, the CES yields a substantial speedup, with an ideal gain ranging from 1 to the number of samples. We also observe that the CES yields even faster execution because parallelizable tasks scale sub-linearly. In practice, we tested both strategies on a whole-exome sequencing pipeline applied to three publicly available matched tumour-normal sample pairs of gastrointestinal stromal tumour. The CES achieved latency speedups of up to 2-2.4 compared with the NPS. CONCLUSIONS: Our results suggest that if resource distribution is further tailored to specific situations, an even greater performance gain could be achieved when executing multi-sample pipelines. For this to be feasible, the tools included in the pipeline would need to be benchmarked. In our opinion, these benchmarks should be consistently performed by the tools' developers. Finally, these results suggest that concurrent strategies might also lead to energy and cost savings by making clusters of low-power machines feasible to use.


Subject(s)
Computational Biology , High-Throughput Nucleotide Sequencing , Software , Whole Exome Sequencing , Chromatin Immunoprecipitation Sequencing , Computational Biology/methods , Whole Exome Sequencing/standards , Workflow
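The abstract above states that the ideal CES gain ranges from 1 to the number of samples. One simple model that reproduces exactly that range (our assumption, not necessarily the authors' derivation) is Amdahl's law: a sample's pipeline takes time W(s + (1 - s)/p) on p cores, where s is the fraction of work that cannot be parallelized. Under NPS the n samples run one after another on all P cores; under CES they run concurrently with P/n cores each. The 12-core machine below is hypothetical.

```python
def pipeline_time(work, serial_fraction, cores):
    """Amdahl's-law runtime of one sample's pipeline on `cores` cores."""
    return work * (serial_fraction + (1.0 - serial_fraction) / cores)


def nps_time(n_samples, work, serial_fraction, total_cores):
    """Naive parallel strategy: samples run one after another, each on all cores."""
    return n_samples * pipeline_time(work, serial_fraction, total_cores)


def ces_time(n_samples, work, serial_fraction, total_cores):
    """Concurrent execution strategy: all samples at once, cores split evenly."""
    if total_cores < n_samples:
        raise ValueError("this model assumes at least one core per sample")
    return pipeline_time(work, serial_fraction, total_cores / n_samples)


def ces_speedup(n_samples, serial_fraction, total_cores, work=1.0):
    """Ratio of NPS latency to CES latency under the Amdahl model above."""
    return (nps_time(n_samples, work, serial_fraction, total_cores)
            / ces_time(n_samples, work, serial_fraction, total_cores))


# Three tumour-normal pairs on a hypothetical 12-core machine.
for s in (0.0, 0.25, 0.5, 1.0):
    print(f"serial fraction {s:.2f}: speedup {ces_speedup(3, s, 12):.2f}")
```

With s = 0 (tools that scale perfectly) the two strategies tie; with s = 1 (tools that cannot use extra cores) CES is n times faster. Sub-linear scaling of real tools corresponds to 0 < s < 1, which is why CES helps in practice.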
8.
BMC Bioinformatics ; 22(1): 58, 2021 Feb 09.
Article in English | MEDLINE | ID: mdl-33563211

ABSTRACT

BACKGROUND: Hub transcription factors, which regulate many target genes in gene regulatory networks (GRNs), play important roles as disease regulators and potential drug targets. However, while numerous methods have been developed to predict individual regulator-gene interactions from gene expression data, few methods focus on inferring these hubs. RESULTS: We have developed ComHub, a tool to predict hubs in GRNs. ComHub makes a community prediction of hubs by averaging over the predictions of a compendium of network inference methods. Benchmarking ComHub against the DREAM5 challenge data and two independent gene expression datasets showed robust performance across all datasets. CONCLUSIONS: In contrast to other evaluated methods, ComHub consistently scored among the top-performing methods on data from different sources. Lastly, we implemented ComHub to work both with predefined networks and as a stand-alone network inference tool, which will make the method generally applicable.


Subject(s)
Algorithms , Computational Biology , Gene Regulatory Networks , Benchmarking , Computational Biology/methods , Gene Expression , Transcription Factors
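ComHub, as described above, makes a community prediction of hubs by averaging over the predictions of several network inference methods. The sketch below illustrates that idea under assumptions of our own: each method outputs confidence scores for (regulator, target) edges, only each method's top-k edges are kept, and a regulator's hub score is its out-degree among those edges averaged across methods. The method outputs and the cut-off k are hypothetical, and the published tool's exact procedure may differ.

```python
from collections import Counter


def out_degrees(edge_scores, top_k):
    """Out-degree of each regulator among a method's top_k highest-scoring edges.

    edge_scores: dict mapping (regulator, target) -> confidence score.
    """
    ranked = sorted(edge_scores.items(), key=lambda item: item[1], reverse=True)
    return Counter(regulator for (regulator, _target), _score in ranked[:top_k])


def community_hubs(method_outputs, top_k):
    """Average regulator out-degree across methods, most hub-like first."""
    totals = Counter()
    for edge_scores in method_outputs:
        totals.update(out_degrees(edge_scores, top_k))
    n_methods = len(method_outputs)
    averaged = {tf: count / n_methods for tf, count in totals.items()}
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical edge scores from two inference methods.
method_a = {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.8, ("TF2", "g3"): 0.7, ("TF3", "g4"): 0.1}
method_b = {("TF1", "g1"): 0.6, ("TF2", "g2"): 0.95, ("TF1", "g3"): 0.5, ("TF3", "g4"): 0.2}
print(community_hubs([method_a, method_b], top_k=3))
```

Averaging degrees rather than raw scores keeps methods with very different score scales comparable, which is one reason community approaches tend to be robust across datasets.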
9.
BMC Bioinformatics ; 22(1): 59, 2021 Feb 09.
Article in English | MEDLINE | ID: mdl-33563213

ABSTRACT

BACKGROUND: Long noncoding RNAs (lncRNAs) represent a large class of transcripts with two common features: they exceed an arbitrary length threshold of 200 nt and are assumed not to encode proteins. Although a growing body of evidence indicates that the vast majority of lncRNAs are potentially nonfunctional, hundreds of them have already been shown to perform essential gene regulatory functions or to be linked to a number of cellular processes, including those associated with the etiology of human diseases. To better understand the biology of lncRNAs, it is essential to study their evolution in more depth. In contrast to protein-coding transcripts, however, they do not show the strong sequence conservation that usually results from purifying selection; therefore, software typically used to resolve the evolutionary relationships of protein-coding genes and transcripts is not applicable to the study of lncRNAs. RESULTS: To tackle this issue, we developed lncEvo, a computational pipeline that consists of three modules: (1) transcriptome assembly from RNA-Seq data, (2) prediction of lncRNAs, and (3) conservation study, a genome-wide comparison of lncRNA transcriptomes between two species of interest, including a search for orthologs. Importantly, lncEvo can be applied solely for transcriptome assembly or lncRNA prediction, without running the conservation-related module. CONCLUSIONS: lncEvo is an all-in-one tool built with the Nextflow framework; it uses state-of-the-art software and algorithms with customizable trade-offs between speed and sensitivity, is easy to use, and has built-in reporting functionality. The source code of the pipeline is freely available for academic and nonacademic use under the MIT license at https://gitlab.com/spirit678/lncrna_conservation_nf .


Subject(s)
Algorithms , Computational Biology , RNA, Long Noncoding , Software , Computational Biology/methods , Conserved Sequence , Genome , Humans , RNA, Long Noncoding/genetics , Transcriptome
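The two defining lncRNA features quoted in the abstract above (length above 200 nt and no protein-coding capacity) suggest a crude first-pass filter. The sketch below is not lncEvo's prediction module, which relies on dedicated software. It keeps transcripts longer than 200 nt whose longest forward-strand ATG-initiated open reading frame is shorter than an assumed 100-codon cut-off. The cut-off, the forward-strand-only scan and the function names are our assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Length in codons (stop excluded) of the longest ATG-initiated ORF on the
    forward strand; an ORF still open at the end of the sequence also counts."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
        if start is not None:
            # ORF truncated by the end of the transcript: count full codons.
            best = max(best, (len(seq) - start) // 3)
    return best


def is_candidate_lncrna(seq, min_length=200, max_orf_codons=100):
    """Hypothetical filter: long enough and without a long ORF."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons


coding = "ATG" + "GCT" * 150 + "TAA"       # toy transcript with a 151-codon ORF
noncoding = "ATGGCTTAA" + "C" * 300        # toy transcript with a 2-codon ORF
print(is_candidate_lncrna(coding), is_candidate_lncrna(noncoding))
```

Real coding-potential predictors also consider codon usage, conservation and the reverse strand, so a simple ORF-length rule will misclassify some transcripts.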
10.
Medicine (Baltimore) ; 100(6): e24471, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33578541

ABSTRACT

BACKGROUND: In osteosarcoma, the lung is the most common site of metastasis. Intensive work has been done to elucidate the pathogenesis, but the specific metastatic mechanism remains unclear. We therefore conducted this study to identify the key genes and critical functional pathways associated with the progression and treatment of lung metastasis originating from osteosarcoma. METHODS: Two independent datasets (GSE14359 and GSE85537) were selected from the Gene Expression Omnibus (GEO) database, and overlapping differentially expressed genes (DEGs) were identified using the GEO2R online platform. Subsequently, Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs were conducted using DAVID. Meanwhile, a protein-protein interaction (PPI) network was constructed with STRING and visualized using Cytoscape. Afterwards, the key module and hub genes were extracted from the PPI network using the MCODE and cytoHubba plugins. Moreover, raw data obtained from GSE73166 and GSE21257 were used to verify the expression differences and to conduct survival analyses of the hub genes, respectively. Finally, an interaction network of miRNAs and hub genes was constructed with ENCORI and visualized using Cytoscape. RESULTS: A total of 364 DEGs were identified, comprising 96 downregulated and 268 upregulated genes, which were mainly involved in cancer-associated pathways, adherens junctions, ECM-receptor interaction, focal adhesion and the MAPK signaling pathway. Subsequently, 10 hub genes were obtained, and survival analysis demonstrated that SKP2 and ASPM were closely related to poor prognosis in patients with osteosarcoma. Finally, hsa-miR-340-5p, hsa-miR-495-3p and hsa-miR-96-5p were found to be most closely associated with these hub genes in the miRNA-hub gene interaction network.
CONCLUSION: The key genes and functional pathways identified in the study may contribute to understanding the molecular mechanisms involved in the carcinogenesis and progression of lung metastasis originating from osteosarcoma, and provide potential diagnostic and therapeutic targets.


Subject(s)
Bone Neoplasms/pathology , Lung Neoplasms/secondary , Osteosarcoma/pathology , Bone Neoplasms/diagnosis , Bone Neoplasms/genetics , Computational Biology , Gene Expression Regulation, Neoplastic/genetics , Gene Regulatory Networks/genetics , Genes, Neoplasm/genetics , Genetic Markers , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , MicroRNAs/genetics , Oligonucleotide Array Sequence Analysis , Osteosarcoma/diagnosis , Osteosarcoma/genetics , Signal Transduction/genetics
11.
12.
PLoS Comput Biol ; 17(2): e1008686, 2021 02.
Article in English | MEDLINE | ID: mdl-33544720

ABSTRACT

The novelty of the human coronavirus SARS-CoV-2, which causes COVID-19, and the lack of effective drugs and vaccines gave rise to a wide variety of strategies to fight this worldwide pandemic. Many of these strategies rely on repositioning existing drugs, which could shorten the time and reduce the cost compared with de novo drug discovery. In this study, we present a new network-based algorithm for drug repositioning, called SAveRUNNER (Searching off-lAbel dRUg aNd NEtwoRk), which predicts drug-disease associations by quantifying the interplay between drug targets and disease-specific proteins in the human interactome via a novel network-based similarity measure that prioritizes associations between drugs and diseases located in the same network neighborhoods. Specifically, we applied SAveRUNNER to a panel of 14 selected diseases with well-established disease-causing genes that have been found to be related to COVID-19 through genetic similarity (i.e., SARS), comorbidity (e.g., cardiovascular diseases), or their association with drugs tentatively repurposed to treat COVID-19 (e.g., malaria, HIV, rheumatoid arthritis). Focusing specifically on the SARS subnetwork, we identified 282 repurposable drugs, including some of the most rumored off-label drugs for COVID-19 treatment (e.g., chloroquine, hydroxychloroquine, tocilizumab, heparin), as well as a new combination therapy of 5 drugs (hydroxychloroquine, chloroquine, lopinavir, ritonavir, remdesivir) currently used in clinical practice. Furthermore, to maximize the efficiency of putative downstream validation experiments, we prioritized 24 potential anti-SARS-CoV repurposable drugs based on their network-based similarity values. These top-ranked drugs include ACE inhibitors, monoclonal antibodies (e.g., anti-IFNγ, anti-TNFα, anti-IL12, anti-IL1β, anti-IL6) and thrombin inhibitors.
Finally, our findings were validated in silico by a gene set enrichment analysis, which confirmed that most of the network-predicted repurposable drugs may have a potential treatment effect against human coronavirus infections.


Subject(s)
Algorithms , Antiviral Agents/pharmacology , Drug Repositioning/methods , Pandemics , /epidemiology , Clinical Trials as Topic , Comorbidity , Computational Biology , Computer Simulation , Drug Discovery , Drug Evaluation, Preclinical/methods , Drug Evaluation, Preclinical/statistics & numerical data , Drug Repositioning/statistics & numerical data , Host Microbial Interactions/drug effects , Host Microbial Interactions/physiology , Humans , Protein Interaction Maps/drug effects , /drug effects
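SAveRUNNER, as summarized above, scores drug-disease pairs by how close drug targets sit to disease proteins in the human interactome. Its own similarity measure is defined in the paper; as a simpler, widely used stand-in, the sketch computes the "closest" network distance: for each drug target, the shortest-path distance to the nearest disease protein, averaged over targets, using breadth-first search on an unweighted, undirected toy interactome. All node names and edges are hypothetical.

```python
import math
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Shortest-path hop counts from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(graph, drug_targets, disease_genes):
    """Mean over drug targets of the distance to the nearest disease gene.

    Returns math.inf if some target cannot reach any disease gene.
    """
    if not drug_targets:
        raise ValueError("drug has no targets")
    total = 0
    for target in drug_targets:
        dist = bfs_distances(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if not reachable:
            return math.inf
        total += min(reachable)
    return total / len(drug_targets)


# Hypothetical toy interactome: T* are drug targets, D* are disease proteins.
interactome = build_graph([("T1", "A"), ("A", "D1"), ("T2", "D2"), ("D1", "D2"), ("X", "Y")])
print(closest_distance(interactome, ["T1", "T2"], ["D1", "D2"]))  # 1.5
print(closest_distance(interactome, ["X"], ["D1"]))               # inf
```

Smaller distances mean the drug acts in the disease's network neighborhood; a similarity score would be some decreasing function of this distance, usually compared against randomized networks before being trusted.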
13.
Sci Rep ; 11(1): 3238, 2021 02 05.
Article in English | MEDLINE | ID: mdl-33547334

ABSTRACT

The rampant worldwide spread of COVID-19, an infectious disease caused by SARS-CoV-2, has led to millions of deaths and devastated social, financial and political entities around the world. Without an existing effective medical therapy, vaccines are urgently needed to prevent the spread of this disease. In this study, we propose an in silico deep learning approach for the prediction and design of a multi-epitope vaccine (DeepVacPred). By combining in silico immunoinformatics and deep neural network strategies, the DeepVacPred computational framework directly predicts 26 potential vaccine subunits from the available SARS-CoV-2 spike protein sequence. We further use in silico methods to investigate the linear B-cell epitopes, cytotoxic T lymphocyte (CTL) epitopes and helper T lymphocyte (HTL) epitopes in the 26 subunit candidates, and identify the best 11 of them to construct a multi-epitope vaccine against SARS-CoV-2. The human population coverage, antigenicity, allergenicity, toxicity, physicochemical properties and secondary structure of the designed vaccine are evaluated via state-of-the-art bioinformatic approaches, showing that the designed vaccine is of good quality. The 3D structure of the designed vaccine is predicted, refined and validated by in silico tools. Finally, we optimize the codon sequence and insert it into a plasmid to ensure cloning and expression efficiency. In conclusion, this artificial intelligence (AI)-based vaccine discovery framework accelerates the vaccine design process and constructs a 694-aa multi-epitope vaccine containing 16 B-cell epitopes, 82 CTL epitopes and 89 HTL epitopes, which is a promising candidate against SARS-CoV-2 infection and can be further evaluated in clinical studies. Moreover, we trace the RNA mutations of SARS-CoV-2 and ensure that the designed vaccine can tackle the recent RNA mutations of the virus.


Subject(s)
Deep Learning , Spike Glycoprotein, Coronavirus/immunology , Allergens , /adverse effects , /immunology , Codon Usage , Computational Biology , Drug Design , Epitopes, B-Lymphocyte/immunology , Epitopes, T-Lymphocyte/immunology , Humans , Immunogenicity, Vaccine , Models, Molecular , Molecular Docking Simulation , Molecular Dynamics Simulation , Mutation , Protein Conformation , RNA, Viral , /genetics , Solubility , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , T-Lymphocytes, Cytotoxic/immunology , T-Lymphocytes, Helper-Inducer/immunology , Vaccines, Subunit/chemistry , Vaccines, Subunit/immunology
14.
Medicine (Baltimore) ; 100(4): e24435, 2021 Jan 29.
Article in English | MEDLINE | ID: mdl-33530245

ABSTRACT

Obstructive sleep apnea (OSA) is a common chronic disease that increases the risk of cardiovascular disease and metabolic and neuropsychiatric disorders, resulting in a considerable socioeconomic burden. This study aimed to identify potential key genes that influence the mechanisms and consequences of OSA. Gene expression profiles related to OSA were obtained from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in subcutaneous adipose tissue from OSA compared with normal tissue were screened using R software, followed by gene ontology (GO) and pathway enrichment analyses. Subsequently, a protein-protein interaction (PPI) network of these DEGs was constructed with STRING, and key hub genes were extracted from the network with Cytoscape plugins. The hub genes were further validated in another GEO dataset and assessed by receiver operating characteristic (ROC) analysis and Pearson correlation analysis. There were 373 DEGs in OSA samples relative to normal controls, mainly associated with olfactory receptor activity and olfactory transduction. Analysis of the PPI network identified GDNF, SLC2A2, PRL and SST as key hub genes. Decreased expression of the hub genes was associated with OSA occurrence and performed well in distinguishing OSA from normal samples in ROC analysis. In addition, Pearson correlation analysis revealed strong correlations among the hub genes, indicating that they may act in synergy, contributing to OSA and related disorders. This bioinformatics study identified 4 hub genes (GDNF, SLC2A2, PRL and SST) that may be new potential biomarkers for OSA and related disorders.


Subject(s)
Mass Screening/methods , Microarray Analysis/methods , Sleep Apnea, Obstructive/genetics , Biomarkers/analysis , Computational Biology , Databases, Genetic , Glial Cell Line-Derived Neurotrophic Factor/genetics , Glucose Transporter Type 2/genetics , Humans , Prolactin/genetics , Protein Interaction Maps , ROC Curve , Somatostatin/genetics , Transcriptome
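The ROC analysis in the abstract above asks how well a hub gene's expression separates OSA from control samples. The area under the ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half, and the sketch below computes it directly from that definition. Because the hub genes were decreased in OSA, lower expression should indicate disease, so the sketch negates expression to obtain a case score. The expression values are made up for illustration.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the Mann-Whitney probability P(case > control), ties counted as 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values for one down-regulated hub gene.
osa_expression = [2.1, 2.5, 3.0, 3.9]
control_expression = [3.5, 4.2, 4.8, 5.0]
# Lower expression suggests OSA, so negated expression is used as the score.
auc = roc_auc([-x for x in osa_expression], [-x for x in control_expression])
print(auc)  # 0.9375
```

The pairwise loop is O(n·m), which is fine for the tens of samples typical of GEO datasets; for large cohorts a rank-based formula gives the same value faster.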
15.
BMC Bioinformatics ; 22(1): 52, 2021 Feb 08.
Article in English | MEDLINE | ID: mdl-33557749

ABSTRACT

BACKGROUND: Drug repositioning refers to the identification of new indications for existing drugs. Drug-based inference methods for drug repositioning use particular drug features to predict new indications, and these different features provide complementary information. It is therefore necessary to integrate them for more accurate in silico drug repositioning. RESULTS: In this study, we collect 3 different types of drug features (i.e., chemical, genomic and pharmacological spaces) from public databases. Similarities between drugs are calculated separately for each feature. We further develop a fusion method to combine the 3 similarity measurements. We test the inference abilities of the 4 similarity datasets in drug repositioning under the guilt-by-association principle. Leave-one-out cross-validation shows that the integrated similarity measurement IntegratedSim achieves the best prediction performance, with the highest AUC value (0.8451) and the highest AUPR value (0.2201). Case studies demonstrate that IntegratedSim produces the largest number of confirmed predictions in most cases. Moreover, we compare our integration method with 3 other similarity-fusion methods using the datasets in our study. Cross-validation results suggest that our method improves prediction accuracy in terms of AUC and AUPR values. CONCLUSIONS: Our study suggests that the 3 drug features used in our manuscript provide valuable information for drug repositioning. The comparative results indicate that integrating the 3 drug features would improve drug-disease association prediction. Our study provides a strategy for fusing different drug features for in silico drug repositioning.


Subject(s)
Drug Repositioning , Genomics , Algorithms , Computational Biology , Computer Simulation , Databases, Factual
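The study above fuses three drug-drug similarity measurements and then scores drug-disease links under the guilt-by-association principle. The authors' fusion method is not detailed in the abstract, so the sketch swaps in the simplest possible fusion, an element-wise mean of the chemical, genomic and pharmacological similarities, and scores a candidate drug for a disease as its maximum fused similarity to drugs already known to treat that disease. Drug names and similarity values are hypothetical.

```python
def fuse_similarities(*matrices):
    """Element-wise mean of drug-drug similarity matrices (dicts of dicts)."""
    drugs = list(matrices[0].keys())
    return {
        a: {b: sum(m[a][b] for m in matrices) / len(matrices) for b in drugs}
        for a in drugs
    }


def gba_score(similarity, candidate, known_drugs):
    """Guilt-by-association score: max similarity to a drug known for the disease."""
    others = [d for d in known_drugs if d != candidate]
    if not others:
        return 0.0
    return max(similarity[candidate][d] for d in others)


# Hypothetical similarity spaces for three drugs.
chem = {"A": {"A": 1.0, "B": 0.8, "C": 0.1},
        "B": {"A": 0.8, "B": 1.0, "C": 0.2},
        "C": {"A": 0.1, "B": 0.2, "C": 1.0}}
geno = {"A": {"A": 1.0, "B": 0.4, "C": 0.3},
        "B": {"A": 0.4, "B": 1.0, "C": 0.5},
        "C": {"A": 0.3, "B": 0.5, "C": 1.0}}
pharm = {"A": {"A": 1.0, "B": 0.6, "C": 0.2},
         "B": {"A": 0.6, "B": 1.0, "C": 0.1},
         "C": {"A": 0.2, "B": 0.1, "C": 1.0}}

fused = fuse_similarities(chem, geno, pharm)
# Suppose drug "B" is already known to treat the disease of interest.
for drug in ("A", "C"):
    print(drug, round(gba_score(fused, drug, ["B"]), 3))
```

A leave-one-out evaluation like the one described would hide each known drug-disease association in turn, re-score it with the remaining associations, and summarize the rankings with AUC and AUPR.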
16.
Sci Rep ; 11(1): 3343, 2021 02 08.
Article in English | MEDLINE | ID: mdl-33558602

ABSTRACT

The new coronavirus disease (COVID-19) is a challenge for clinical decision-making and the effective allocation of healthcare resources. An accurate prognostic assessment is necessary to improve patient survival, especially in developing countries. This study proposes to predict the risk of developing critical conditions in COVID-19 patients by training multipurpose algorithms. We followed a total of 1040 patients with a positive RT-PCR diagnosis of COVID-19 at a large hospital in São Paulo, Brazil, from March to June 2020, of whom 288 (28%) had a severe prognosis, i.e., Intensive Care Unit (ICU) admission, use of mechanical ventilation or death. We used routinely collected laboratory, clinical and demographic data to train five machine learning algorithms (artificial neural networks, extra trees, random forests, CatBoost and extreme gradient boosting). We used a random sample of 70% of patients to train the algorithms and held out the remaining 30% for performance assessment, simulating new unseen data. To assess whether the algorithms could capture general severe prognostic patterns, each model was trained on a combination of two of the three outcomes to predict the third. All algorithms showed very high predictive performance (average AUROC of 0.92, sensitivity of 0.92 and specificity of 0.82). The three most important variables for the multipurpose algorithms were the lymphocyte-to-C-reactive protein ratio, C-reactive protein and the Braden Scale. The results suggest that machine learning algorithms may be able to predict nonspecific negative COVID-19 outcomes from routinely collected data.


Subject(s)
/diagnosis , Computational Biology/methods , Machine Learning , /genetics , Adult , Aged , Aged, 80 and over , Algorithms , Brazil/epidemiology , C-Reactive Protein/analysis , /virology , Cohort Studies , Female , Humans , Intensive Care Units , Length of Stay , Lymphocyte Count , Male , Middle Aged , Prognosis , Respiration, Artificial , Reverse Transcriptase Polymerase Chain Reaction
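The abstract above describes training each model on two of the three severe outcomes (ICU admission, mechanical ventilation, death) to predict the third, and reports sensitivity and specificity. The sketch shows both pieces in plain Python: building leave-one-outcome-out labels and computing sensitivity and specificity from binary predictions. The field names, and the rule that a training label is positive if either of the two remaining outcomes occurred, are assumptions for illustration; the 70/30 patient split would be applied separately.

```python
OUTCOMES = ("icu", "ventilation", "death")  # hypothetical field names


def leave_one_outcome_out(patients, held_out):
    """Training labels from the two other outcomes; test labels from `held_out`."""
    others = [o for o in OUTCOMES if o != held_out]
    train_labels = [int(any(p[o] for o in others)) for p in patients]
    test_labels = [int(p[held_out]) for p in patients]
    return train_labels, test_labels


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both positive and negative true labels")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical patients (1 = outcome occurred).
patients = [
    {"icu": 1, "ventilation": 0, "death": 0},
    {"icu": 0, "ventilation": 1, "death": 1},
    {"icu": 0, "ventilation": 0, "death": 0},
    {"icu": 1, "ventilation": 1, "death": 0},
]
train_labels, test_labels = leave_one_outcome_out(patients, held_out="death")
print(train_labels, test_labels)
```

Training on a composite of two outcomes and testing on the third checks whether a model learns a general "severe course" signal rather than features specific to one endpoint.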
17.
Medicine (Baltimore) ; 100(6): e24674, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33578599

ABSTRACT

BACKGROUND: Gastric cancer has multiple metastasis pathways, of which lymph node metastasis plays a dominant role. However, the specific mechanism of lymph node metastasis is still unclear. METHODS: Bioinformatics methods were used to mine gene chip data related to gastric cancer and epithelial-mesenchymal transition (EMT) from a high-throughput gene expression database (Gene Expression Omnibus, GEO), and all genes differentially expressed between gastric cancer tissues and adjacent normal gastric mucosa tissues were screened out. The corresponding R packages were used for gene annotation and cluster analysis; enrichment analysis of the differentially expressed genes and a protein-protein interaction network were then used for correlation analysis, and the EMT-related paired related homeobox 1 gene (PRRX1) was finally selected. Next, we collected 65 metastatic lymph node samples and 93 gastric cancer tissue samples. The expression levels of PRRX1 and the EMT-related proteins E-cadherin (E-ca) and vimentin (Vim) in gastric cancer tissues and metastatic lymph node tissues were determined by streptavidin-peroxidase (SP) immunohistochemistry (IHC). The differences in, and correlations among, PRRX1, E-ca and Vim expression in gastric cancer and metastatic lymph node tissues were analyzed from the experimental data, and their clinical significance was assessed in combination with clinicopathological data. RESULTS: PRRX1 expression levels in gastric cancer tissues were significantly higher than in adjacent normal gastric mucosa tissues. The positive expression rates of PRRX1, Vim and E-ca differed significantly between gastric cancer tissues and metastatic lymph node tissues. Compared with gastric cancer tissues, metastatic lymph nodes showed significantly down-regulated PRRX1 and Vim expression and significantly up-regulated E-ca expression. 
CONCLUSION: PRRX1 may promote lymph node metastasis of gastric cancer by regulating EMT, thereby affecting patient prognosis. PRRX1 may be used as a new biomarker to predict or prevent lymph node metastasis in gastric cancer.


Subject(s)
Biomarkers, Tumor/metabolism, Epithelial-Mesenchymal Transition/genetics, Homeodomain Proteins/genetics, Lymphatic Metastasis/genetics, Stomach Neoplasms/secondary, Adult, Aged, Cadherins/metabolism, Cluster Analysis, Computational Biology/methods, Data Management, Down-Regulation, Gene Expression Regulation, Neoplastic, Homeodomain Proteins/metabolism, Humans, Lymphatic Metastasis/pathology, Male, Middle Aged, Peroxidase/metabolism, Prognosis, Stomach Neoplasms/pathology, Streptavidin/metabolism, Up-Regulation, Vimentin/metabolism
18.
Artif Intell Med ; 112: 102018, 2021 02.
Article in English | MEDLINE | ID: mdl-33581830

ABSTRACT

BACKGROUND AND OBJECTIVE: The novel coronavirus disease 2019 (COVID-19) has been declared a pandemic by the World Health Organization (WHO). As of April 3, 2020, there were 1,009,625 confirmed cases and 51,737 deaths reported. Doctors have been faced with a myriad of patients presenting with many different symptoms. This raises two important questions: what are the common symptoms, and what is their relative importance? METHODS: A non-structured and incomplete COVID-19 dataset of 14,251 confirmed cases was preprocessed, producing a complete and organized COVID-19 dataset of 738 confirmed cases. Six different feature selection algorithms were then applied to this new dataset. Five of these algorithms have been proposed previously in the literature. The sixth is a novel algorithm proposed by the authors, called Variance Based Feature Weighting (VBFW), which not only ranks the symptoms by importance but also assigns a quantitative importance measure to each symptom. RESULTS: For our COVID-19 dataset, the five existing feature selection algorithms produced different rankings of the top five symptoms; they even selected different symptoms for inclusion in the top five. This is because each of the five algorithms ranks the symptoms based on different data characteristics, and each has its own advantages and disadvantages. However, when these five rankings were aggregated (using two different aggregation methods), they produced two identical rankings of the five most important COVID-19 symptoms. From most to least important, they were: Fever/Cough, Fatigue, Sore Throat, and Shortness of Breath. (Fever and cough were ranked equally in both aggregations.) 
Meanwhile, the sixth algorithm, the novel VBFW, chose the same top five symptoms but ranked fever much higher than cough, based on its quantitative importance measure for each symptom (Fever 75%, Cough 39.8%, Fatigue 16.5%, Sore Throat 10.8%, and Shortness of Breath 6.6%). Moreover, the proposed VBFW method achieved an accuracy of 92.1% when used to build a one-class SVM model, and an NDCG@5 of 100%. CONCLUSIONS: Based on the dataset and the feature selection algorithms employed here, fever, cough, fatigue, sore throat and shortness of breath are important symptoms of COVID-19. The VBFW algorithm also indicates that fever and cough were especially indicative of COVID-19 for the confirmed cases documented in our database.
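The abstract does not name the two aggregation methods used. As a hedged illustration only, the sketch below aggregates several feature rankings with a Borda count (an assumed, commonly used rank-aggregation method, not necessarily one of the authors') and scores a ranking with NDCG@k using the standard DCG form with a log2 position discount. The input rankings and graded relevance values are hypothetical toy data, not the study's results, and VBFW itself is not reproduced.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Borda count: in a ranking of length n, the item at 0-based position i earns n - i points."""
    points: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, item in enumerate(ranking):
            points[item] += n - i
    # Highest total first; ties broken alphabetically so the output is deterministic.
    return sorted(points, key=lambda item: (-points[item], item))


def ndcg_at_k(ranking: Sequence[str], relevance: Dict[str, float], k: int = 5) -> float:
    """NDCG@k = DCG@k(ranking) / DCG@k(ideal), with DCG@k = sum of rel_i / log2(i + 1), i = 1..k."""

    def dcg(items: Sequence[str]) -> float:
        # enumerate() is 0-based, so position i = pos + 1 and the discount is log2(pos + 2).
        return sum(relevance.get(item, 0.0) / math.log2(pos + 2)
                   for pos, item in enumerate(items[:k]))

    ideal = sorted(relevance, key=lambda item: relevance[item], reverse=True)
    best = dcg(ideal)
    return dcg(ranking) / best if best > 0 else 0.0


# Hypothetical rankings from three feature selection algorithms (toy data).
rankings = [
    ["Fever", "Cough", "Fatigue", "Sore Throat", "Shortness of Breath"],
    ["Cough", "Fever", "Fatigue", "Shortness of Breath", "Sore Throat"],
    ["Fever", "Cough", "Sore Throat", "Fatigue", "Shortness of Breath"],
]
consensus = borda_aggregate(rankings)
print(consensus)  # ['Fever', 'Cough', 'Fatigue', 'Sore Throat', 'Shortness of Breath']

# Hypothetical graded relevance (5 = most relevant).
relevance = {"Fever": 5, "Cough": 4, "Fatigue": 3, "Sore Throat": 2, "Shortness of Breath": 1}
print(ndcg_at_k(consensus, relevance))  # 1.0
print(ndcg_at_k(rankings[1], relevance))
```

An NDCG@5 of 1.0 (100%) means the top five items appear in exactly the ideal order; any swap among highly relevant items lowers the score.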


Subject(s)
/physiopathology, Computational Biology/methods, Algorithms, /virology, Cough/physiopathology, Dyspnea/physiopathology, Fatigue/physiopathology, Fever/physiopathology, Humans, Pandemics, Pharyngitis/physiopathology, /isolation & purification
19.
BMC Bioinformatics ; 22(1): 44, 2021 Feb 03.
Article in English | MEDLINE | ID: mdl-33535967

ABSTRACT

BACKGROUND: Differential expression and feature selection analyses are essential steps in developing accurate diagnostic/prognostic classifiers of complex human diseases from transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is to incorporate pre-existing transcriptomics data into the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. Although a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes. RESULTS: In this study, we propose a novel differential expression and feature selection method, GEOlimma, which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely applied Limma method for differential expression analysis. We first quantify differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets and convert the differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities are enriched in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities are enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that GEOlimma provides greater statistical power to detect DE genes than Limma, owing to its increased effective sample size. 
Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed classification performance similar to or better than that of Limma on small, noisy subsets of an asthma dataset. CONCLUSIONS: Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses than the standard Limma method. Because it focuses on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.
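The step of converting DE frequencies across pairwise comparisons into DE prior probabilities can be illustrated with the minimal sketch below. It assumes additive (Laplace-style) smoothing of per-gene DE counts and shows, via Bayes' rule on the log-odds scale, how a gene-specific prior would shift the posterior log-odds of differential expression. For comparison, Limma's `eBayes` uses a single assumed proportion of DE genes (its `proportion` argument) for every gene. GEOlimma's actual conversion and its integration into Limma may differ; the gene names, counts and log Bayes factor are hypothetical.

```python
import math
from typing import Dict


def de_prior_probabilities(de_counts: Dict[str, int], n_comparisons: int,
                           pseudocount: float = 1.0) -> Dict[str, float]:
    """Turn per-gene DE counts over n_comparisons into smoothed prior probabilities.

    Additive smoothing (an assumption) keeps every prior strictly between 0 and 1,
    so genes never or always called DE still get finite log prior odds.
    """
    return {gene: (count + pseudocount) / (n_comparisons + 2.0 * pseudocount)
            for gene, count in de_counts.items()}


def posterior_log_odds(prior: float, log_bayes_factor: float) -> float:
    """Bayes' rule on the log-odds scale: log posterior odds = log prior odds + log Bayes factor."""
    return math.log(prior / (1.0 - prior)) + log_bayes_factor


# Hypothetical counts of how often each gene was called DE across 2481 comparisons.
priors = de_prior_probabilities({"GENE_A": 1240, "GENE_B": 12, "GENE_C": 0}, n_comparisons=2481)
for gene, prior in priors.items():
    # Same hypothetical evidence (log Bayes factor = 2.0) for every gene.
    print(gene, round(prior, 4), round(posterior_log_odds(prior, 2.0), 2))
```

With identical evidence, a gene that was frequently DE in past comparisons ends up with much higher posterior log-odds than one that rarely was, which is the intuition behind replacing a single global prior with gene-specific priors.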


Subject(s)
Computational Biology, Gene Expression Profiling, Bayes Theorem, Child, Female, Humans, Male, Oligonucleotide Array Sequence Analysis, Sample Size
20.
BMC Bioinformatics ; 22(1): 45, 2021 Feb 04.
Article in English | MEDLINE | ID: mdl-33541262

ABSTRACT

BACKGROUND: Because viral integrations into the human genome are relevant to health, for example in the persistence of viral infection and/or in generating genotoxic effects that often progress to cancer, several bioinformatics pipelines have been developed to detect them. Recent genomics and metagenomics analyses have shown that viruses also integrate into the genomes of non-model organisms (e.g., arthropods, fish, plants, vertebrates). However, studies of endogenous viral elements (EVEs) in non-model organisms have rarely gone beyond their characterization from reference genome assemblies. In non-model organisms, we lack a thorough understanding of how widespread EVEs are and of their biological relevance, apart from sporadic cases that nevertheless point to significant roles of EVEs in immunity and regulation of expression. The co-occurrence of repetitive DNA, duplications and/or assembly fragmentation in a genome sequence, together with intrasample variability in whole-genome sequencing (WGS) data, can cause misalignments when mapping data to a genome assembly. This phenomenon hinders our ability to properly identify integration sites. RESULTS: To fill this gap, we developed ViR, a pipeline that resolves the dispersion of reads caused by intrasample variability in sequencing data from both single and pooled DNA samples, thereby improving the detection of integration sites. We tested ViR on both in silico and real sequencing data from a non-model organism, the arboviral vector Aedes albopictus. Potential viral integrations predicted by ViR were molecularly validated, supporting the accuracy of ViR results. CONCLUSION: ViR will open new avenues to explore the biology of EVEs, especially in non-model organisms. Importantly, although we built ViR with the identification of EVEs in mind, its application can be extended to detect any lateral transfer event, provided an ad hoc sequence to interrogate.


Subject(s)
Mosquito Vectors, Virus Integration, Whole Genome Sequencing, Animals, Computational Biology, Genome, Viral, Genomics, Humans, Virus Integration/genetics