Results 1 - 20 of 34,975
1.
Arch Virol ; 165(1): 127-135, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31741097

ABSTRACT

In clinical virome research, whole-genome/transcriptome amplification is required when starting material is limited. An improved method, named "template-dependent multiple displacement amplification" (tdMDA), has recently been developed in our lab (Wang et al. in BioTechniques 63:21-25. https://doi.org/10.2144/000114566, 2017). In combination with Illumina sequencing and bioinformatics pipelines, its application in virome sequencing was explored using a serum sample from a patient with chronic hepatitis C virus (HCV) infection. In comparison to an amplification-free procedure, virome sequencing via tdMDA showed a 9.47-fold enrichment for HCV-mapped reads and, accordingly, an increase in HCV genome coverage from 28.5% to 70.1%. Eight serum samples from patients with acute liver failure (ALF) with or without known etiology were then used for virome sequencing at an average depth of 94,913x. Both similarity-based (mapping, NCBI BLASTn, BLASTp, and profile hidden Markov model analysis) and similarity-independent methods (machine-learning algorithms) identified viruses from multiple families, including Herpesviridae, Picornaviridae, Myoviridae, and Anelloviridae. However, their commensal nature and cross-detection ruled out an etiological interpretation. Together with a lack of detection of novel viruses in a comprehensive analysis at a resolution of single reads, these data indicate that viral agents might be rare in ALF cases with indeterminate etiology.


Subject(s)
Computational Biology/methods , Hepatitis C, Chronic/diagnosis , High-Throughput Nucleotide Sequencing/methods , Liver Failure, Acute/virology , Serum/virology , Anelloviridae/isolation & purification , Anelloviridae/physiology , Gene Expression Profiling/methods , Hepacivirus/genetics , Hepacivirus/isolation & purification , Hepatitis C, Chronic/blood , Herpesviridae/isolation & purification , Herpesviridae/physiology , Humans , Liver Failure, Acute/blood , Myoviridae/isolation & purification , Myoviridae/physiology , Picornaviridae/isolation & purification , Picornaviridae/physiology , Species Specificity , Symbiosis , Whole Genome Sequencing/methods
2.
Mol Genet Genomics ; 295(1): 177-193, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31620884

ABSTRACT

Genetic variation is expressed by the presence of polymorphisms in compared genomes of individuals that can be transferred to next generations. The aim of this work was to reveal genome dynamics by predicting polymorphisms among the genomes of three individuals of the highly inbred B10 cucumber (Cucumis sativus L.) line. In this study, bioinformatic comparative genomics was used to uncover cucumber genome dynamics (also called real-time evolution). We obtained a new genome draft assembly from long single molecule real-time (SMRT) sequencing reads and used short paired-end read data from three individuals to analyse the polymorphisms. Using this approach, we uncovered differentiation aspects in the genomes of the inbred B10 line. The newly assembled genome sequence (B10v3) has the highest contiguity and quality characteristics among the currently available cucumber genome draft sequences. Standard and newly designed approaches were used to predict single nucleotide and structural variants that were unique among the three individual genomes. Some of the variant predictions spanned protein-coding genes and their promoters, and some were in the neighbourhood of annotated interspersed repetitive elements, indicating that the highly inbred homozygous plants remained genetically dynamic. This is the first bioinformatic comparative genomics study of a single highly inbred plant line. For this project, we developed a polymorphism prediction method with optimized precision parameters, which allowed the effective detection of small nucleotide variants (SNVs). This methodology could significantly improve bioinformatic pipelines for comparative genomics and thus has great practical potential in genomic metadata handling.


Subject(s)
Cucumis sativus/genetics , Genome, Plant/genetics , Chromosome Mapping/methods , Computational Biology/methods , Genomics/methods , Molecular Sequence Annotation/methods , Polymorphism, Genetic/genetics , Promoter Regions, Genetic/genetics
3.
Hum Genet ; 139(1): 61-71, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30915546

ABSTRACT

Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.


Subject(s)
Computational Biology/methods , Genome, Human , Genome-Wide Association Study , Models, Statistical , Programming Languages , Algorithms , Humans , Polymorphism, Single Nucleotide , Software
4.
Gene ; 726: 144176, 2020 Feb 05.
Article in English | MEDLINE | ID: mdl-31669641

ABSTRACT

Gastric cancer is a serious threat to human health. As a class of noncoding RNA, circular RNA (circRNA) plays a key role in the occurrence and development of malignant tumors. We used next-generation sequencing to detect circRNA expression profiles in 5 paired human gastric cancer tissues. Bioinformatics analysis was then carried out to analyze the function of dysregulated circRNAs. Hsa_circ_0058092 was selected as the object of follow-up analysis. The Cistrome DB dataset was used to predict specific transcription factors of hsa_circ_0058092. The relationship between hsa_circ_0058092 and PODXL was further validated using RT-PCR and immunohistochemical techniques. Survival was assessed using Kaplan-Meier analysis of hsa_circ_0058092. We identified 319 aberrantly expressed circRNAs, from which hsa_circ_0058092 was selected for our studies. Functional analysis of hsa_circ_0058092 revealed that it was related to metabolic processes. The prediction results suggested that hsa_circ_0058092 has a relationship with hsa-miR-4269, which could specifically bind to the PODXL sequence. Transcription factor CEBPB may regulate the transcription of hsa_circ_0058092. The expression of hsa_circ_0058092 was positively correlated with PODXL expression. Immunohistochemical analysis showed that the expression of PODXL protein in cancer tissues is higher than that in adjacent tissues. Kaplan-Meier analysis suggested that hsa_circ_0058092 was associated with survival of gastric cancer patients. Taken together, these results suggest that hsa_circ_0058092 is a potential oncogene.


Subject(s)
Oncogenes/genetics , Stomach Neoplasms/genetics , Biomarkers, Tumor/genetics , Computational Biology/methods , Female , High-Throughput Nucleotide Sequencing/methods , Humans , Male , MicroRNAs/genetics , Middle Aged , RNA, Untranslated/genetics , Sialoglycoproteins/genetics , Transcription Factors/genetics , Transcription, Genetic/genetics
5.
Mol Genet Genomics ; 295(1): 13-21, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31392406

ABSTRACT

As one of the most common post-transcriptional modifications, pseudouridine (Ψ) participates in a series of biological processes. Efficient detection of pseudouridine sites is therefore very important for revealing its functions in these processes. Although experimental techniques have been proposed for identifying Ψ sites at single-base resolution, they are still labor intensive and expensive. Recently, to fill the gap left by experimental methods, computational methods have been proposed for identifying Ψ sites. However, their performance is still unsatisfactory. In this paper, we propose an eXtreme Gradient Boosting (xgboost)-based method, called XG-PseU, to identify Ψ sites based on the optimal features obtained using forward feature selection together with the incremental feature selection method. Our results demonstrate that XG-PseU is superior or at least complementary to existing methods for identifying pseudouridine sites. Finally, a freely available online web server for XG-PseU was established at http://www.bioml.cn/. We hope that XG-PseU will become a useful tool for computationally identifying Ψ sites.
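
The abstract names incremental feature selection but gives no procedural detail. The following is a minimal sketch of that general idea, not the authors' implementation: features are assumed to be ranked already, and a hypothetical `score` callback stands in for a cross-validated classifier score (in XG-PseU this would involve an xgboost model; the `toy_score` function and its values are invented for illustration).

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add ranked features one at a time and keep the best-scoring prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        current.append(feature)
        s = score(current)
        if s > best_score:
            best_score = s
            best_subset = list(current)
    return best_subset, best_score


def toy_score(subset: List[str]) -> float:
    """Hypothetical stand-in for a cross-validated accuracy."""
    useful = {"f1": 0.30, "f2": 0.20, "f3": 0.05}
    gain = sum(useful.get(f, 0.0) for f in subset)
    penalty = 0.04 * len(subset)  # mimics the cost of adding weak features
    return 0.5 + gain - penalty


selected, best = incremental_feature_selection(["f1", "f2", "f3", "f4"], toy_score)
print(selected, round(best, 2))
```

With these toy values the best prefix stops before the uninformative feature `f4`, which is the behaviour the selection step is meant to provide.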


Subject(s)
Computational Biology/methods , Pseudouridine/genetics , Animals , Humans , Mice , RNA Processing, Post-Transcriptional/genetics , Saccharomyces cerevisiae/genetics
6.
Food Chem ; 309: 125760, 2020 Mar 30.
Article in English | MEDLINE | ID: mdl-31787392

ABSTRACT

Molecular characterization of exogenous DNA integrations in the host genome is a key aspect of risk assessment of bioengineered crops. However, gaining a clear understanding of the molecular characteristics of a bioengineered crop using conventional techniques remains a challenging task. Herein, we report the full molecular characterization of a new transgenic rice event, G6H1, via a paired-end sequencing approach and bioinformatics analysis pipelines. The reported molecular characterization was also validated using conventional PCR, Sanger sequencing, and digital PCR. The results showed that only one copy of the exogenous DNA is inserted, located within chromosome 7 of the G6H1 genome. There is no other unintended integration of sequences from the transformation plasmid. These results indicate that the paired-end sequencing approach, combined with the bioinformatics pipeline developed, is well suited to elucidating the molecular characteristics of bioengineered crops, and is efficient, low cost, and comprehensive.


Subject(s)
Oryza/genetics , Plants, Genetically Modified/genetics , Chromosome Mapping , Computational Biology/methods , Crops, Agricultural/genetics , DNA/analysis , DNA/metabolism , Polymerase Chain Reaction , Sequence Analysis, DNA
8.
Cancer Immunol Immunother ; 69(2): 175-187, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31853576

ABSTRACT

High grade ovarian serous cancer (HGSC) is a malignant disease with high mortality. Glycosylation plays important roles in tumor invasion and immune evasion, but its effect on the immune microenvironment of HGSC remains unclear. This study examined the association of glycosyltransferase expression with HGSC prognosis and explored the underlying mechanism using clinical specimens and integrated bioinformatic analyses. We identified a cluster of 15 glycogenes associated with reduced overall survival, and GALNT10 was found to be an independent predictor of HGSC prognosis. High GALNT10 expression was associated with increased regulatory CD4+ T cell infiltration and decreased granzyme B expression in CD8+ T cells. The expression of GALNT10 and its product, Tn antigen, in HGSC specimens was associated with increased infiltration of M2 macrophages and neutrophils, and decreased infiltration of CD3+ T cells, NK cells, and B cells. Taken together, high GALNT10 expression confers an immunosuppressive microenvironment to promote tumor progression and predicts poor clinical outcomes in HGSC patients.


Subject(s)
Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/mortality , Gene Expression , N-Acetylgalactosaminyltransferases/genetics , Ovarian Neoplasms/genetics , Ovarian Neoplasms/mortality , Tumor Microenvironment/genetics , Biomarkers, Tumor , Computational Biology/methods , Cystadenocarcinoma, Serous/pathology , Databases, Genetic , Female , Humans , Immunohistochemistry , Immunomodulation/genetics , Neoplasm Grading , Ovarian Neoplasms/pathology , Prognosis , Retrospective Studies
9.
Gene ; 723: 144134, 2020 Jan 10.
Article in English | MEDLINE | ID: mdl-31589960

ABSTRACT

Viral kinases are known to undergo autophosphorylation and also phosphorylate viral and host substrates. Viral kinases have been implicated in various diseases and are also known to acquire host kinases for mimicking cellular functions and exhibiting virulence. Although substantial analyses have been reported in the literature on the diversity of viral kinases, there is a gap in the understanding of sequence and structural similarity among kinases from different classes of viruses. In this study, we performed a comprehensive analysis of protein kinases encoded in viral genomes. Homology search methods were used to identify kinases from 104,282 viral genomic datasets. Serine/threonine and tyrosine kinases were identified in only 390 viral genomes. Out of seven viral classes that are based on the nature of the genetic material, only double-stranded DNA viruses and single-stranded RNA retroviruses were found to encode kinases. The 716 identified protein kinases are classified into 63 subfamilies based on their sequence similarity within each cluster, and sequence signatures have been identified for each subfamily. Eleven clusters are well represented, with at least 10 members each. Kinases from dsDNA viruses, namely Phycodnaviridae, which infect green algae, and Herpesvirales, which infect vertebrates including humans, form a major group. Our analysis shows that the protein kinases in viruses belonging to the same taxonomic lineages form discrete clusters and that the kinases encoded in alphaherpesviruses form host-specific clusters. A comprehensive sequence and structure-based analysis enabled us to identify the conserved residues or motifs in kinase catalytic domain regions across all viral kinases. Conserved sequence regions that are specific to a particular viral kinase cluster and the kinases that show close similarity to eukaryotic kinases were identified by using sequence and three-dimensional structural regions of eukaryotic kinases as reference. The regions specific to each viral kinase cluster can be used as signatures in the future for classifying uncharacterized viral kinases. We note that kinases from the giant viruses Marseilleviridae have close similarity to viral oncogenes in the functional regions and in putative substrate binding regions, indicating their possible role in cancer.


Subject(s)
Protein Kinases/chemistry , Protein Kinases/genetics , Viruses/classification , Catalytic Domain , Computational Biology/methods , Databases, Protein , Genetic Variation , Phosphorylation , Phylogeny , Protein Kinases/metabolism , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism , Virulence Factors/chemistry , Virulence Factors/genetics , Virulence Factors/metabolism , Viruses/enzymology , Viruses/pathogenicity
10.
Gene ; 724: 144150, 2020 Jan 15.
Article in English | MEDLINE | ID: mdl-31589961

ABSTRACT

Ovarian cancer (OC) is the deadliest form of gynecologic malignancy, with the majority of patients being diagnosed only once the disease reaches an advanced stage owing to a lack of available biomarkers capable of accurately detecting the disease. Stable circular RNAs (circRNAs) can be found at high levels in exosomes, and there is evidence to suggest that they may be viable diagnostic biomarkers for certain cancers. However, circRNAs in the serum of OC patients have rarely been evaluated to date. We therefore sought to investigate serum circRNA profiles of OC patients, and to explore whether such circRNAs could be used to detect early OC, serving as biomarkers of disease that may allow for earlier treatment. Second-generation sequencing was used to screen differentially expressed circRNAs in OC patient serum and in serum obtained from healthy controls, and circRNA expression was confirmed by qPCR. A bioinformatics-based approach was then used to assess what biological functions might be affected by the altered regulation of these RNA molecules. We also conducted GO, KEGG, and network analyses to explore the expression of circRNAs. We detected 178 differentially expressed circRNAs in OC patient serum, of which 175 were up-regulated and 3 were down-regulated. We validated 5 of these identified circRNAs by qPCR to confirm their expression, and further found these RNAs to be closely linked with Fc gamma R-mediated phagocytosis, VEGF signaling, transcriptional misregulation in cancer, chemokine signaling, ErbB signaling, and TNF signaling. This study provides a profile of circRNAs in OC patient serum, revealing a pattern of dysregulation of these RNAs associated with OC. Our bioinformatics analysis suggested that these circRNAs are likely related to OC development, and as such they may be viable novel OC biomarkers.


Subject(s)
Gene Expression Regulation, Neoplastic , Ovarian Neoplasms/genetics , RNA/blood , Binding Sites , Biomarkers, Tumor/blood , Biomarkers, Tumor/genetics , Case-Control Studies , Computational Biology/methods , ErbB Receptors/genetics , ErbB Receptors/metabolism , Female , Gene Expression Profiling , Gene Ontology , Gene Regulatory Networks , Humans , RNA/genetics , RNA, Messenger/metabolism , Real-Time Polymerase Chain Reaction , Reproducibility of Results , Up-Regulation , Vascular Endothelial Growth Factor A/genetics , Vascular Endothelial Growth Factor A/metabolism
12.
BMC Bioinformatics ; 20(Suppl 26): 628, 2019 Dec 16.
Article in English | MEDLINE | ID: mdl-31839008

ABSTRACT

BACKGROUND: Development of new drugs is a time-consuming and costly process, and the cost has continued to increase in recent years. However, the number of drugs approved by the FDA every year per dollar spent on development is declining. Drug repositioning, which aims to find new uses for existing drugs, has attracted the attention of pharmaceutical researchers due to its high efficiency. A variety of computational methods for drug repositioning have been proposed based on machine learning approaches, network-based approaches, matrix decomposition approaches, etc. RESULTS: We propose a novel computational method for drug repositioning. We construct and decompose three-dimensional tensors, which consist of the associations among drugs, targets and diseases, to derive latent factors reflecting the functional patterns of the three kinds of entities. The proposed method outperforms several baseline methods in recovering missing associations. Most of the top predictions are validated by literature search and computational docking. Latent factors are used to cluster the drugs, targets and diseases into functional groups. Topological Data Analysis (TDA) is applied to investigate the properties of the clusters. We find that the latent factors are able to capture the functional patterns and underlying molecular mechanisms of drugs, targets and diseases. In addition, we focus on repurposing drugs for cancer and discover not only new therapeutic uses but also adverse effects of the drugs. In the in-depth study of associations among the clusters of drugs, targets and cancer subtypes, we find that strong associations exist between particular clusters. CONCLUSIONS: The proposed method is able to recover missing associations, discover new predictions and uncover functional clusters of drugs, targets and diseases. The clustering of drugs, targets and diseases, as well as the associations among the clusters, provides a new guiding framework for drug repositioning.


Subject(s)
Computational Biology , Drug Repositioning , Cluster Analysis , Computational Biology/methods , Drug Repositioning/methods , Humans , Machine Learning
13.
Medicine (Baltimore) ; 98(52): e18493, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31876736

ABSTRACT

Bronchopulmonary dysplasia (BPD) is a common disease of premature infants with very low birth weight. The mechanism is inconclusive. The aim of this study is to systematically explore BPD-related genes and characterize their functions. Natural language processing analysis was used to identify BPD-related genes. Gene data were extracted from the PubMed database. Gene ontology, pathway, and network analyses were carried out, and the results were integrated with the corresponding databases. In this study, 216 genes were identified as BPD-related genes with P < .05, and 30 pathways were identified as significant. A network of BPD-related genes was also constructed, with 17 hub genes identified. In particular, the phosphatidyl inositol-3-enzyme-serine/threonine kinase signaling pathway involved the largest number of genes. Insulin was found to be a promising candidate gene related to BPD, suggesting that it may serve as an effective therapeutic target. Our data may help to better understand the molecular mechanisms underlying BPD. However, the mechanisms of BPD are elusive, and further studies are needed.


Subject(s)
Bronchopulmonary Dysplasia/genetics , Data Mining , Algorithms , Bronchopulmonary Dysplasia/etiology , Bronchopulmonary Dysplasia/metabolism , Computational Biology/methods , Data Mining/methods , Gene Ontology , Genes/genetics , Genes/physiology , Genetic Predisposition to Disease/genetics , Humans , Infant, Newborn , Metabolic Networks and Pathways/genetics , Natural Language Processing , Signal Transduction/genetics
14.
Int. microbiol ; 22(4): 437-449, Dec. 2019. graphs, tables
Article in English | IBECS | ID: ibc-185062

ABSTRACT

Azurin, a bacteriocin produced by the human gut bacterium Pseudomonas aeruginosa, can be selectively cytotoxic to cancer cells and induce apoptosis in them. After passing two phase I trials, a functional region of Azurin called p28 has been approved by the FDA as a drug for the treatment of glioma, a brain tumor. The present study aims to improve a screening procedure and assess the genetic diversity of Azurin genes in P. aeruginosa and Azurin-like genes in the gut microbiome of a specific population in Vietnam and of global populations. Firstly, both cultivation-dependent and cultivation-independent techniques based on genomic and metagenomic DNA extracted from fecal samples of the healthy specific population were performed and optimized to detect Azurin genes. Secondly, the Azurin gene sequences were analyzed and compared with those of global populations using bioinformatics tools. Finally, the screening procedure improved in the first step was applied to screen for Azurin-like genes, followed by protein synthesis and NCI in vitro screening for anticancer activity. As a result, this study successfully optimized the annealing temperatures for amplifying DNA to screen for Azurin genes and applied them to Azurin-like genes from human gut microbiota. This study is the first of its kind to classify Azurin genes into five different genotypes at a global scale and to confirm the potential anticancer activity of three Azurin-like synthetic proteins (Cnazu1, Dlazu11, and Ruazu12). The results contribute to the development of procedures for screening anticancer proteins from the human microbiome and to a comprehensive understanding of their therapeutic response at a genetic level.


Not available


Subject(s)
Azurin/genetics , In Vitro Techniques/methods , Genetic Variation/drug effects , Gastrointestinal Microbiome/genetics , Azurin/therapeutic use , Bacteriocins/genetics , Gastrointestinal Microbiome/drug effects , Metagenomics , Computational Biology/methods , Antineoplastic Agents/pharmacology
15.
BMC Bioinformatics ; 20(1): 602, 2019 Nov 21.
Article in English | MEDLINE | ID: mdl-31752668

ABSTRACT

BACKGROUND: S-sulphenylation is a ubiquitous protein post-translational modification (PTM) in which an S-hydroxyl (-SOH) bond is formed via the reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. State-of-the-art bioinformatic advances have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation. RESULTS: In this study, we propose a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural-network-based ensemble-learning model integrating both protein sequence-derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. The empirical studies on the independent testing dataset demonstrated that SIMLIN achieved 88.0% prediction accuracy and an AUC score of 0.82, outperforming currently existing methods. CONCLUSIONS: In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy, thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.
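
The two metrics reported above, prediction accuracy and AUC, can be computed from predicted scores and true labels as in the sketch below. It is only an illustration of the metric definitions: the labels, scores, and 0.5 threshold are invented here and are not taken from SIMLIN or its test set.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = modified site, 0 = unmodified site.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
acc = accuracy(labels, scores)
auc = roc_auc(labels, scores)
print(acc, auc)
```

Accuracy depends on the chosen threshold while AUC does not, which is one reason the two figures can rank methods differently.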


Subject(s)
Algorithms , Computational Biology/methods , Proteome/metabolism , Sulfamerazine/metabolism , Amino Acid Motifs , Amino Acid Sequence , Area Under Curve , Conserved Sequence , Databases, Protein , Gene Ontology , Humans , ROC Curve , Software
17.
BMC Bioinformatics ; 20(1): 543, 2019 Nov 04.
Article in English | MEDLINE | ID: mdl-31684857

ABSTRACT

BACKGROUND: Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We aimed to test this hypothesis by constructing models based on either genes alone, or based on sample specific scores for each pathway, thus transforming the data to a 'pathway space'. We progressively degraded the raw data by addition of noise and examined the ability of the models to maintain predictivity. RESULTS: Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases. CONCLUSIONS: Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.
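
The transformation to a "pathway space" and the randomised pathway mappings can be pictured with the sketch below. It assumes, purely for illustration, that a pathway score is the mean expression of its member genes; the abstract does not say which scoring method the authors used, and the gene names, pathway names, and values are invented.

```python
import random
from statistics import mean
from typing import Dict, List


def pathway_scores(expression: Dict[str, float], pathways: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each pathway as the mean expression of its measured member genes."""
    scores = {}
    for name, genes in pathways.items():
        values = [expression[g] for g in genes if g in expression]
        if values:
            scores[name] = mean(values)
    return scores


def randomise_pathways(pathways: Dict[str, List[str]], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle gene membership across pathways while preserving pathway sizes."""
    rng = random.Random(seed)
    all_genes = [g for genes in pathways.values() for g in genes]
    rng.shuffle(all_genes)
    randomised, start = {}, 0
    for name, genes in pathways.items():
        randomised[name] = all_genes[start:start + len(genes)]
        start += len(genes)
    return randomised


# Invented single-sample expression values and pathway definitions.
sample = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 3.0}
pathways = {"P1": ["A", "B"], "P2": ["C", "D"]}
scores = pathway_scores(sample, pathways)
shuffled = randomise_pathways(pathways, seed=1)
print(scores, shuffled)
```

Averaging over member genes dampens independent per-gene noise regardless of which genes are grouped, which is consistent with the finding that randomised mappings were about as robust as true ones.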


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Signal Transduction , Databases, Factual , Gene Expression , Humans , Models, Statistical , Phenotype , Transcriptome
18.
BMC Bioinformatics ; 20(1): 545, 2019 Nov 04.
Article in English | MEDLINE | ID: mdl-31684860

ABSTRACT

BACKGROUND: miRNAs regulate the expression of several genes: one miRNA can target multiple genes, and one gene can be targeted simultaneously by more than one miRNA. It has therefore become indispensable to shorten the long list of miRNA-target interactions to focus on in order to gain insight into the regulatory mechanisms orchestrated by miRNAs in various cellular processes. A reasonable solution is to prioritize miRNA-target interactions to maximize the effectiveness of the downstream analysis. RESULTS: We propose a new and easy-to-use web tool, MIENTURNET (MicroRNA ENrichment TURned NETwork), that takes as input a list of miRNAs or mRNAs and tackles the problem of prioritizing miRNA-target interactions by performing a statistical analysis followed by a fully featured network-based visualization and analysis. The statistical analysis is used to assess the significance of an over-representation of miRNA-target interactions, and MIENTURNET then filters the interactions based on the statistical significance associated with each one. In addition, the holistic approach of network theory is used to infer possible evidence of miRNA regulation by capturing emergent properties of the miRNA-target regulatory network that would not be evident through a pairwise analysis of the individual components. CONCLUSION: MIENTURNET makes it possible to consistently perform both statistical and network-based analyses with a single tool, leading to a more effective prioritization of miRNA-target interactions. This can spare researchers without computational and informatics skills from navigating multiple websites and let them independently investigate miRNA activity in any cellular process of interest in an easy yet exhaustive way, thanks to the intuitive web interface. The web application, along with a well-documented and comprehensive user guide, is freely available at http://userver.bio.uniroma1.it/apps/mienturnet/ without any login requirement.
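
Over-representation of miRNA-target interactions is commonly assessed with a hypergeometric test. The sketch below illustrates that general approach; whether MIENTURNET uses exactly this test is an assumption, since the abstract does not name it, and the function name and numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, targets_of_mirna: int, input_size: int, overlap: int) -> float:
    """P(X >= overlap) when input_size genes are drawn without replacement
    from a universe that contains targets_of_mirna annotated targets."""
    max_overlap = min(targets_of_mirna, input_size)
    total = comb(universe, input_size)
    tail = sum(
        comb(targets_of_mirna, k) * comb(universe - targets_of_mirna, input_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Invented numbers: 10 genes in the universe, 4 of them targets of one miRNA,
# and an input list of 3 genes of which 2 are targets.
p_value = hypergeom_enrichment_p(universe=10, targets_of_mirna=4, input_size=3, overlap=2)
p_trivial = hypergeom_enrichment_p(universe=10, targets_of_mirna=4, input_size=3, overlap=0)
print(p_value, p_trivial)
```

In practice one p-value would be computed per miRNA and corrected for multiple testing before filtering interactions by significance.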


Subject(s)
Computational Biology/methods , MicroRNAs/genetics , Computational Biology/instrumentation , Gene Regulatory Networks , Internet , RNA, Messenger/genetics
19.
BMC Bioinformatics ; 20(1): 544, 2019 Nov 04.
Article in English | MEDLINE | ID: mdl-31684876

ABSTRACT

BACKGROUND: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing the genetic diversity is important for vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed. RESULTS: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and the heterogeneous sequencing coverage of RNA viruses. VirBin applies prototype-based clustering to cluster regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data sets with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. The benchmark results against other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction. CONCLUSIONS: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin.


Subject(s)
Computational Biology/methods , RNA Viruses/genetics , Algorithms , Genome, Viral , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , RNA Virus Infections/virology , RNA Viruses/classification , RNA Viruses/isolation & purification , Software
20.
BMC Bioinformatics ; 20(1): 546, 2019 Nov 04.
Article in English | MEDLINE | ID: mdl-31684881

ABSTRACT

BACKGROUND: Pathway enrichment is extensively used in the analysis of Omics data for gaining biological insights into the functional roles of pre-defined subsets of genes, proteins and metabolites. A large number of methods have been proposed in the literature for this task. The vast majority of these methods use as input expression levels of the biomolecules under study together with their membership in pathways of interest. The latest generation of pathway enrichment methods also leverages information on the topology of the underlying pathways, which, as evidence from their evaluation reveals, leads to improved sensitivity and specificity. Nevertheless, a systematic empirical comparison of such methods is still lacking, making selection of the most suitable method for a specific experimental setting challenging. This comparative study of nine network-based methods for pathway enrichment analysis aims to provide a systematic evaluation of their performance based on three real data sets with different numbers of features (genes/metabolites) and samples. RESULTS: The findings highlight both methodological and empirical differences across the nine methods. In particular, certain methods assess pathway enrichment due to differences both across expression levels and in the strength of the interconnectedness of the members of the pathway, while others only leverage differential expression levels. In the more challenging setting involving a metabolomics data set, the results show that methods that utilize both pieces of information (with NetGSA being a prototypical one) exhibit superior statistical power in detecting pathway enrichment. CONCLUSION: The analysis reveals that a number of methods perform equally well when testing large pathways, which is the case with genomic data. On the other hand, NetGSA, which takes into consideration both differential expression of the biomolecules in the pathway and changes in the topology, exhibits superior performance when testing small pathways, which is usually the case for metabolomics data.


Subject(s)
Genomics/methods , Metabolomics/methods , Computational Biology/methods