1.
Comput Biol Chem ; 80: 152-158, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30959271

ABSTRACT

NCBI's Gene Expression Omnibus (GEO) database holds over 2.5 million publicly available gene expression samples across 101,000 data series. Because GEO's free-text metadata does not use standardised ontology terms to annotate the experiment type and sample type, the database remains difficult to harness computationally without significant manual intervention. In this work, we present an interactive R/Shiny tool called GEOracle that utilises text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples, and perform differential expression analysis. We present applications of GEOracle to discover conserved signalling pathway target genes and to identify an organ-specific gene regulatory network. GEOracle is effective in discovering perturbation gene targets in GEO by harnessing its free-text metadata. Its effectiveness and applicability have been demonstrated by cross-validation and two real-life case studies. It opens up new avenues to unlock the gene regulatory information embedded inside large biological databases such as GEO. GEOracle is available at https://github.com/VCCRI/GEOracle.
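The grouping of treatment and control samples from free-text metadata can be illustrated with a minimal Python sketch. This is not GEOracle's method (GEOracle is an R/Shiny tool that uses machine learning); it swaps in a plain keyword heuristic to show the idea. The keyword set, the function name `group_samples`, and the sample titles are all hypothetical.

```python
import re

# Hypothetical keyword set marking control samples. This is an assumption for
# illustration only; it is not the feature set used by GEOracle.
CONTROL_TOKENS = {"control", "ctrl", "wt", "wildtype", "vehicle",
                  "untreated", "mock", "dmso"}

def group_samples(titles):
    """Split sample titles into control and perturbation groups by keyword."""
    groups = {"control": [], "perturbation": []}
    for title in titles:
        # Tokenise on anything that is not a letter or digit, so that
        # underscore-separated titles such as "Heart_WT_rep1" are handled.
        tokens = set(re.split(r"[^a-z0-9]+", title.lower()))
        key = "control" if tokens & CONTROL_TOKENS else "perturbation"
        groups[key].append(title)
    return groups

# Made-up titles written in the style of GEO free-text sample names.
titles = [
    "Heart_WT_rep1",
    "Heart_WT_rep2",
    "Heart_Tgfb1_KO_rep1",
    "Heart_Tgfb1_KO_rep2",
]
groups = group_samples(titles)
```

A fixed keyword list like this breaks down quickly on real metadata, which is presumably why a learned classifier and an interactive review step are useful in practice.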


Subject(s)
Computational Biology/methods , Data Mining/methods , Databases, Genetic/statistics & numerical data , Gene Expression , Metadata , Animals , Computational Biology/instrumentation , Gene Ontology , Gene Regulatory Networks , Humans , Machine Learning , Mice , Signal Transduction , Software , Transforming Growth Factor beta/metabolism
2.
F1000Res ; 8: 1587, 2019.
Article in English | MEDLINE | ID: mdl-32913631

ABSTRACT

Read alignment is an important step in RNA-seq analysis, as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads that should have been aligned, a problem we term the false-negative non-alignment problem. Here we present Scavenger, a Python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome than previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes.
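The idea of finding a putative location for an unaligned read from its similarity to already aligned reads can be sketched as a toy in Python. This is not Scavenger's implementation; it assumes a simple shared k-mer count as the similarity measure, and the function names, the `min_shared` threshold, and the reads and coordinates are invented for illustration.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(aligned, k):
    """Map each k-mer to the ids of the aligned reads that contain it."""
    index = defaultdict(set)
    for read_id, (seq, _location) in aligned.items():
        for kmer in kmers(seq, k):
            index[kmer].add(read_id)
    return index

def propose_location(unaligned_seq, aligned, index, k, min_shared=3):
    """Return the location of the aligned read sharing the most k-mers with
    the unaligned read, or None if no aligned read shares at least
    min_shared k-mers (min_shared is an arbitrary, assumed threshold)."""
    counts = defaultdict(int)
    for kmer in kmers(unaligned_seq, k):
        for read_id in index.get(kmer, ()):
            counts[read_id] += 1
    if not counts:
        return None
    best = max(counts, key=counts.get)
    if counts[best] < min_shared:
        return None
    return aligned[best][1]

# Invented aligned reads: id -> (sequence, (chromosome, position)).
aligned = {
    "r1": ("ACGTACGGTCAGTTCA", ("chr1", 100)),
    "r2": ("TTGGCCAATCGGATCC", ("chr2", 500)),
}
K = 5
index = build_index(aligned, K)

# An unaligned read that differs from r1 by a single base (a variant).
putative = propose_location("ACGTACGGTCAGATCA", aligned, index, K)
# A read with no similarity to any aligned read.
unrelated = propose_location("GGGGGGGGGGGGGGGG", aligned, index, K)
```

In this toy the proposed location is simply the start coordinate of the most similar aligned read; a real pipeline would still need to re-align the read at that location and check the result before counting it as recovered.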
