Results 1 - 10 of 10
1.
Brief Bioinform ; 21(5): 1523-1530, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31624847

ABSTRACT

The generation and systematic collection of genome-wide data is ever-increasing. This vast amount of data has enabled researchers to study relations between a variety of genomic and epigenomic features, including genetic variation, gene regulation and phenotypic traits. Such relations are typically investigated by comparatively assessing genomic co-occurrence. Technically, this corresponds to assessing the similarity of pairs of genome-wide binary vectors. A variety of similarity measures have been proposed for this problem in other fields like ecology. However, while several of these measures have been employed for assessing genomic co-occurrence, their appropriateness for the genomic setting has never been investigated. We show that the choice of similarity measure may strongly influence results and propose two alternative modelling assumptions that can be used to guide this choice. On both simulated and real genomic data, the Jaccard index is strongly altered by dataset size and should be used with caution. The Forbes coefficient (fold change) and tetrachoric correlation are less influenced by dataset size, but one should be aware of increased variance for small datasets. All results on simulated and real data can be inspected and reproduced at https://hyperbrowser.uio.no/sim-measure.
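The similarity measures named above can be made concrete with a short Python sketch that computes them from the 2x2 table of two equal-length binary vectors (a = both 1, b = only the first is 1, c = only the second is 1, d = both 0). The Jaccard and Forbes formulas are the standard definitions. The tetrachoric correlation is approximated here with the cosine-pi formula instead of a maximum-likelihood estimate; that choice, and all function names, are assumptions of this sketch and not necessarily what the authors used.

```python
import math
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count the 2x2 table cells for two equal-length binary vectors.

    a: both 1, b: only x is 1, c: only y is 1, d: both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(a: int, b: int, c: int, d: int) -> float:
    # Shared 1s over positions where at least one vector is 1; d is not used.
    return a / (a + b + c)


def forbes(a: int, b: int, c: int, d: int) -> float:
    # Observed co-occurrence divided by the count expected under independence
    # (a fold change); depends on the total n = a + b + c + d.
    n = a + b + c + d
    return a * n / ((a + b) * (a + c))


def tetrachoric_approx(a: int, b: int, c: int, d: int) -> float:
    # Cosine-pi approximation to the tetrachoric correlation (an assumption of
    # this sketch, not a maximum-likelihood estimate).
    if b * c == 0:
        raise ValueError("approximation undefined when b * c == 0")
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))
```

For x = [1, 1, 0, 0, 1, 0, 0, 0] and y = [1, 0, 0, 0, 1, 1, 0, 0] the table is (2, 1, 1, 4), giving a Jaccard index of 0.5 and a Forbes coefficient of 16/9 (about 1.78). Note that `jaccard` never looks at d, whereas `forbes` depends on the total number of positions.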


Subjects
Genomics/methods , Algorithms , Datasets as Topic , Gene Expression Regulation , Genetic Variation , Humans
2.
BMC Genomics ; 21(1): 282, 2020 Apr 06.
Article in English | MEDLINE | ID: mdl-32252628

ABSTRACT

BACKGROUND: Graph-based reference genomes have become popular as they allow read mapping and follow-up analyses in settings where the exact haplotypes underlying a high-throughput sequencing experiment are not precisely known. Two recent papers show that mapping to graph-based reference genomes can improve accuracy compared to methods using linear references. Both of these methods index the sequences for most paths up to a certain length in the graph in order to enable direct mapping of reads containing common variants. However, the combinatorial explosion of possible paths through nearby variants also leads to a huge search space and an increased chance of false-positive alignments to highly variable regions. RESULTS: We here assess three prominent graph-based read mappers against a hybrid baseline approach that combines an initial path determination with a tuned linear read mapping method. We show, using a previously proposed benchmark, that this simple approach is able to improve overall accuracy of read mapping to graph-based reference genomes. CONCLUSIONS: Our method is implemented in the tool Two-step Graph Mapper, which is available at https://github.com/uio-bmi/two_step_graph_mapper along with data and scripts for reproducing the experiments. Our method highlights characteristics of the current generation of graph-based read mappers and shows potential for improvement for future graph-based read mappers.


Subjects
Computational Biology/methods , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Alignment
3.
PLoS Comput Biol ; 15(2): e1006731, 2019 02.
Article in English | MEDLINE | ID: mdl-30779737

ABSTRACT

Graph-based representations are considered to be the future for reference genomes, as they allow integrated representation of the steadily increasing data on individual variation. Currently available tools allow de novo assembly of graph-based reference genomes, alignment of new read sets to the graph representation as well as certain analyses like variant calling and haplotyping. We here present a first method for calling ChIP-Seq peaks on read data aligned to a graph-based reference genome. The method is a graph generalization of the peak caller MACS2, and is implemented in an open source tool, Graph Peak Caller. By using the existing tool vg to build a pan-genome of Arabidopsis thaliana, we validate our approach by showing that Graph Peak Caller with a pan-genome reference graph can trace variants within peaks that are not part of the linear reference genome, and find peaks that in general are more motif-enriched than those found by MACS2.


Subjects
Chromatin Immunoprecipitation/methods , Genomics/methods , Sequence Analysis, DNA/methods , Algorithms , Arabidopsis/genetics , Genome/genetics , Protein Binding , Software , Transcription Factors
4.
BMC Bioinformatics ; 18(1): 263, 2017 May 18.
Article in English | MEDLINE | ID: mdl-28521770

ABSTRACT

BACKGROUND: It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes. RESULTS: We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable. CONCLUSION: More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph . An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords .
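The offset-based idea can be illustrated with a minimal Python sketch, assuming a position is a (node, offset) pair and an interval is an ordered path of nodes with a start offset in the first node and an exclusive end offset in the last. The class names, fields and node identifiers are hypothetical; this is not the API of the offsetbasedgraph package.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Position:
    node: str    # identifier of a graph node (hypothetical naming)
    offset: int  # 0-based offset within that node


@dataclass(frozen=True)
class Interval:
    start: Position
    end: Position           # end.offset is exclusive within the last node
    nodes: Tuple[str, ...]  # ordered node path from start.node to end.node

    def __post_init__(self) -> None:
        # The path must begin at the start node and finish at the end node.
        if (not self.nodes or self.nodes[0] != self.start.node
                or self.nodes[-1] != self.end.node):
            raise ValueError("path must begin at start.node and end at end.node")

    def length(self, node_lengths: Dict[str, int]) -> int:
        # Sum all nodes on the path, then subtract the part of the first node
        # before start.offset and the part of the last node after end.offset.
        total = sum(node_lengths[n] for n in self.nodes)
        tail = node_lengths[self.end.node] - self.end.offset
        return total - self.start.offset - tail
```

With hypothetical node lengths chr1_a = 100, alt = 50, ref = 40 and chr1_b = 200, the interval from (chr1_a, 90) to (chr1_b, 10) has length 70 when routed through `alt` and 60 through `ref`, so the same start and end coordinates can describe different sequence depending on the path taken.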


Subjects
Computer Graphics , Genome, Human , Genomics/methods , Algorithms , Genetic Loci , Humans , Internet , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, DNA , Software
5.
Bioinformatics ; 32(11): 1743-5, 2016 06 01.
Article in English | MEDLINE | ID: mdl-26819474

ABSTRACT

We present the Galaxy Portal app, an open source interface to the Galaxy system through smart phones and tablets. The Galaxy Portal provides convenient and efficient monitoring of job completion, as well as opportunities for inspection of results and execution history. In addition to being useful to the Galaxy community, we believe that the app also exemplifies a useful way of exploiting mobile interfaces for research/high-performance computing resources in general. AVAILABILITY AND IMPLEMENTATION: The source is freely available under a GPL license on GitHub, along with user documentation and pre-compiled binaries and instructions for several platforms: https://github.com/Tarostar/QMLGalaxyPortal. It is available for iOS version 7 (and newer) through the Apple App Store, and for Android version 4.1 (API 16) or newer through Google Play. CONTACT: geirksa@ifi.uio.no.


Subjects
Mobile Applications , Software
7.
Genome Biol ; 23(1): 209, 2022 10 04.
Article in English | MEDLINE | ID: mdl-36195962

ABSTRACT

Genotyping is a core application of high-throughput sequencing. We present KAGE, a genotyper for SNPs and short indels that is inspired by recent developments within graph-based genome representations and alignment-free methods. KAGE uses a pan-genome representation of the population to efficiently and accurately predict genotypes. Two novel ideas improve both the speed and accuracy: a Bayesian model incorporates genotypes from thousands of individuals to improve prediction accuracy, and a computationally efficient method leverages correlation between variants. We show that the accuracy of KAGE is on par with the best existing alignment-free genotypers, while being an order of magnitude faster.
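The general idea of letting population genotypes inform a call can be shown with a toy Bayesian genotyper: a prior over genotypes (for example, frequencies observed in a reference panel) is combined with a likelihood of the observed reference and alternative k-mer counts. This is not KAGE's model. The Poisson count model, the `depth_per_copy` and `error` parameters, and all names are assumptions made for illustration.

```python
import math
from typing import Dict

GENOTYPES = ("0/0", "0/1", "1/1")
ALT_COPIES = {"0/0": 0, "0/1": 1, "1/1": 2}  # diploid: copies of the alt allele


def poisson_logpmf(k: int, lam: float) -> float:
    # log P(K = k) for a Poisson distribution with mean lam > 0.
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def genotype_posterior(ref_count: int, alt_count: int, prior: Dict[str, float],
                       depth_per_copy: float = 15.0,
                       error: float = 0.5) -> Dict[str, float]:
    """Posterior over genotypes from allele k-mer counts and a population prior.

    prior: genotype -> probability (all > 0), e.g. from a reference panel.
    depth_per_copy: expected k-mer count per allele copy (assumed).
    error: small background count so every Poisson mean is positive (assumed).
    """
    log_post = {}
    for g in GENOTYPES:
        alt = ALT_COPIES[g]
        ref = 2 - alt
        log_post[g] = (math.log(prior[g])
                       + poisson_logpmf(ref_count, ref * depth_per_copy + error)
                       + poisson_logpmf(alt_count, alt * depth_per_copy + error))
    # Normalise in log space for numerical stability.
    m = max(log_post.values())
    unnorm = {g: math.exp(v - m) for g, v in log_post.items()}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}
```

With the default parameters and counts (1, 1), a uniform prior favours 0/1, while a prior of 0.98 on 0/0 (as might come from a panel where the alternative allele is rare) changes the call to 0/0, which illustrates how population information can matter most at low coverage.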


Subjects
INDEL Mutation , Polymorphism, Single Nucleotide , Algorithms , Bayes Theorem , Genome, Human , Genotype , Genotyping Techniques , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
8.
Nat Mach Intell ; 3(11): 936-944, 2021 Nov.
Article in English | MEDLINE | ID: mdl-37396030

ABSTRACT

Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel deep learning method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.

9.
Cells ; 9(5)2020 05 13.
Article in English | MEDLINE | ID: mdl-32414201

ABSTRACT

The cholesterol-sensing nuclear receptor liver X receptor (LXR) and the glucose-sensing transcription factor carbohydrate responsive element-binding protein (ChREBP) are central players in regulating glucose and lipid metabolism in the liver. More knowledge of their mechanistic interplay is needed to understand their role in pathological conditions like fatty liver disease and insulin resistance. In the current study, LXR and ChREBP co-occupancy was examined by analyzing ChIP-seq datasets from mouse livers. LXR and ChREBP interaction was determined by co-immunoprecipitation (CoIP), and their transactivity was assessed by real-time quantitative polymerase chain reaction (qPCR) of target genes and by gene reporter assays. Chromatin binding capacity was determined by ChIP-qPCR assays. Our data show that LXRα and ChREBPα interact physically and show a high co-occupancy at regulatory regions in the mouse genome. LXRα co-activates ChREBPα and regulates ChREBP-specific target genes in vitro and in vivo. This co-activation is dependent on functional recognition elements for ChREBP but not for LXR, indicating that ChREBPα recruits LXRα to chromatin in trans. The two factors interact via their key activation domains: the low glucose inhibitory domain (LID) of ChREBPα and the ligand-binding domain (LBD) of LXRα. While unliganded LXRα co-activates ChREBPα, ligand-bound LXRα surprisingly represses ChREBPα activity on ChREBP-specific target genes. Mechanistically, this is due to a destabilized LXRα:ChREBPα interaction, leading to reduced ChREBP binding to chromatin and restricted activation of glycolytic and lipogenic target genes. This ligand-driven molecular switch highlights an unappreciated role of LXRα in responding to nutritional cues that has been overlooked because of the lipogenesis-promoting function of LXR.


Subjects
Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/agonists , Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/metabolism , Liver X Receptors/agonists , Liver X Receptors/metabolism , Transcriptional Activation/genetics , Animals , Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/chemistry , Cell Line, Tumor , Chromatin/metabolism , Female , Genome , Humans , Ligands , Liver/metabolism , Liver X Receptors/chemistry , Male , Mice, Inbred C57BL , Models, Biological , Protein Binding , Protein Domains , Response Elements/genetics
10.
Gigascience ; 6(7): 1-12, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28459977

ABSTRACT

Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. Findings: We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no.


Subjects
Datasets as Topic/standards , Epigenesis, Genetic , Epigenomics/methods , Genome, Human , Software , Whole Genome Sequencing/methods , Epigenomics/standards , Humans , Whole Genome Sequencing/standards