Results 1 - 20 of 101
1.
IEEE Trans Biomed Circuits Syst ; 18(3): 523-538, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38157470

ABSTRACT

In this article, we introduce GEMA, a genome exact mapping accelerator based on learned indexes, specifically designed for FPGA implementation. GEMA utilizes a machine learning (ML) algorithm to precisely locate the exact position of read sequences within the original sequence. To enhance the accuracy of the trained ML model, we incorporate data augmentation and data-distribution-aware partitioning techniques. Additionally, we present an efficient yet low-overhead error recovery technique. To map long reads more efficiently, we propose a speculative prefetching approach, which reduces the required memory bandwidth. Furthermore, we suggest an FPGA-based architecture for implementing the proposed mapping accelerator, optimizing the accesses to off-chip memory. Our studies demonstrate that GEMA achieves up to 1.36 × higher speed for short reads compared to the corresponding results reported in recently published exact mapping accelerators. Moreover, GEMA achieves up to ∼22 × faster mapping of long reads compared to the available results for the longest mapped reads using these accelerators.


Subjects
Algorithms; Machine Learning; Humans; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/instrumentation; Chromosome Mapping/methods; Chromosome Mapping/instrumentation
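
The learned-index idea described in the GEMA abstract can be illustrated with a minimal Python sketch. This is not GEMA's model or its FPGA architecture: the linear fit, the `LearnedIndex` class, the `encode` helper and the toy reference string are all assumptions made for illustration. A model predicts where a k-mer sits in a sorted key array, and the recorded worst-case prediction error bounds a small window in which the exact position is recovered.

```python
from bisect import bisect_left


def encode(kmer):
    """Pack a DNA k-mer into an integer, 2 bits per base (hypothetical helper)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    value = 0
    for base in kmer:
        value = value * 4 + code[base]
    return value


class LearnedIndex:
    """Toy learned index: a linear model plus a bounded correction search."""

    def __init__(self, reference, k):
        self.k = k
        # Sorted (key, position) pairs for every k-mer in the reference.
        self.entries = sorted(
            (encode(reference[i:i + k]), i)
            for i in range(len(reference) - k + 1)
        )
        self.keys = [key for key, _ in self.entries]
        n = len(self.keys)
        # Least-squares line: rank ~ slope * key + intercept.
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2
        var_x = sum((x - mean_x) ** 2 for x in self.keys)
        cov = sum((x - mean_x) * (r - mean_y) for r, x in enumerate(self.keys))
        self.slope = cov / var_x if var_x else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Worst-case prediction error bounds the correction window.
        self.max_err = max(
            abs(self._predict(x) - r) for r, x in enumerate(self.keys)
        )

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, read):
        """Return all reference positions where `read` (length k) occurs exactly."""
        key = encode(read)
        guess = self._predict(key)
        n = len(self.keys)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        # Error recovery: binary search only inside the bounded window.
        i = bisect_left(self.keys, key, lo, hi)
        hits = []
        while i < n and self.keys[i] == key:
            hits.append(self.entries[i][1])
            i += 1
        return hits


index = LearnedIndex("ACGTACGTTAGC", 4)
print(index.lookup("ACGT"))
```

With a well-fitting model the window is far smaller than the full key array, which is the property a learned index relies on; a single global line, as used here, is only the simplest possible choice.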
2.
Nucleic Acids Res ; 51(W1): W207-W212, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37144459

ABSTRACT

g:Profiler is a reliable and up-to-date functional enrichment analysis tool that supports various evidence types, identifier types and organisms. The toolset integrates many databases, including Gene Ontology, KEGG and TRANSFAC, to provide a comprehensive and in-depth analysis of gene lists. It also provides interactive and intuitive user interfaces and supports ordered queries and custom statistical backgrounds, among other settings. g:Profiler provides multiple programmatic interfaces to access its functionality. These can be easily integrated into custom workflows and external tools, making them valuable resources for researchers who want to develop their own solutions. g:Profiler has been available since 2007 and is used to analyse millions of queries. Research reproducibility and transparency are achieved by maintaining working versions of all past database releases since 2015. g:Profiler supports 849 species, including vertebrates, plants, fungi, insects and parasites, and can analyse any organism through user-uploaded custom annotation files. In this update article, we introduce a novel filtering method highlighting Gene Ontology driver terms, accompanied by new graph visualizations providing a broader context for significant Gene Ontology terms. As a leading enrichment analysis and gene list interoperability service, g:Profiler offers a valuable resource for genetics, biology and medical researchers. It is freely accessible at https://biit.cs.ut.ee/gprofiler.


Subjects
Chromosome Mapping; Computational Biology; Genes; Software; Animals; Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Databases, Genetic; Internet; Reproducibility of Results; User-Computer Interface; Computational Biology/instrumentation; Computational Biology/methods; Genes/genetics; Humans
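
To make the notion of functional enrichment with a custom statistical background concrete, the following Python sketch computes a generic hypergeometric over-representation p-value. It does not call g:Profiler and is not claimed to reproduce its statistics or multiple-testing correction; the function name `enrichment_p_value` and the gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(background, term_genes, query):
    """P(overlap >= observed) under a hypergeometric model.

    Genes outside the chosen background are ignored, so the same query
    can give different p-values under different backgrounds.
    """
    background = set(background)
    term = set(term_genes) & background
    query = set(query) & background
    N, K, n = len(background), len(term), len(query)
    k = len(term & query)
    total = comb(N, n)
    # Sum the upper tail: overlaps of size k, k + 1, ..., min(K, n).
    tail = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return tail / total


genes = [f"g{i}" for i in range(1, 11)]   # hypothetical 10-gene background
term = ["g1", "g2", "g3"]                 # genes annotated to one term
query = ["g1", "g2", "g4"]                # user gene list

print(enrichment_p_value(genes, term, query))       # whole background
print(enrichment_p_value(genes[:5], term, query))   # restricted background
```

Restricting the background to the genes that could actually have been detected in an experiment typically raises the p-value, which is why the choice of background matters.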
3.
Plant Physiol ; 187(3): 1462-1480, 2021 11 03.
Article in English | MEDLINE | ID: mdl-34618057

ABSTRACT

Stomata are adjustable pores on leaf surfaces that regulate the tradeoff of CO₂ uptake with water vapor loss, thus having critical roles in controlling photosynthetic carbon gain and plant water use. The lack of easy, rapid methods for phenotyping epidermal cell traits has limited discoveries about the genetic basis of stomatal patterning. A high-throughput epidermal cell phenotyping pipeline is presented here and used for quantitative trait loci (QTL) mapping in field-grown maize (Zea mays). The locations and sizes of stomatal complexes and pavement cells on images acquired by an optical topometer from mature leaves were automatically determined. Computer-estimated stomatal complex density (SCD; R² = 0.97) and stomatal complex area (SCA; R² = 0.71) were strongly correlated with human measurements. Leaf gas exchange traits were genetically correlated with the dimensions and proportions of stomatal complexes (rg = 0.39-0.71) but did not correlate with SCD. Heritability of epidermal traits was moderate to high (h² = 0.42-0.82) across two field seasons. Thirty-six QTL were consistently identified for a given trait in both years. Twenty-four clusters of overlapping QTL for multiple traits were identified, with univariate versus multivariate single marker analysis providing evidence consistent with pleiotropy in multiple cases. Putative orthologs of genes known to regulate stomatal patterning in Arabidopsis (Arabidopsis thaliana) were located within some, but not all, of these regions. This study demonstrates how discovery of the genetic basis for stomatal patterning can be accelerated in maize, a C4 model species where these processes are poorly understood.


Subjects
Botany/methods; Chromosome Mapping/instrumentation; Machine Learning; Phenotype; Plant Stomata/physiology; Quantitative Trait Loci; Zea mays/genetics; Botany/instrumentation; Genes, Plant
4.
Methods ; 142: 47-58, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29723572

ABSTRACT

The 3D organization of eukaryotic chromosomes affects key processes such as gene expression, DNA replication, cell division, and response to DNA damage. The genome-wide chromosome conformation capture (Hi-C) approach can characterize the landscape of 3D genome organization by measuring interaction frequencies between all genomic regions. Hi-C protocol improvements and rapid advances in DNA sequencing power have made Hi-C useful to study diverse biological systems, not only to elucidate the role of 3D genome structure in proper cellular function, but also to characterize genomic rearrangements, assemble new genomes, and consider chromatin interactions as potential biomarkers for diseases. Yet, the Hi-C protocol is still complex and subject to variations at numerous steps that can affect the resulting data. Thus, there is still a need for better understanding and control of factors that contribute to Hi-C experiment success and data quality. Here, we evaluate recently proposed Hi-C protocol modifications as well as often overlooked variables in sample preparation and examine their effects on Hi-C data quality. We examine artifacts that can occur during Hi-C library preparation, including microhomology-based artificial template copying and chimera formation that can add noise to the downstream data. Exploring the mechanisms underlying Hi-C artifacts pinpoints steps that should be further optimized in the future. To improve the utility of Hi-C in characterizing the 3D genome of specialized populations of cells or small samples of primary tissue, we identify steps prone to DNA loss which should be considered to adapt Hi-C to lower cell numbers.


Subjects
Chromatin/genetics; Chromosome Mapping/methods; DNA/chemistry; Gene Library; High-Throughput Nucleotide Sequencing/methods; Chromatin/chemistry; Chromosome Mapping/instrumentation; Cross-Linking Reagents/chemistry; DNA Restriction Enzymes/chemistry; Datasets as Topic; Formaldehyde/chemistry; Hep G2 Cells; High-Throughput Nucleotide Sequencing/instrumentation; Humans; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods
5.
Methods ; 142: 89-99, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29684640

ABSTRACT

Assembly of reference-quality genomes from next-generation sequencing data is a key challenge in genomics. Recently, we and others have shown that Hi-C data can be used to address several outstanding challenges in the field of genome assembly. This principle has since been developed in academia and industry, and has been used in the assembly of several major genomes. In this paper, we explore the central principles underlying Hi-C-based assembly approaches, by quantitatively defining and characterizing three invariant Hi-C interaction patterns on which these approaches can build: Intrachromosomal interaction enrichment, distance-dependent interaction decay and local interaction smoothness. Specifically, we evaluate to what degree each invariant pattern holds on a single locus level in different species, cell types and Hi-C map resolutions. We find that these patterns are generally consistent across species and cell types but are affected by sequencing depth, and that matrix balancing improves consistency of loci with all three invariant patterns. Finally, we overview current Hi-C-based assembly approaches in light of these invariant patterns and demonstrate how local interaction smoothness can be used to easily detect scaffolding errors in extremely sparse Hi-C maps. We suggest that simultaneously considering all three invariant patterns may lead to better Hi-C-based genome assembly methods.


Subjects
Chromosome Mapping/methods; High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Models, Genetic; Molecular Sequence Annotation/methods; Animals; Chromosome Mapping/instrumentation; DNA/chemistry; DNA/genetics; Genome/genetics; High-Throughput Nucleotide Sequencing/instrumentation; Humans; Imaging, Three-Dimensional/instrumentation; Imaging, Three-Dimensional/methods; Metagenomics/instrumentation; Models, Statistical; Molecular Imaging/instrumentation; Molecular Imaging/methods; Nucleic Acid Conformation; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods
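
One of the invariant patterns named in the abstract above, distance-dependent interaction decay, can be checked on a contact matrix with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' quantitative definition: the function names and the toy matrices are hypothetical, and the matrix is assumed to be square, symmetric and binned at a fixed resolution.

```python
def distance_decay(contacts):
    """Mean contact count at each separation d (in bins) along one chromosome."""
    n = len(contacts)
    decay = []
    for d in range(n):
        # The d-th diagonal holds all pairs of bins that are d bins apart.
        diagonal = [contacts[i][i + d] for i in range(n - d)]
        decay.append(sum(diagonal) / len(diagonal))
    return decay


def violates_decay(decay):
    """Separations whose mean contact count exceeds that of the previous one."""
    return [d for d in range(1, len(decay)) if decay[d] > decay[d - 1]]


# Toy 4-bin map in which contacts fall off with distance.
well_ordered = [
    [10, 6, 3, 1],
    [6, 10, 6, 3],
    [3, 6, 10, 6],
    [1, 3, 6, 10],
]
# Toy 3-bin map in which the two outer bins interact more than neighbours do,
# as could happen if a scaffold were ordered incorrectly.
suspicious = [
    [10, 1, 6],
    [1, 10, 1],
    [6, 1, 10],
]

print(distance_decay(well_ordered), violates_decay(distance_decay(well_ordered)))
print(distance_decay(suspicious), violates_decay(distance_decay(suspicious)))
```

Real maps are sparse and noisy, so in practice a tolerance or smoothing step would be needed before flagging a separation as a violation.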
6.
Methods ; 142: 30-38, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29408376

ABSTRACT

The spatial organization of chromosomes in the nuclear space is an extensively studied field that relies on measurements of structural features and 3D positions of chromosomes with high precision and robustness. However, no tools are currently available to image and analyze chromosome territories in a high-throughput format. Here, we have developed High-throughput Chromosome Territory Mapping (HiCTMap), a method for the robust and rapid analysis of 2D and 3D chromosome territory positioning in mammalian cells. HiCTMap is a high-throughput imaging-based chromosome detection method which enables routine analysis of chromosome structure and nuclear position. Using an optimized FISH staining protocol in a 384-well plate format in conjunction with a bespoke automated image analysis workflow, HiCTMap faithfully detects chromosome territories and their position in 2D and 3D in a large population of cells per experimental condition. We apply this novel technique to visualize chromosomes 18, X, and Y in male and female primary human skin fibroblasts, and show accurate detection of the correct number of chromosomes in the respective genotypes. Given the ability to visualize and quantitatively analyze large numbers of nuclei, we use HiCTMap to measure chromosome territory area and volume with high precision and determine the radial position of chromosome territories using either centroid or equidistant-shell analysis. The HiCTMap protocol is also compatible with RNA FISH as demonstrated by simultaneous labeling of X chromosomes and Xist RNA in female cells. We suggest HiCTMap will be a useful tool for routine precision mapping of chromosome territories in a wide range of cell types and tissues.


Subjects
Chromosome Mapping/methods; Image Processing, Computer-Assisted/methods; In Situ Hybridization, Fluorescence/methods; Animals; Cell Nucleus/genetics; Cell Nucleus/metabolism; Chromosome Mapping/instrumentation; Chromosomes, Human, Pair 18/genetics; Chromosomes, Human, Pair 18/metabolism; Chromosomes, Human, X/genetics; Chromosomes, Human, X/metabolism; Chromosomes, Human, Y/genetics; Chromosomes, Human, Y/metabolism; Female; Fibroblasts; Humans; Image Processing, Computer-Assisted/instrumentation; In Situ Hybridization, Fluorescence/instrumentation; Male; Primary Cell Culture/methods; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; Skin/cytology; Staining and Labeling/instrumentation; Staining and Labeling/methods
7.
Methods ; 142: 59-73, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29382556

ABSTRACT

The folding and three-dimensional (3D) organization of chromatin in the nucleus critically impacts genome function. The past decade has witnessed rapid advances in genomic tools for delineating 3D genome architecture. Among them, chromosome conformation capture (3C)-based methods such as Hi-C are the most widely used techniques for mapping chromatin interactions. However, traditional Hi-C protocols rely on restriction enzymes (REs) to fragment chromatin and are therefore limited in resolution. We recently developed DNase Hi-C for mapping 3D genome organization, which uses DNase I for chromatin fragmentation. DNase Hi-C overcomes RE-related limitations associated with traditional Hi-C methods, leading to improved methodological resolution. Furthermore, combining this method with DNA capture technology provides a high-throughput approach (targeted DNase Hi-C) that allows for mapping fine-scale chromatin architecture at exceptionally high resolution. Hence, targeted DNase Hi-C will be valuable for delineating the physical landscapes of cis-regulatory networks that control gene expression and for characterizing phenotype-associated chromatin 3D signatures. Here, we provide a detailed description of method design and step-by-step working protocols for these two methods.


Subjects
Chromosome Mapping/methods; Deoxyribonuclease I/metabolism; High-Throughput Nucleotide Sequencing/methods; Imaging, Three-Dimensional/methods; Molecular Imaging/methods; Cell Culture Techniques/instrumentation; Cell Culture Techniques/methods; Cell Nucleus/genetics; Cell Nucleus/metabolism; Chromatin/chemistry; Chromatin/genetics; Chromosome Mapping/instrumentation; Cross-Linking Reagents/chemistry; DNA Restriction Enzymes/chemistry; DNA Restriction Enzymes/metabolism; Deoxyribonuclease I/chemistry; Formaldehyde/chemistry; Gene Library; High-Throughput Nucleotide Sequencing/instrumentation; Imaging, Three-Dimensional/instrumentation; Molecular Imaging/instrumentation; Tissue Culture Techniques/instrumentation; Tissue Culture Techniques/methods; Whole Genome Sequencing/instrumentation; Whole Genome Sequencing/methods
8.
Nat Commun ; 9(1): 188, 2018 01 15.
Article in English | MEDLINE | ID: mdl-29335463

ABSTRACT

Topologically associating domains (TADs) are fundamental elements of the eukaryotic genomic structure. However, recent studies suggest that the insulating complexes, CTCF/cohesin, present at TAD borders in mammals are absent from those in Drosophila melanogaster, raising the possibility that border elements are not conserved among metazoans. Using in situ Hi-C with sub-kb resolution, here we show that the D. melanogaster genome is almost completely partitioned into >4000 TADs, nearly sevenfold more than previously identified. The overwhelming majority of these TADs are demarcated by the insulator complexes, BEAF-32/CP190, or BEAF-32/Chromator, indicating that these proteins may play an analogous role in flies as that of CTCF/cohesin in mammals. Moreover, extended regions previously thought to be unstructured are shown to consist of small contiguous TADs, a property also observed in mammals upon re-examination. Altogether, our work demonstrates that fundamental features associated with the higher-order folding of the genome are conserved from insects to mammals.


Subjects
Chromatin/ultrastructure; Chromosome Mapping/methods; Chromosomes, Insect/ultrastructure; Drosophila melanogaster/genetics; Genome, Insect; Mammals/genetics; Animals; Biological Evolution; CCCTC-Binding Factor/genetics; CCCTC-Binding Factor/metabolism; Cell Cycle Proteins/genetics; Cell Cycle Proteins/metabolism; Chromatin/chemistry; Chromatin Assembly and Disassembly; Chromosomal Proteins, Non-Histone/genetics; Chromosomal Proteins, Non-Histone/metabolism; Chromosome Mapping/instrumentation; Chromosomes, Insect/chemistry; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Drosophila melanogaster/ultrastructure; Eye Proteins/genetics; Eye Proteins/metabolism; Gene Expression; Humans; Microtubule-Associated Proteins/genetics; Microtubule-Associated Proteins/metabolism; Molecular Conformation; Nuclear Matrix-Associated Proteins/genetics; Nuclear Matrix-Associated Proteins/metabolism; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Cohesins
9.
Cold Spring Harb Protoc ; 2018(2), 2018 02 01.
Article in English | MEDLINE | ID: mdl-28733394

ABSTRACT

This protocol describes an optimized high-throughput procedure for generating double deletion mutants in Schizosaccharomyces pombe using the colony replicating robot ROTOR HDA and the PEM (pombe epistasis mapper) system. The method is based on generating high-density colony arrays (1536 colonies per agar plate) and passaging them through a series of antidiploid and mating-type selection (ADS-MTS) and double-mutant selection (DMS) steps. Detailed program parameters for each individual replication step are provided. Using this procedure, batches of 25 or more screens can be routinely performed.


Subjects
Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Epistasis, Genetic; Genes, Fungal; Genetics, Microbial/instrumentation; Genetics, Microbial/methods; Schizosaccharomyces/genetics; Gene Deletion; High-Throughput Screening Assays/instrumentation; High-Throughput Screening Assays/methods; Robotics/instrumentation; Robotics/methods; Schizosaccharomyces/growth & development; Selection, Genetic
10.
Cold Spring Harb Protoc ; 2018(2), 2018 02 01.
Article in English | MEDLINE | ID: mdl-28733416

ABSTRACT

In laboratories in which a colony-replicating robot is not available, manual replication provides a good, low-cost alternative for genetic interaction screening using the Pombe Epistasis Mapper (PEM) system. The protocol presented here describes the minimum number of steps required to identify genetic interactions. First, a query deletion is introduced to a library of deletion mutants by mating. Through a series of subsequent selection steps, single and double mutants are isolated and analyzed.


Subjects
Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Epistasis, Genetic; Genes, Fungal; Genetics, Microbial/instrumentation; Genetics, Microbial/methods; Schizosaccharomyces/genetics; Mutation; Selection, Genetic
11.
Nat Commun ; 8(1): 1826, 2017 11 28.
Article in English | MEDLINE | ID: mdl-29184056

ABSTRACT

A main challenge in genome-wide association studies (GWAS) is to pinpoint possible causal variants. Results from GWAS typically do not directly translate into causal variants because the majority of hits are in non-coding or intergenic regions, and the presence of linkage disequilibrium leads to effects being statistically spread out across multiple variants. Post-GWAS annotation facilitates the selection of most likely causal variant(s). Multiple resources are available for post-GWAS annotation, yet these can be time consuming and do not provide integrated visual aids for data interpretation. We, therefore, develop FUMA: an integrative web-based platform using information from multiple biological resources to facilitate functional annotation of GWAS results, gene prioritization and interactive visualization. FUMA accommodates positional, expression quantitative trait loci (eQTL) and chromatin interaction mappings, and provides gene-based, pathway and tissue enrichment results. FUMA results directly aid in generating hypotheses that are testable in functional experiments aimed at proving causal relations.


Subjects
Databases, Genetic; Genome-Wide Association Study/instrumentation; Genome-Wide Association Study/methods; Linkage Disequilibrium; Chromatin/genetics; Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Computational Biology/methods; Crohn Disease/genetics; Genetic Predisposition to Disease; Genome, Human; Humans; Internet; Molecular Sequence Annotation/methods; Quantitative Trait Loci
12.
Cell Rep ; 21(1): 289-300, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28978481

ABSTRACT

Protein-DNA interactions provide the basis for chromatin structure and gene regulation. Comprehensive identification of protein-occupied sites is thus vital to an in-depth understanding of genome function. Dimethyl sulfate (DMS) is a chemical probe that has long been used to detect footprints of DNA-bound proteins in vitro and in vivo. Here, we describe a genomic footprinting method, dimethyl sulfate sequencing (DMS-seq), which exploits the cell-permeable nature of DMS to obviate the need for nuclear isolation. This feature makes DMS-seq simple in practice and removes the potential risk of protein re-localization during nuclear isolation. DMS-seq successfully detects transcription factors bound to cis-regulatory elements and non-canonical chromatin particles in nucleosome-free regions. Furthermore, an unexpected preference of DMS confers on DMS-seq a unique potential to directly detect nucleosome centers without using genetic manipulation. We expect that DMS-seq will serve as a characteristic method for genome-wide interrogation of in vivo protein-DNA interactions.


Subjects
Chromosome Mapping/methods; DNA Footprinting/methods; DNA-Binding Proteins/genetics; Genome, Human; Nucleosomes/chemistry; Sulfuric Acid Esters/chemistry; Cell Line; Chromosome Mapping/instrumentation; DNA/genetics; DNA/metabolism; DNA-Binding Proteins/metabolism; Gene Expression Regulation; Gene Library; Genetic Loci; Hepatocytes/cytology; Hepatocytes/metabolism; High-Throughput Nucleotide Sequencing; Histones/genetics; Histones/metabolism; Humans; Nucleosomes/metabolism; Regulatory Sequences, Nucleic Acid; Saccharomyces cerevisiae/cytology; Saccharomyces cerevisiae/metabolism; Sequence Analysis, DNA
13.
Nat Biotechnol ; 35(7): 640-646, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28553940

ABSTRACT

The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.


Subjects
Chromosome Mapping/instrumentation; DNA Barcoding, Taxonomic/instrumentation; Genome/genetics; High-Throughput Nucleotide Sequencing/instrumentation; Lab-On-A-Chip Devices; Tissue Array Analysis/instrumentation; Cell Separation/instrumentation; Equipment Design; Equipment Failure Analysis
14.
Lab Chip ; 17(4): 579-590, 2017 02 14.
Article in English | MEDLINE | ID: mdl-28098301

ABSTRACT

Optical DNA mapping has over the last decade emerged as a very powerful tool for obtaining long-range sequence information from single DNA molecules. In optical DNA mapping, intact large single DNA molecules are labeled, stretched out, and imaged using a fluorescence microscope. This means that sequence information ranging over hundreds of kilobasepairs (kbp) can be obtained in one single image. Nanochannels offer homogeneous and efficient stretching of DNA that is crucial to maximize the information that can be obtained from optical DNA maps. In this review, we highlight progress in the field of optical DNA mapping in nanochannels. We discuss the different protocols for sequence-specific labeling and divide them into two main categories, enzymatic labeling and affinity-based labeling. Examples are highlighted where optical DNA mapping is used to gain information on length scales that would be inaccessible with traditional techniques. Enzymatic labeling has been commercialized and is mainly used in human genetics and assembly of complex genomes, while the affinity-based methods have primarily been applied in bacteriology, for example for rapid analysis of plasmids encoding antibiotic resistance. Next, we highlight how the design of nanofluidic channels can be altered in order to obtain the desired information and discuss how recent advances in the field make it possible to retrieve information beyond DNA sequence. In the outlook section, we discuss future directions of optical DNA mapping, such as fully integrated devices and portable microscopes.


Subjects
Chromosome Mapping; DNA; Microfluidic Analytical Techniques/instrumentation; Microscopy, Fluorescence/instrumentation; Nanotechnology/instrumentation; Animals; Brain Chemistry; Cell Line; Chromosome Mapping/instrumentation; Chromosome Mapping/methods; DNA/analysis; DNA/chemistry; DNA/genetics; Humans; Mice
15.
Adv Exp Med Biol ; 926: 1-10, 2016.
Article in English | MEDLINE | ID: mdl-27686802

ABSTRACT

Proteogenomic strategies aim to refine genome-wide annotations of protein-coding features by using actual protein-level observations. Most of the currently applied proteogenomic approaches include integrative analysis of multiple types of high-throughput omics data, e.g., genomics, transcriptomics, proteomics, etc. Recent efforts towards creating a human proteome map were primarily targeted to experimentally detect at least one protein product for each gene in the genome and extensively utilized proteogenomic approaches. The 14-year-long wait to get a draft human proteome map, after completion of similar efforts to sequence the genome, explains the huge complexity and technical hurdles of such efforts. Further, the integrative analysis of large-scale multi-omics datasets inherent to these studies becomes a major bottleneck to their success. However, recent developments of various analysis tools and pipelines dedicated to proteogenomics reduce both the time and complexity of such analysis. Here, we summarize notable approaches, studies, software developments and their potential applications towards eukaryotic genome annotation and clinical proteogenomics.


Subjects
Chromosome Mapping/methods; Genome; Open Reading Frames; Proteogenomics/methods; Software/supply & distribution; Animals; Chromosome Mapping/instrumentation; Datasets as Topic; Eukaryotic Cells/metabolism; Humans; Molecular Sequence Annotation; Proteogenomics/instrumentation; Proteome
16.
BMC Bioinformatics ; 17(1): 198, 2016 May 04.
Article in English | MEDLINE | ID: mdl-27143038

ABSTRACT

BACKGROUND: Gene expression connectivity mapping has proven to be a powerful and flexible tool for research. Its application has been shown in a broad range of research topics, most commonly as a means of identifying potential small molecule compounds, which may be further investigated as candidates for repurposing to treat diseases. The public release of voluminous data from the Library of Integrated Cellular Signatures (LINCS) programme further enhanced the utilities and potentials of gene expression connectivity mapping in biomedicine. RESULTS: We describe QUADrATiC (http://go.qub.ac.uk/QUADrATiC), a user-friendly tool for the exploration of gene expression connectivity on the subset of the LINCS data set corresponding to FDA-approved small molecule compounds. It enables the identification of compounds for repurposing therapeutic potentials. The software is designed to cope with the increased volume of data over existing tools, by taking advantage of multicore computing architectures to provide a scalable solution, which may be installed and operated on a range of computers, from laptops to servers. This scalability is provided by the use of the modern concurrent programming paradigm provided by the Akka framework. The QUADrATiC Graphical User Interface (GUI) has been developed using advanced JavaScript frameworks, providing novel visualization capabilities for further analysis of connections. There is also a web services interface, allowing integration with other programs or scripts. CONCLUSIONS: QUADrATiC has been shown to provide an improvement over existing connectivity map software, in terms of scope (based on the LINCS data set), applicability (using FDA-approved compounds), usability and speed. It offers potential to biological researchers to analyze transcriptional data and generate potential therapeutics for focussed study in the lab. QUADrATiC represents a step change in the process of investigating gene expression connectivity and provides more biologically-relevant results than previous alternative solutions.


Subjects
Chromosome Mapping/methods; Drug Therapy; Chromosome Mapping/instrumentation; Gene Expression; Humans; Small Molecule Libraries/pharmacology; Software; United States; United States Food and Drug Administration; User-Computer Interface
17.
Article in English | MEDLINE | ID: mdl-26451812

ABSTRACT

While the sequencing capability of modern instruments continues to increase exponentially, the computational problem of mapping short sequenced reads to a reference genome still constitutes a bottleneck in the analysis pipeline. A variety of mapping tools (e.g., Bowtie, BWA) is available for general-purpose computer architectures. These tools can take many hours or even days to deliver mapping results, depending on the number of input reads, the size of the reference genome and the number of allowed mismatches or insertions/deletions, making the mapping problem an ideal candidate for hardware acceleration. In this paper, we present FHAST (FPGA hardware accelerated sequence-matching tool), a drop-in replacement for Bowtie that uses a hardware design based on field programmable gate arrays (FPGA). Our architecture masks memory latency by executing multiple concurrent hardware threads accessing memory simultaneously. FHAST is composed of multiple parallel engines to exploit the parallelism available to us on an FPGA. We have implemented and tested FHAST on the Convey HC-1 and later ported it to the Convey HC-2ex, taking advantage of the large memory bandwidth available to these systems and the shared memory image between hardware and software. A preliminary version of FHAST running on the Convey HC-1 achieved up to 70x speedup compared to Bowtie (single-threaded). An improved version of FHAST running on the Convey HC-2ex FPGAs achieved up to a 12-fold speed gain compared to Bowtie running eight threads on an eight-core conventional architecture, while maintaining almost identical mapping accuracy. FHAST is a drop-in replacement for Bowtie, so it can be incorporated in any analysis pipeline that uses Bowtie (e.g., TopHat).


Subjects
Chromosome Mapping/instrumentation , DNA/genetics , High-Throughput Nucleotide Sequencing/instrumentation , Sequence Analysis, DNA/instrumentation , Signal Processing, Computer-Assisted/instrumentation , Software , Chromosome Mapping/methods , Equipment Design , Equipment Failure Analysis , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
18.
Article in English | MEDLINE | ID: mdl-26451814

ABSTRACT

We introduce a parallel aligner with a workflow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (an open-source application available at http://www.opencb.org), exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads) and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity relative to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.
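
As an illustration of the suffix-array step mentioned in this abstract, the following is a minimal sketch of exact read lookup over a toy reference. It is not HPG Aligner SA's implementation: the function names `build_suffix_array` and `find_exact` are hypothetical, the construction is deliberately naive, and seeding, splice-junction detection and Smith-Waterman refinement are all omitted.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction (sorts full suffixes); adequate only for toy inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text: str, sa: list[int], read: str) -> list[int]:
    """Return all positions where `read` occurs exactly in `text`.

    Two binary searches locate the contiguous block of suffixes
    whose first len(read) characters equal `read`.
    """
    m = len(read)

    # Leftmost suffix whose m-character prefix is >= read.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < read:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose m-character prefix is > read.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= read:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    reference = "ACGTACGTGA"  # hypothetical toy reference
    sa = build_suffix_array(reference)
    print(find_exact(reference, sa, "ACG"))
```

Each lookup costs a number of string comparisons logarithmic in the reference length, which is why a suffix array suits the fast first pass, leaving only the unmapped reads for a slower dynamic-programming alignment.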


Subjects
Chromosome Mapping/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , RNA/genetics , Sequence Analysis, RNA/instrumentation , Signal Processing, Computer-Assisted/instrumentation , Software , Base Sequence , Chromosome Mapping/methods , Equipment Design , Equipment Failure Analysis , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity , Sequence Alignment/instrumentation , Sequence Alignment/methods , Sequence Analysis, RNA/methods
19.
Article in English | MEDLINE | ID: mdl-26451813

ABSTRACT

High-throughput genotyping technologies (such as SNP arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively long runtimes; e.g., processing a moderately sized dataset consisting of about 500,000 SNPs and 5,000 samples requires several days using state-of-the-art tools on a standard 3 GHz CPU. In this paper, we demonstrate how this task can be accelerated using a combination of fine-grained and coarse-grained parallelism on two different computing systems. The first architecture is based on reconfigurable hardware (FPGAs), while the second architecture uses multiple GPUs connected to the same host. We show that both systems can achieve speedups of around four orders of magnitude compared to the sequential implementation. This significantly reduces the runtimes for detecting epistasis to only a few minutes for moderately sized datasets and to a few hours for large-scale datasets.
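
To make the cost of the pairwise search concrete, the sketch below counts the SNP pairs for the dataset size quoted in this abstract and builds the genotype-by-phenotype count table that any pairwise statistic would start from. The function names, the 0/1/2 genotype coding and the 0/1 phenotype coding are assumptions for illustration; the abstract does not specify the statistical test, so none is implemented here.

```python
from collections import Counter
from math import comb


def snp_pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs that a 2-SNP interaction scan must test."""
    return comb(n_snps, 2)


def pair_contingency(geno_a, geno_b, phenotype) -> Counter:
    """Count samples in each (genotype_a, genotype_b, phenotype) cell.

    Assumed coding: genotypes are 0, 1 or 2 (minor-allele count),
    phenotype is 0 (control) or 1 (case). One such table is needed per SNP pair.
    """
    return Counter(zip(geno_a, geno_b, phenotype))


if __name__ == "__main__":
    # Dataset size quoted in the abstract: about 500,000 SNPs.
    print(snp_pair_count(500_000))

    # Hypothetical genotypes for six samples at two SNPs.
    table = pair_contingency([0, 1, 2, 0, 1, 2], [0, 0, 1, 0, 2, 1], [0, 1, 1, 0, 0, 1])
    print(table[(2, 1, 1)])
```

Because every pair needs its own table over all samples and the pairs are independent of one another, the workload divides naturally across many parallel processing elements, which is the property both the FPGA and the multi-GPU systems exploit.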


Subjects
Computer Graphics/instrumentation , DNA Mutational Analysis/instrumentation , Epistasis, Genetic/genetics , Genome-Wide Association Study/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , Polymorphism, Single Nucleotide/genetics , Chromosome Mapping/instrumentation , Chromosome Mapping/methods , Equipment Design , Equipment Failure Analysis , Genome-Wide Association Study/methods , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted/instrumentation
20.
Article in English | MEDLINE | ID: mdl-26451815

ABSTRACT

Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism, including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing to assess statistical significance. We demonstrate the first-ever inference of a plant whole-genome regulatory network on a single chip by constructing a 15,575-gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor offers lessons that are applicable to other domains.
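
The combination of mutual information and permutation testing named in this abstract can be sketched for a single gene pair as follows. This is a simplified illustration, not TINGe's implementation: it assumes expression values have already been discretized into a few levels, uses a plug-in estimator, and the names `mutual_information` and `permutation_p_value` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(x, y) -> float:
    """Plug-in estimate of mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed cells of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def permutation_p_value(x, y, n_perm: int = 1000, seed: int = 0) -> float:
    """Estimate how often shuffling `y` yields mutual information >= the observed value."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any dependence between x and y
        if mutual_information(x, y_perm) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    gene_a = [0, 1] * 20      # hypothetical discretized expression levels
    gene_b = list(gene_a)     # perfectly dependent on gene_a
    print(mutual_information(gene_a, gene_b))
    print(permutation_p_value(gene_a, gene_b, n_perm=200))
```

A whole-genome network repeats this computation for every gene pair, so the work grows quadratically with the number of genes while each pair stays independent, which is what makes the problem a good fit for many-core and vector hardware.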


Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Chromosome Mapping/instrumentation , High-Throughput Screening Assays/instrumentation , Oligonucleotide Array Sequence Analysis/instrumentation , Protein Interaction Mapping/instrumentation , Arabidopsis/genetics , Arabidopsis Proteins/genetics , High-Throughput Screening Assays/methods , Signal Transduction/physiology