Results 1 - 20 of 102
1.
IEEE Trans Biomed Circuits Syst ; 18(3): 523-538, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38157470

ABSTRACT

In this article, we introduce GEMA, a genome exact mapping accelerator based on learned indexes, specifically designed for FPGA implementation. GEMA utilizes a machine learning (ML) algorithm to precisely locate the exact position of read sequences within the original sequence. To enhance the accuracy of the trained ML model, we incorporate data augmentation and data-distribution-aware partitioning techniques. Additionally, we present an efficient yet low-overhead error recovery technique. To map long reads more efficiently, we propose a speculative prefetching approach, which reduces the required memory bandwidth. Furthermore, we suggest an FPGA-based architecture for implementing the proposed mapping accelerator, optimizing the accesses to off-chip memory. Our studies demonstrate that GEMA achieves up to 1.36 × higher speed for short reads compared to the corresponding results reported in recently published exact mapping accelerators. Moreover, GEMA achieves up to ∼22 × faster mapping of long reads compared to the available results for the longest mapped reads using these accelerators.
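
The learned-index idea in this abstract can be illustrated with a minimal software sketch. It is not GEMA's model, partitioning scheme, or FPGA design: it assumes a single least-squares linear model over 2-bit-encoded k-mers, and the names `encode`, `LearnedKmerIndex`, and `lookup` are hypothetical. The model predicts where a key sits in a sorted array, and the worst-case training error bounds a small corrective search, which stands in for the error-recovery step.

```python
import bisect

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Pack a DNA k-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = value * 4 + BASE_CODE[base]
    return value


class LearnedKmerIndex:
    """Toy learned index: a linear model predicts the rank of a k-mer in a
    sorted key array; a bounded search then corrects the prediction."""

    def __init__(self, reference, k):
        # Assumes len(reference) >= k and an alphabet of A, C, G, T only.
        self.reference, self.k = reference, k
        pairs = sorted((encode(reference[i:i + k]), i)
                       for i in range(len(reference) - k + 1))
        self.keys = [key for key, _ in pairs]
        self.positions = [pos for _, pos in pairs]
        n = len(self.keys)
        # Least-squares fit of rank ~ slope * key + intercept.
        mean_key = sum(self.keys) / n
        mean_rank = (n - 1) / 2
        var = sum((key - mean_key) ** 2 for key in self.keys)
        cov = sum((key - mean_key) * (rank - mean_rank)
                  for rank, key in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_rank - self.slope * mean_key
        # The worst prediction error on the indexed keys bounds the search.
        self.max_err = max(abs(rank - self._predict(key))
                           for rank, key in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, read):
        """Return all positions where `read` (length >= k) occurs exactly."""
        key = encode(read[:self.k])
        guess, n = self._predict(key), len(self.keys)
        lo = min(max(guess - self.max_err, 0), n)
        hi = max(min(guess + self.max_err + 1, n), lo)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        hits = []
        while i < n and self.keys[i] == key:
            pos = self.positions[i]
            # Verify the whole read, not only its first k bases.
            if self.reference.startswith(read, pos):
                hits.append(pos)
            i += 1
        return sorted(hits)


index = LearnedKmerIndex("ACGTACGTTGCA", k=4)
print(index.lookup("ACGTT"))
```

Because the search window is derived from the largest error seen while fitting, any k-mer that is present in the reference is guaranteed to fall inside it; a key that is absent simply yields an empty result.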


Subject(s)
Algorithms; Machine Learning; Humans; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/instrumentation; Chromosome Mapping/methods; Chromosome Mapping/instrumentation
2.
Nucleic Acids Res ; 51(W1): W207-W212, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37144459

ABSTRACT

g:Profiler is a reliable and up-to-date functional enrichment analysis tool that supports various evidence types, identifier types and organisms. The toolset integrates many databases, including Gene Ontology, KEGG and TRANSFAC, to provide a comprehensive and in-depth analysis of gene lists. It also provides interactive and intuitive user interfaces and supports ordered queries and custom statistical backgrounds, among other settings. g:Profiler provides multiple programmatic interfaces to access its functionality. These can be easily integrated into custom workflows and external tools, making them valuable resources for researchers who want to develop their own solutions. g:Profiler has been available since 2007 and is used to analyse millions of queries. Research reproducibility and transparency are achieved by maintaining working versions of all past database releases since 2015. g:Profiler supports 849 species, including vertebrates, plants, fungi, insects and parasites, and can analyse any organism through user-uploaded custom annotation files. In this update article, we introduce a novel filtering method highlighting Gene Ontology driver terms, accompanied by new graph visualizations providing a broader context for significant Gene Ontology terms. As a leading enrichment analysis and gene list interoperability service, g:Profiler offers a valuable resource for genetics, biology and medical researchers. It is freely accessible at https://biit.cs.ut.ee/gprofiler.


Subject(s)
Chromosome Mapping; Computational Biology; Genes; Software; Animals; Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Databases, Genetic; Internet; Reproducibility of Results; User-Computer Interface; Computational Biology/instrumentation; Computational Biology/methods; Genes/genetics; Humans
3.
Plant Physiol ; 187(3): 1462-1480, 2021 11 03.
Article in English | MEDLINE | ID: mdl-34618057

ABSTRACT

Stomata are adjustable pores on leaf surfaces that regulate the tradeoff of CO₂ uptake with water vapor loss, thus having critical roles in controlling photosynthetic carbon gain and plant water use. The lack of easy, rapid methods for phenotyping epidermal cell traits has limited discoveries about the genetic basis of stomatal patterning. A high-throughput epidermal cell phenotyping pipeline is presented here and used for quantitative trait loci (QTL) mapping in field-grown maize (Zea mays). The locations and sizes of stomatal complexes and pavement cells on images acquired by an optical topometer from mature leaves were automatically determined. Computer-estimated stomatal complex density (SCD; R² = 0.97) and stomatal complex area (SCA; R² = 0.71) were strongly correlated with human measurements. Leaf gas exchange traits were genetically correlated with the dimensions and proportions of stomatal complexes (rg = 0.39-0.71) but did not correlate with SCD. Heritability of epidermal traits was moderate to high (h² = 0.42-0.82) across two field seasons. Thirty-six QTL were consistently identified for a given trait in both years. Twenty-four clusters of overlapping QTL for multiple traits were identified, with univariate versus multivariate single marker analysis providing evidence consistent with pleiotropy in multiple cases. Putative orthologs of genes known to regulate stomatal patterning in Arabidopsis (Arabidopsis thaliana) were located within some, but not all, of these regions. This study demonstrates how discovery of the genetic basis for stomatal patterning can be accelerated in maize, a C4 model species where these processes are poorly understood.


Subject(s)
Botany/methods; Chromosome Mapping/instrumentation; Machine Learning; Phenotype; Plant Stomata/physiology; Quantitative Trait Loci; Zea mays/genetics; Botany/instrumentation; Genes, Plant
4.
Methods ; 142: 47-58, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29723572

ABSTRACT

The 3D organization of eukaryotic chromosomes affects key processes such as gene expression, DNA replication, cell division, and response to DNA damage. The genome-wide chromosome conformation capture (Hi-C) approach can characterize the landscape of 3D genome organization by measuring interaction frequencies between all genomic regions. Hi-C protocol improvements and rapid advances in DNA sequencing power have made Hi-C useful to study diverse biological systems, not only to elucidate the role of 3D genome structure in proper cellular function, but also to characterize genomic rearrangements, assemble new genomes, and consider chromatin interactions as potential biomarkers for diseases. Yet, the Hi-C protocol is still complex and subject to variations at numerous steps that can affect the resulting data. Thus, there is still a need for better understanding and control of factors that contribute to Hi-C experiment success and data quality. Here, we evaluate recently proposed Hi-C protocol modifications as well as often overlooked variables in sample preparation and examine their effects on Hi-C data quality. We examine artifacts that can occur during Hi-C library preparation, including microhomology-based artificial template copying and chimera formation that can add noise to the downstream data. Exploring the mechanisms underlying Hi-C artifacts pinpoints steps that should be further optimized in the future. To improve the utility of Hi-C in characterizing the 3D genome of specialized populations of cells or small samples of primary tissue, we identify steps prone to DNA loss which should be considered to adapt Hi-C to lower cell numbers.


Subject(s)
Chromatin/genetics; Chromosome Mapping/methods; DNA/chemistry; Gene Library; High-Throughput Nucleotide Sequencing/methods; Chromatin/chemistry; Chromosome Mapping/instrumentation; Cross-Linking Reagents/chemistry; DNA Restriction Enzymes/chemistry; Datasets as Topic; Formaldehyde/chemistry; Hep G2 Cells; High-Throughput Nucleotide Sequencing/instrumentation; Humans; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods
5.
Methods ; 142: 89-99, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29684640

ABSTRACT

Assembly of reference-quality genomes from next-generation sequencing data is a key challenge in genomics. Recently, we and others have shown that Hi-C data can be used to address several outstanding challenges in the field of genome assembly. This principle has since been developed in academia and industry, and has been used in the assembly of several major genomes. In this paper, we explore the central principles underlying Hi-C-based assembly approaches, by quantitatively defining and characterizing three invariant Hi-C interaction patterns on which these approaches can build: Intrachromosomal interaction enrichment, distance-dependent interaction decay and local interaction smoothness. Specifically, we evaluate to what degree each invariant pattern holds on a single locus level in different species, cell types and Hi-C map resolutions. We find that these patterns are generally consistent across species and cell types but are affected by sequencing depth, and that matrix balancing improves consistency of loci with all three invariant patterns. Finally, we overview current Hi-C-based assembly approaches in light of these invariant patterns and demonstrate how local interaction smoothness can be used to easily detect scaffolding errors in extremely sparse Hi-C maps. We suggest that simultaneously considering all three invariant patterns may lead to better Hi-C-based genome assembly methods.
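
Two of the three invariant patterns named in this abstract can be made concrete with a small sketch. It assumes a symmetric contact matrix stored as nested lists and a per-bin chromosome label; the function names `decay_profile` and `intra_inter_ratio` are hypothetical, the toy matrices are invented for illustration, and the paper's own quantitative definitions may differ. Local interaction smoothness is not covered here.

```python
def decay_profile(matrix):
    """Mean contact count at each bin separation d (the d-th diagonal).

    For a single chromosome, distance-dependent decay shows up as a
    profile that falls as d grows.
    """
    n = len(matrix)
    return [sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
            for d in range(n)]


def intra_inter_ratio(matrix, chrom_of_bin):
    """Mean intrachromosomal contact divided by mean interchromosomal contact.

    Only off-diagonal bin pairs (i < j) are used. Values well above 1
    indicate intrachromosomal interaction enrichment.
    """
    intra, inter = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            same = chrom_of_bin[i] == chrom_of_bin[j]
            (intra if same else inter).append(matrix[i][j])
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Toy single-chromosome map: contacts halve with every extra bin of distance.
one_chrom = [[8, 4, 2, 1],
             [4, 8, 4, 2],
             [2, 4, 8, 4],
             [1, 2, 4, 8]]
print(decay_profile(one_chrom))

# Toy two-chromosome map: bins 0-1 on chromosome A, bins 2-3 on chromosome B.
two_chrom = [[10, 6, 1, 1],
             [6, 10, 1, 1],
             [1, 1, 10, 6],
             [1, 1, 6, 10]]
print(intra_inter_ratio(two_chrom, ["A", "A", "B", "B"]))
```

A scaffolder could use statistics of this kind to score candidate orderings, with contigs placed on the wrong chromosome lowering the ratio and misordered contigs distorting the decay profile.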


Subject(s)
Chromosome Mapping/methods; High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Models, Genetic; Molecular Sequence Annotation/methods; Animals; Chromosome Mapping/instrumentation; DNA/chemistry; DNA/genetics; Genome/genetics; High-Throughput Nucleotide Sequencing/instrumentation; Humans; Imaging, Three-Dimensional/instrumentation; Imaging, Three-Dimensional/methods; Metagenomics/instrumentation; Models, Statistical; Molecular Imaging/instrumentation; Molecular Imaging/methods; Nucleic Acid Conformation; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods
6.
Methods ; 142: 59-73, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29382556

ABSTRACT

The folding and three-dimensional (3D) organization of chromatin in the nucleus critically impacts genome function. The past decade has witnessed rapid advances in genomic tools for delineating 3D genome architecture. Among them, chromosome conformation capture (3C)-based methods such as Hi-C are the most widely used techniques for mapping chromatin interactions. However, traditional Hi-C protocols rely on restriction enzymes (REs) to fragment chromatin and are therefore limited in resolution. We recently developed DNase Hi-C for mapping 3D genome organization, which uses DNase I for chromatin fragmentation. DNase Hi-C overcomes RE-related limitations associated with traditional Hi-C methods, leading to improved methodological resolution. Furthermore, combining this method with DNA capture technology provides a high-throughput approach (targeted DNase Hi-C) that allows for mapping fine-scale chromatin architecture at exceptionally high resolution. Hence, targeted DNase Hi-C will be valuable for delineating the physical landscapes of cis-regulatory networks that control gene expression and for characterizing phenotype-associated chromatin 3D signatures. Here, we provide a detailed description of method design and step-by-step working protocols for these two methods.


Subject(s)
Chromosome Mapping/methods; Deoxyribonuclease I/metabolism; High-Throughput Nucleotide Sequencing/methods; Imaging, Three-Dimensional/methods; Molecular Imaging/methods; Cell Culture Techniques/instrumentation; Cell Culture Techniques/methods; Cell Nucleus/genetics; Cell Nucleus/metabolism; Chromatin/chemistry; Chromatin/genetics; Chromosome Mapping/instrumentation; Cross-Linking Reagents/chemistry; DNA Restriction Enzymes/chemistry; DNA Restriction Enzymes/metabolism; Deoxyribonuclease I/chemistry; Formaldehyde/chemistry; Gene Library; High-Throughput Nucleotide Sequencing/instrumentation; Imaging, Three-Dimensional/instrumentation; Molecular Imaging/instrumentation; Tissue Culture Techniques/instrumentation; Tissue Culture Techniques/methods; Whole Genome Sequencing/instrumentation; Whole Genome Sequencing/methods
7.
Methods ; 142: 30-38, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29408376

ABSTRACT

The spatial organization of chromosomes in the nuclear space is an extensively studied field that relies on measurements of structural features and 3D positions of chromosomes with high precision and robustness. However, no tools are currently available to image and analyze chromosome territories in a high-throughput format. Here, we have developed High-throughput Chromosome Territory Mapping (HiCTMap), a method for the robust and rapid analysis of 2D and 3D chromosome territory positioning in mammalian cells. HiCTMap is a high-throughput imaging-based chromosome detection method which enables routine analysis of chromosome structure and nuclear position. Using an optimized FISH staining protocol in a 384-well plate format in conjunction with a bespoke automated image analysis workflow, HiCTMap faithfully detects chromosome territories and their position in 2D and 3D in a large population of cells per experimental condition. We apply this novel technique to visualize chromosomes 18, X, and Y in male and female primary human skin fibroblasts, and show accurate detection of the correct number of chromosomes in the respective genotypes. Given the ability to visualize and quantitatively analyze large numbers of nuclei, we use HiCTMap to measure chromosome territory area and volume with high precision and determine the radial position of chromosome territories using either centroid or equidistant-shell analysis. The HiCTMap protocol is also compatible with RNA FISH as demonstrated by simultaneous labeling of X chromosomes and Xist RNA in female cells. We suggest HiCTMap will be a useful tool for routine precision mapping of chromosome territories in a wide range of cell types and tissues.
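
The centroid-based radial position mentioned in this abstract can be sketched as follows. This is an assumption-laden illustration rather than the HiCTMap workflow: it takes 2D pixel coordinates that have already been segmented, normalizes by the farthest nuclear pixel from the nuclear centroid, and uses the hypothetical names `centroid` and `radial_position`. Equidistant-shell analysis is not shown.

```python
import math


def centroid(pixels):
    """Mean (x, y) of a list of pixel coordinates."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def radial_position(nucleus_pixels, territory_pixels):
    """Normalized distance of a territory centroid from the nuclear centroid.

    0 means the territory sits at the nuclear centre; 1 means it is as far
    out as the most distant nuclear pixel.
    """
    cx, cy = centroid(nucleus_pixels)
    tx, ty = centroid(territory_pixels)
    radius = max(math.hypot(x - cx, y - cy) for x, y in nucleus_pixels)
    return math.hypot(tx - cx, ty - cy) / radius


# Toy 5 x 5 "nucleus" with a two-pixel territory near its right edge.
nucleus = [(x, y) for x in range(5) for y in range(5)]
territory = [(4, 1), (4, 3)]
print(round(radial_position(nucleus, territory), 3))
```

Normalizing by a per-nucleus radius keeps the measure comparable across nuclei of different sizes, which matters when thousands of cells are pooled per condition.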


Subject(s)
Chromosome Mapping/methods; Image Processing, Computer-Assisted/methods; In Situ Hybridization, Fluorescence/methods; Animals; Cell Nucleus/genetics; Cell Nucleus/metabolism; Chromosome Mapping/instrumentation; Chromosomes, Human, Pair 18/genetics; Chromosomes, Human, Pair 18/metabolism; Chromosomes, Human, X/genetics; Chromosomes, Human, X/metabolism; Chromosomes, Human, Y/genetics; Chromosomes, Human, Y/metabolism; Female; Fibroblasts; Humans; Image Processing, Computer-Assisted/instrumentation; In Situ Hybridization, Fluorescence/instrumentation; Male; Primary Cell Culture/methods; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; Skin/cytology; Staining and Labeling/instrumentation; Staining and Labeling/methods
8.
Nat Commun ; 9(1): 188, 2018 01 15.
Article in English | MEDLINE | ID: mdl-29335463

ABSTRACT

Topologically associating domains (TADs) are fundamental elements of the eukaryotic genomic structure. However, recent studies suggest that the insulating complexes, CTCF/cohesin, present at TAD borders in mammals are absent from those in Drosophila melanogaster, raising the possibility that border elements are not conserved among metazoans. Using in situ Hi-C with sub-kb resolution, here we show that the D. melanogaster genome is almost completely partitioned into >4000 TADs, nearly sevenfold more than previously identified. The overwhelming majority of these TADs are demarcated by the insulator complexes, BEAF-32/CP190, or BEAF-32/Chromator, indicating that these proteins may play an analogous role in flies as that of CTCF/cohesin in mammals. Moreover, extended regions previously thought to be unstructured are shown to consist of small contiguous TADs, a property also observed in mammals upon re-examination. Altogether, our work demonstrates that fundamental features associated with the higher-order folding of the genome are conserved from insects to mammals.


Subject(s)
Chromatin/ultrastructure; Chromosome Mapping/methods; Chromosomes, Insect/ultrastructure; Drosophila melanogaster/genetics; Genome, Insect; Mammals/genetics; Animals; Biological Evolution; CCCTC-Binding Factor/genetics; CCCTC-Binding Factor/metabolism; Cell Cycle Proteins/genetics; Cell Cycle Proteins/metabolism; Chromatin/chemistry; Chromatin Assembly and Disassembly; Chromosomal Proteins, Non-Histone/genetics; Chromosomal Proteins, Non-Histone/metabolism; Chromosome Mapping/instrumentation; Chromosomes, Insect/chemistry; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Drosophila melanogaster/ultrastructure; Eye Proteins/genetics; Eye Proteins/metabolism; Gene Expression; Humans; Microtubule-Associated Proteins/genetics; Microtubule-Associated Proteins/metabolism; Molecular Conformation; Nuclear Matrix-Associated Proteins/genetics; Nuclear Matrix-Associated Proteins/metabolism; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Cohesins
9.
Cold Spring Harb Protoc ; 2018(2)2018 02 01.
Article in English | MEDLINE | ID: mdl-28733394

ABSTRACT

This protocol describes an optimized high-throughput procedure for generating double deletion mutants in Schizosaccharomyces pombe using the colony replicating robot ROTOR HDA and the PEM (pombe epistasis mapper) system. The method is based on generating high-density colony arrays (1536 colonies per agar plate) and passaging them through a series of antidiploid and mating-type selection (ADS-MTS) and double-mutant selection (DMS) steps. Detailed program parameters for each individual replication step are provided. Using this procedure, batches of 25 or more screens can be routinely performed.


Subject(s)
Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Epistasis, Genetic; Genes, Fungal; Genetics, Microbial/instrumentation; Genetics, Microbial/methods; Schizosaccharomyces/genetics; Gene Deletion; High-Throughput Screening Assays/instrumentation; High-Throughput Screening Assays/methods; Robotics/instrumentation; Robotics/methods; Schizosaccharomyces/growth & development; Selection, Genetic
10.
Cold Spring Harb Protoc ; 2018(2)2018 02 01.
Article in English | MEDLINE | ID: mdl-28733416

ABSTRACT

In laboratories in which a colony-replicating robot is not available, manual replication provides a good, low-cost alternative for genetic interaction screening using the Pombe Epistasis Mapper (PEM) system. The protocol presented here describes the minimum number of steps required to identify genetic interactions. First, a query deletion is introduced to a library of deletion mutants by mating. Through a series of subsequent selection steps, single and double mutants are isolated and analyzed.


Subject(s)
Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Epistasis, Genetic; Genes, Fungal; Genetics, Microbial/instrumentation; Genetics, Microbial/methods; Schizosaccharomyces/genetics; Mutation; Selection, Genetic
11.
Nat Commun ; 8(1): 1826, 2017 11 28.
Article in English | MEDLINE | ID: mdl-29184056

ABSTRACT

A main challenge in genome-wide association studies (GWAS) is to pinpoint possible causal variants. Results from GWAS typically do not directly translate into causal variants because the majority of hits are in non-coding or intergenic regions, and the presence of linkage disequilibrium leads to effects being statistically spread out across multiple variants. Post-GWAS annotation facilitates the selection of most likely causal variant(s). Multiple resources are available for post-GWAS annotation, yet these can be time consuming and do not provide integrated visual aids for data interpretation. We, therefore, develop FUMA: an integrative web-based platform using information from multiple biological resources to facilitate functional annotation of GWAS results, gene prioritization and interactive visualization. FUMA accommodates positional, expression quantitative trait loci (eQTL) and chromatin interaction mappings, and provides gene-based, pathway and tissue enrichment results. FUMA results directly aid in generating hypotheses that are testable in functional experiments aimed at proving causal relations.


Subject(s)
Databases, Genetic; Genome-Wide Association Study/instrumentation; Genome-Wide Association Study/methods; Linkage Disequilibrium; Chromatin/genetics; Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Computational Biology/methods; Crohn Disease/genetics; Genetic Predisposition to Disease; Genome, Human; Humans; Internet; Molecular Sequence Annotation/methods; Quantitative Trait Loci
12.
Cell Rep ; 21(1): 289-300, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28978481

ABSTRACT

Protein-DNA interactions provide the basis for chromatin structure and gene regulation. Comprehensive identification of protein-occupied sites is thus vital to an in-depth understanding of genome function. Dimethyl sulfate (DMS) is a chemical probe that has long been used to detect footprints of DNA-bound proteins in vitro and in vivo. Here, we describe a genomic footprinting method, dimethyl sulfate sequencing (DMS-seq), which exploits the cell-permeable nature of DMS to obviate the need for nuclear isolation. This feature makes DMS-seq simple in practice and removes the potential risk of protein re-localization during nuclear isolation. DMS-seq successfully detects transcription factors bound to cis-regulatory elements and non-canonical chromatin particles in nucleosome-free regions. Furthermore, an unexpected preference of DMS confers on DMS-seq a unique potential to directly detect nucleosome centers without using genetic manipulation. We expect that DMS-seq will serve as a characteristic method for genome-wide interrogation of in vivo protein-DNA interactions.


Subject(s)
Chromosome Mapping/methods; DNA Footprinting/methods; DNA-Binding Proteins/genetics; Genome, Human; Nucleosomes/chemistry; Sulfuric Acid Esters/chemistry; Cell Line; Chromosome Mapping/instrumentation; DNA/genetics; DNA/metabolism; DNA-Binding Proteins/metabolism; Gene Expression Regulation; Gene Library; Genetic Loci; Hepatocytes/cytology; Hepatocytes/metabolism; High-Throughput Nucleotide Sequencing; Histones/genetics; Histones/metabolism; Humans; Nucleosomes/metabolism; Regulatory Sequences, Nucleic Acid; Saccharomyces cerevisiae/cytology; Saccharomyces cerevisiae/metabolism; Sequence Analysis, DNA
13.
Nat Biotechnol ; 35(7): 640-646, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28553940

ABSTRACT

The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.


Subject(s)
Chromosome Mapping/instrumentation; DNA Barcoding, Taxonomic/instrumentation; Genome/genetics; High-Throughput Nucleotide Sequencing/instrumentation; Lab-On-A-Chip Devices; Tissue Array Analysis/instrumentation; Cell Separation/instrumentation; Equipment Design; Equipment Failure Analysis
14.
Lab Chip ; 17(4): 579-590, 2017 02 14.
Article in English | MEDLINE | ID: mdl-28098301

ABSTRACT

Optical DNA mapping has over the last decade emerged as a very powerful tool for obtaining long-range sequence information from single DNA molecules. In optical DNA mapping, intact large single DNA molecules are labeled, stretched out, and imaged using a fluorescence microscope. This means that sequence information ranging over hundreds of kilobasepairs (kbp) can be obtained in one single image. Nanochannels offer homogeneous and efficient stretching of DNA that is crucial to maximize the information that can be obtained from optical DNA maps. In this review, we highlight progress in the field of optical DNA mapping in nanochannels. We discuss the different protocols for sequence-specific labeling and divide them into two main categories, enzymatic labeling and affinity-based labeling. Examples are highlighted where optical DNA mapping is used to gain information on length scales that would be inaccessible with traditional techniques. Enzymatic labeling has been commercialized and is mainly used in human genetics and assembly of complex genomes, while the affinity-based methods have primarily been applied in bacteriology, for example for rapid analysis of plasmids encoding antibiotic resistance. Next, we highlight how the design of nanofluidic channels can be altered in order to obtain the desired information and discuss how recent advances in the field make it possible to retrieve information beyond DNA sequence. In the outlook section, we discuss future directions of optical DNA mapping, such as fully integrated devices and portable microscopes.


Subject(s)
Chromosome Mapping; DNA; Microfluidic Analytical Techniques/instrumentation; Microscopy, Fluorescence/instrumentation; Nanotechnology/instrumentation; Animals; Brain Chemistry; Cell Line; Chromosome Mapping/instrumentation; Chromosome Mapping/methods; DNA/analysis; DNA/chemistry; DNA/genetics; Humans; Mice
15.
Adv Exp Med Biol ; 926: 1-10, 2016.
Article in English | MEDLINE | ID: mdl-27686802

ABSTRACT

Proteogenomic strategies aim to refine genome-wide annotations of protein coding features by using actual protein level observations. Most of the currently applied proteogenomic approaches include integrative analysis of multiple types of high-throughput omics data, e.g., genomics, transcriptomics, proteomics, etc. Recent efforts towards creating a human proteome map were primarily targeted to experimentally detect at least one protein product for each gene in the genome and extensively utilized proteogenomic approaches. The 14 year long wait to get a draft human proteome map, after completion of similar efforts to sequence the genome, explains the huge complexity and technical hurdles of such efforts. Further, the integrative analysis of large-scale multi-omics datasets inherent to these studies becomes a major bottleneck to their success. However, recent developments of various analysis tools and pipelines dedicated to proteogenomics reduce both the time and complexity of such analysis. Here, we summarize notable approaches, studies, software developments and their potential applications towards eukaryotic genome annotation and clinical proteogenomics.


Subject(s)
Chromosome Mapping/methods; Genome; Open Reading Frames; Proteogenomics/methods; Software/supply & distribution; Animals; Chromosome Mapping/instrumentation; Datasets as Topic; Eukaryotic Cells/metabolism; Humans; Molecular Sequence Annotation; Proteogenomics/instrumentation; Proteome
16.
BMC Bioinformatics ; 17(1): 198, 2016 May 04.
Article in English | MEDLINE | ID: mdl-27143038

ABSTRACT

BACKGROUND: Gene expression connectivity mapping has proven to be a powerful and flexible tool for research. Its application has been shown in a broad range of research topics, most commonly as a means of identifying potential small molecule compounds, which may be further investigated as candidates for repurposing to treat diseases. The public release of voluminous data from the Library of Integrated Cellular Signatures (LINCS) programme further enhanced the utility and potential of gene expression connectivity mapping in biomedicine. RESULTS: We describe QUADrATiC (http://go.qub.ac.uk/QUADrATiC), a user-friendly tool for the exploration of gene expression connectivity on the subset of the LINCS data set corresponding to FDA-approved small molecule compounds. It enables the identification of compounds with potential for therapeutic repurposing. The software is designed to cope with the increased volume of data over existing tools, by taking advantage of multicore computing architectures to provide a scalable solution, which may be installed and operated on a range of computers, from laptops to servers. This scalability is provided by the use of the modern concurrent programming paradigm provided by the Akka framework. The QUADrATiC Graphical User Interface (GUI) has been developed using advanced JavaScript frameworks, providing novel visualization capabilities for further analysis of connections. There is also a web services interface, allowing integration with other programs or scripts. CONCLUSIONS: QUADrATiC has been shown to provide an improvement over existing connectivity map software, in terms of scope (based on the LINCS data set), applicability (using FDA-approved compounds), usability and speed. It offers potential to biological researchers to analyze transcriptional data and generate potential therapeutics for focussed study in the lab. QUADrATiC represents a step change in the process of investigating gene expression connectivity and provides more biologically-relevant results than previous alternative solutions.


Subject(s)
Chromosome Mapping/methods; Drug Therapy; Chromosome Mapping/instrumentation; Gene Expression; Humans; Small Molecule Libraries/pharmacology; Software; United States; United States Food and Drug Administration; User-Computer Interface
17.
Article in English | MEDLINE | ID: mdl-26451812

ABSTRACT

While the sequencing capability of modern instruments continues to increase exponentially, the computational problem of mapping short sequenced reads to a reference genome still constitutes a bottleneck in the analysis pipeline. A variety of mapping tools (e.g., Bowtie, BWA) is available for general-purpose computer architectures. These tools can take many hours or even days to deliver mapping results, depending on the number of input reads, the size of the reference genome and the number of allowed mismatches or insertions/deletions, making the mapping problem an ideal candidate for hardware acceleration. In this paper, we present FHAST (FPGA hardware accelerated sequence-matching tool), a drop-in replacement for Bowtie that uses a hardware design based on field programmable gate arrays (FPGA). Our architecture masks memory latency by executing multiple concurrent hardware threads accessing memory simultaneously. FHAST is composed of multiple parallel engines to exploit the parallelism available to us on an FPGA. We have implemented and tested FHAST on the Convey HC-1 and later ported it to the Convey HC-2ex, taking advantage of the large memory bandwidth available to these systems and the shared memory image between hardware and software. A preliminary version of FHAST running on the Convey HC-1 achieved up to 70x speedup compared to Bowtie (single-threaded). An improved version of FHAST running on the Convey HC-2ex FPGAs achieved up to 12x speedup compared to Bowtie running eight threads on an eight-core conventional architecture, while maintaining almost identical mapping accuracy. FHAST is a drop-in replacement for Bowtie, so it can be incorporated in any analysis pipeline that uses Bowtie (e.g., TopHat).


Subject(s)
Chromosome Mapping/instrumentation , DNA/genetics , High-Throughput Nucleotide Sequencing/instrumentation , Sequence Analysis, DNA/instrumentation , Signal Processing, Computer-Assisted/instrumentation , Software , Chromosome Mapping/methods , Equipment Design , Equipment Failure Analysis , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
18.
Article in English | MEDLINE | ID: mdl-26451814

ABSTRACT

We introduce a parallel aligner with a workflow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (an open-source application available at http://www.opencb.org), exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads) and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity relative to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.
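
The suffix-array lookup and seed mapping mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the HPG Aligner SA implementation: the function names (`build_suffix_array`, `find_exact`, `map_seeds`) are hypothetical, the construction is a naive one suitable only for tiny inputs, and the seeds are cut at a fixed length, whereas the abstract describes an adaptive division.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in sorted order.

    Naive construction (sorting whole suffixes); fine for a toy reference,
    far too slow for a real genome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted positions where `pattern` occurs exactly in `text`.

    Two binary searches over the suffix array delimit the block of suffixes
    that start with `pattern`.
    """
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def map_seeds(text: str, sa: list[int], read: str, seed_len: int):
    """Cut `read` into consecutive fixed-length seeds and map each one exactly.

    Returns a list of (offset_in_read, seed, positions_in_text) tuples.
    Fixed-length seeding is an assumption made for this illustration.
    """
    hits = []
    for off in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[off:off + seed_len]
        hits.append((off, seed, find_exact(text, sa, seed)))
    return hits


reference = "ACGTACGTTGCA"          # toy reference sequence
sa = build_suffix_array(reference)
seed_hits = map_seeds(reference, sa, "ACGTTGCA", 4)
```

In this toy case the first seed has two candidate locations and the second has one; only the candidate whose position is consistent with the second seed's position minus its read offset supports a full-read alignment, which is the kind of information seed mapping provides.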


Subject(s)
Chromosome Mapping/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , RNA/genetics , Sequence Analysis, RNA/instrumentation , Signal Processing, Computer-Assisted/instrumentation , Software , Base Sequence , Chromosome Mapping/methods , Equipment Design , Equipment Failure Analysis , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity , Sequence Alignment/instrumentation , Sequence Alignment/methods , Sequence Analysis, RNA/methods
19.
Article in English | MEDLINE | ID: mdl-26451813

ABSTRACT

High-throughput genotyping technologies (such as SNP arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively long runtimes; e.g., processing a moderately sized dataset consisting of about 500,000 SNPs and 5,000 samples requires several days using state-of-the-art tools on a standard 3 GHz CPU. In this paper, we demonstrate how this task can be accelerated using a combination of fine-grained and coarse-grained parallelism on two different computing systems. The first architecture is based on reconfigurable hardware (FPGAs), while the second architecture uses multiple GPUs connected to the same host. We show that both systems can achieve speedups of around four orders of magnitude compared to the sequential implementation. This significantly reduces the runtimes for detecting epistasis to only a few minutes for moderately sized datasets and to a few hours for large-scale datasets.
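
To make the cost of exhaustive 2-SNP testing concrete, the following Python sketch counts the SNP pairs for the dataset size quoted in the abstract and builds the per-pair genotype count tables that a pairwise test would consume. It is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, genotypes are assumed to be coded 0, 1, or 2, samples are assumed to be labeled case or control, and the abstract does not say which statistic is computed from the tables.

```python
from itertools import combinations
from math import comb


def count_snp_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs that an exhaustive scan must test."""
    return comb(n_snps, 2)


def pair_contingency(geno_a, geno_b, is_case):
    """Build 3x3 genotype count tables (cases, controls) for one SNP pair.

    geno_a, geno_b: per-sample genotypes coded 0, 1, or 2 (assumed coding).
    is_case: per-sample booleans, True for cases and False for controls.
    """
    cases = [[0] * 3 for _ in range(3)]
    controls = [[0] * 3 for _ in range(3)]
    for a, b, c in zip(geno_a, geno_b, is_case):
        table = cases if c else controls
        table[a][b] += 1
    return cases, controls


def all_pair_tables(genotypes, is_case):
    """Yield ((i, j), (cases, controls)) for every unordered SNP pair."""
    for i, j in combinations(range(len(genotypes)), 2):
        yield (i, j), pair_contingency(genotypes[i], genotypes[j], is_case)


# Dataset size quoted in the abstract: about 500,000 SNPs and 5,000 samples.
n_pairs = count_snp_pairs(500_000)      # 124,999,750,000 pairs
n_table_updates = n_pairs * 5_000       # one table update per pair per sample

# Toy data: 3 SNPs measured on 4 samples.
toy_genotypes = [[0, 1, 2, 1], [0, 0, 2, 1], [2, 2, 0, 1]]
toy_is_case = [True, False, True, False]
toy_tables = dict(all_pair_tables(toy_genotypes, toy_is_case))
```

Every pair is independent of every other pair, which is why the workload lends itself to the fine-grained and coarse-grained parallelism the abstract describes.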


Subject(s)
Computer Graphics/instrumentation , DNA Mutational Analysis/instrumentation , Epistasis, Genetic/genetics , Genome-Wide Association Study/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , Polymorphism, Single Nucleotide/genetics , Chromosome Mapping/instrumentation , Chromosome Mapping/methods , Equipment Design , Equipment Failure Analysis , Genome-Wide Association Study/methods , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted/instrumentation
20.
Article in English | MEDLINE | ID: mdl-26451815

ABSTRACT

Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first-ever inference of a plant whole-genome regulatory network on a single chip by constructing a 15,575-gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
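
The combination of mutual information and permutation testing named in this abstract can be sketched for a single gene pair as follows. This is a simplified illustration, not the TINGe code: the abstract does not specify the estimator, so an equal-width histogram (plug-in) estimate is used here as a stand-in, and the function names, bin count, and number of permutations are all assumptions.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins):
    """Map real values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x_bins, y_bins):
    """Plug-in mutual information (in bits) of two discrete sequences."""
    n = len(x_bins)
    px = Counter(x_bins)
    py = Counter(y_bins)
    pxy = Counter(zip(x_bins, y_bins))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def permutation_p_value(x_bins, y_bins, n_perm=1000, seed=0):
    """Return (observed MI, permutation p-value) for one pair of profiles.

    The p-value is the fraction of shuffles of y whose MI with x reaches the
    observed value, with the usual +1 correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mutual_information(x_bins, y_bins)
    shuffled = list(y_bins)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x_bins, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy profiles: gene y copies gene x exactly across 40 experiments.
x = [i % 2 for i in range(40)]
observed_mi, p_value = permutation_p_value(x, list(x), n_perm=200, seed=1)
```

A whole-genome network repeats this computation for every gene pair, about 121 million pairs for the 15,575 genes mentioned above, which is what makes the many cores, threads, and vector units of the coprocessor relevant.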


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Chromosome Mapping/instrumentation , High-Throughput Screening Assays/instrumentation , Oligonucleotide Array Sequence Analysis/instrumentation , Protein Interaction Mapping/instrumentation , Arabidopsis/genetics , Arabidopsis Proteins/genetics , High-Throughput Screening Assays/methods , Signal Transduction/physiology