Search | BVS Regional Portal

1.

BOA: A partitioned view of genome assembly.

An, Xiaojing; Ghosh, Priyanka; Keppler, Patrick; Kurt, Sureyya Emre; Krishnamoorthy, Sriram; Sadayappan, Ponnuswamy; Rajam, Aravind Sukumaran; Çatalyürek, Ümit V; Kalyanaraman, Ananth.

iScience ; 25(11): 105273, 2022 Nov 18.

Article in English | MEDLINE | ID: mdl-36304115

ABSTRACT

De novo genome assembly is a fundamental problem in computational molecular biology that aims to reconstruct an unknown genome sequence from a set of short DNA sequences (or reads) obtained from the genome. The relative ordering of the reads along the target genome is not known a priori, which is one of the main contributors to the increased complexity of the assembly process. In this article, with the dual objective of improving assembly quality and exposing a high degree of parallelism, we present a partitioning-based approach. Our framework, BOA (bucket-order-assemble), uses bucketing alongside graph- and hypergraph-based partitioning techniques to produce a partial ordering of the reads. This partial ordering enables us to divide the read set into disjoint blocks that can be assembled independently and in parallel using any state-of-the-art serial assembler. Experimental results show that BOA improves both overall assembly quality and performance.
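The abstract does not spell out how BOA's bucketing works. As a rough illustration only, the sketch below assumes each distinct k-mer acts as a bucket (loosely, a hyperedge over the reads that contain it) and merges reads that share any bucket into one disjoint block with union-find; each block could then be handed to a serial assembler. The function `partition_reads` and the parameter `k` are hypothetical, and plain connected components stand in for BOA's balanced graph/hypergraph partitioning, which is not reproduced here.

```python
from collections import defaultdict


def partition_reads(reads: list[str], k: int) -> list[list[int]]:
    """Group read indices into disjoint blocks; reads sharing any k-mer land together."""
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Bucketing step: each distinct k-mer is a bucket listing the reads containing it.
    buckets: dict[str, list[int]] = defaultdict(list)
    for i, read in enumerate(reads):
        for j in range(len(read) - k + 1):
            buckets[read[j:j + k]].append(i)

    # Reads that fall into the same bucket are merged into the same block.
    for members in buckets.values():
        for other in members[1:]:
            union(members[0], other)

    blocks: dict[int, list[int]] = defaultdict(list)
    for i in range(len(reads)):
        blocks[find(i)].append(i)
    return sorted(blocks.values())
```

For example, `partition_reads(["ACGTAC", "GTACGG", "TTTTTT", "CGGAAA"], 4)` returns `[[0, 1], [2], [3]]`. On real genomes, unrestricted merging of this kind tends to collapse repeat-rich data into one giant block, which is one reason a partitioner that also balances block sizes is preferable.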

2.

StrainHub: a phylogenetic tool to construct pathogen transmission networks.

de Bernardi Schneider, Adriano; Ford, Colby T; Hostager, Reilly; Williams, John; Cioce, Michael; Çatalyürek, Ümit V; Wertheim, Joel O; Janies, Daniel.

Bioinformatics ; 36(3): 945-947, 2020 Feb 01.

Article in English | MEDLINE | ID: mdl-31418766

ABSTRACT

SUMMARY: In exploring the epidemiology of infectious diseases, networks have been used to reconstruct contacts among individuals and/or populations. Summarizing networks using pathogen metadata (e.g. host species and place of isolation) and a phylogenetic tree is a nascent, alternative approach. In this paper, we introduce a tool for reconstructing transmission networks in arbitrary space from phylogenetic information and metadata. Our goals are to provide a means of deriving new insights and infection control strategies based on the dynamics of the pathogen lineages derived from networks and centrality metrics. We created a web-based application, called StrainHub, in which a user can input a phylogenetic tree based on genetic or other data along with characters derived from metadata using their preferred tree search method. StrainHub generates a transmission network based on character state changes in metadata, such as place or source of isolation, mapped on the phylogenetic tree. The user has the option to calculate centrality metrics on the nodes including betweenness, closeness, degree and a new metric, the source/hub ratio. The outputs include the network with values for metrics on its nodes and the tree with characters reconstructed. All of these results can be exported for further analysis. AVAILABILITY AND IMPLEMENTATION: strainhub.io and https://github.com/abschneider/StrainHub.

Subject(s)

Metadata , Humans , Phylogeny
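A minimal sketch of turning character-state changes on a phylogeny into a transmission network, assuming ancestral states (for example, place of isolation) have already been reconstructed for every node. Each parent-to-child edge whose state differs contributes a directed edge between the two states. The source/hub ratio below is computed as out-degree divided by total degree; that formula is an assumption made for illustration, and StrainHub's own definition should be checked against its documentation. All names and the example states are hypothetical.

```python
from collections import Counter


def transmission_network(tree_edges, states):
    """Count state changes along parent->child tree edges as weighted directed edges."""
    network = Counter()
    for parent, child in tree_edges:
        src, dst = states[parent], states[child]
        if src != dst:  # only edges with a character-state change count as transmissions
            network[(src, dst)] += 1
    return network


def source_hub_ratio(network):
    """Assumed definition: weighted out-degree / (in-degree + out-degree) per state."""
    out_deg, in_deg = Counter(), Counter()
    for (src, dst), weight in network.items():
        out_deg[src] += weight
        in_deg[dst] += weight
    nodes = set(out_deg) | set(in_deg)
    return {n: out_deg[n] / (out_deg[n] + in_deg[n]) for n in nodes}


edges = [("root", "a"), ("root", "b"), ("a", "x"), ("a", "y"), ("b", "z")]
states = {"root": "Farm", "a": "Farm", "b": "Market",
          "x": "Market", "y": "Clinic", "z": "Market"}
net = transmission_network(edges, states)
ratios = source_hub_ratio(net)
```

Here `net` records two Farm-to-Market changes and one Farm-to-Clinic change. Under the assumed definition, a ratio near 1 marks a state that mainly exports lineages and a ratio near 0 marks one that mainly receives them.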

3.

The transcriptome of the rumen ciliate Entodinium caudatum reveals some of its metabolic features.

Wang, Lingling; Abu-Doleh, Anas; Plank, Johanna; Catalyurek, Umit V; Firkins, Jeffrey L; Yu, Zhongtang.

BMC Genomics ; 20(1): 1008, 2019 Dec 21.

Article in English | MEDLINE | ID: mdl-31864285

ABSTRACT

BACKGROUND: Rumen ciliates play important roles in rumen function by digesting and fermenting feed and shaping the rumen microbiome. However, they remain poorly understood due to the lack of definitive direct evidence without influence by prokaryotes (including symbionts) in co-cultures or the rumen. In this study, we used RNA-Seq to characterize the transcriptome of Entodinium caudatum, the most predominant and representative rumen ciliate species. RESULTS: Of a large number of transcripts, > 12,000 were annotated to the curated genes in the NR, UniProt, and GO databases. Numerous CAZymes (including lysozyme and chitinase) and peptidases were represented in the transcriptome. This study revealed the ability of E. caudatum to depolymerize starch, hemicellulose, pectin, and the polysaccharides of the bacterial and fungal cell wall, and to degrade proteins. Many signaling pathways, including the ones that have been shown to function in E. caudatum, were represented by many transcripts. The transcriptome also revealed the expression of the genes involved in symbiosis, detoxification of reactive oxygen species, and the electron-transport chain. Overall, the transcriptomic evidence is consistent with some of the previous premises about E. caudatum. However, the identification of specific genes, such as those encoding lysozyme, peptidases, and other enzymes unique to rumen ciliates might be targeted to develop specific and effective inhibitors to improve nitrogen utilization efficiency by controlling the activity and growth of rumen ciliates. The transcriptomic data will also help the assembly and annotation in future genomic sequencing of E. caudatum. CONCLUSION: As the first transcriptome of a single species of rumen ciliates ever sequenced, it provides direct evidence for the substrate spectrum, fermentation pathways, ability to respond to various biotic and abiotic stimuli, and other physiological and ecological features of E. caudatum. The presence and expression of the genes involved in the lysis and degradation of microbial cells highlight the dependence of E. caudatum on engulfment of other rumen microbes for its survival and growth. These genes may be explored in future research to develop targeted control of Entodinium species in the rumen. The transcriptome can also facilitate future genomic studies of E. caudatum and other related rumen ciliates.

Subject(s)

Alveolata/genetics , Alveolata/metabolism , Gene Expression Profiling , Alveolata/cytology , Alveolata/physiology , Animals , Carbohydrate Metabolism/genetics , Intracellular Space/metabolism , Phagocytosis/genetics , RNA, Messenger/genetics , RNA-Seq , Signal Transduction/genetics , Symbiosis/genetics

4.

Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.

Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdag, Doruk; Kaya, Kamer; Çatalyürek, Ümit V.

Methods Mol Biol ; 1375: 55-74, 2016.

Article in English | MEDLINE | ID: mdl-26626937

ABSTRACT

Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulation is query-based search, since co-expressed genes may share a role in a biological process. Although promising query-driven search methods that adapt clustering exist, they fail to capture many genes that function in the same biological pathway, because microarray datasets are fraught with spurious samples or samples of diverse origin, or because the pathways might be regulated under only a subset of samples. Biclustering algorithms, by contrast, are a class of clustering algorithms that simultaneously cluster both items and their features; they are useful for analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes interact only under specific circumstances, biclustering may recover relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature on using biclustering to query co-regulated genes. We then present a novel biclustering approach and evaluate its performance through a thorough experimental analysis.

Subject(s)

Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Algorithms , Databases, Genetic , Gene Expression Regulation , Gene Expression Regulation, Neoplastic , Genes, BRCA1 , Genes, BRCA2 , Genes, p53 , Humans

5.

Tracing Origins of the Salmonella Bareilly Strain Causing a Food-borne Outbreak in the United States.

Hoffmann, Maria; Luo, Yan; Monday, Steven R; Gonzalez-Escalona, Narjol; Ottesen, Andrea R; Muruvanda, Tim; Wang, Charles; Kastanis, George; Keys, Christine; Janies, Daniel; Senturk, Izzet F; Catalyurek, Umit V; Wang, Hua; Hammack, Thomas S; Wolfgang, William J; Schoonmaker-Bopp, Dianna; Chu, Alvina; Myers, Robert; Haendiges, Julie; Evans, Peter S; Meng, Jianghong; Strain, Errol A; Allard, Marc W; Brown, Eric W.

J Infect Dis ; 213(4): 502-8, 2016 Feb 15.

Article in English | MEDLINE | ID: mdl-25995194

ABSTRACT

BACKGROUND: Using a novel combination of whole-genome sequencing (WGS) analysis and geographic metadata, we traced the origins of Salmonella Bareilly isolates collected in 2012 during a widespread food-borne outbreak in the United States associated with scraped tuna imported from India. METHODS: Using next-generation sequencing, we sequenced the complete genomes of 100 Salmonella Bareilly isolates obtained from patients who consumed contaminated product, from natural sources, and from unrelated, historically and geographically disparate foods. Pathogen genomes were linked to geography by projecting the phylogeny onto a virtual globe, and a transmission network was produced. RESULTS: Phylogenetic analysis of WGS data revealed a common origin for outbreak strains, indicating that patients in Maryland and New York were infected from sources originating at a facility in India. CONCLUSIONS: These data represent the first report fully integrating WGS analysis with geographic mapping, and a novel use of transmission networks. Results showed that WGS vastly improves our ability to delimit the scope and source of bacterial food-borne contamination events. Furthermore, these findings reinforce the extraordinary utility that WGS brings to global outbreak investigation as a greatly enhanced approach to protecting the human food supply chain as well as public health in general.

Subject(s)

Disease Outbreaks , Foodborne Diseases/epidemiology , Salmonella Infections/epidemiology , Salmonella enterica/classification , Salmonella enterica/isolation & purification , Animals , Foodborne Diseases/microbiology , Genome, Bacterial , Genotype , Humans , India , Molecular Epidemiology , Molecular Typing , Phylogeography , Salmonella Infections/microbiology , Salmonella enterica/genetics , Sequence Analysis, DNA , Tuna/microbiology , United States/epidemiology

6.

Mitigating bias in planning two-colour microarray experiments.

Ferhatosmanoglu, Nilgun; Allen, Theodore T; Catalyurek, Umit V.

Int J Data Min Bioinform ; 13(1): 31-49, 2015.

Article in English | MEDLINE | ID: mdl-26529906

ABSTRACT

Two-colour microarrays are used to study differential gene expression on a large scale. Experimental planning can help reduce the chances of wrong inferences about whether genes are differentially expressed. Previous research on this problem has focused on minimising estimation errors (according to variance-based criteria such as A-optimality) on the basis of optimistic assumptions about the system studied. In this paper, we propose a novel planning criterion for evaluating existing plans for microarray experiments. The proposed criterion, 'Generalised-A Optimality', is based on realistic assumptions that include bias errors. Under Generalised-A Optimality, the reference-design approach is likely to yield greater estimation accuracy in specific situations in which loop designs had previously seemed superior. However, hybrid designs are likely to offer higher estimation accuracy than reference, loop and interwoven designs with the same number of samples and slides. These findings are supported by data from both simulated and real microarray experiments.

Subject(s)

Data Mining/methods , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation , Models, Theoretical , Oligonucleotide Array Sequence Analysis

7.

Allele-specific imbalance mapping at human orthologs of mouse susceptibility to colon cancer (Scc) loci.

Gerber, Madelyn M; Hampel, Heather; Zhou, Xiao-Ping; Schulz, Nathan P; Suhy, Adam; Deveci, Mehmet; Çatalyürek, Ümit V; Ewart Toland, Amanda.

Int J Cancer ; 137(10): 2323-31, 2015 Nov 15.

Article in English | MEDLINE | ID: mdl-25973956

ABSTRACT

Colorectal cancer (CRC) can be classified into different types. Chromosomally unstable (CIN) colon cancers are thought to be the most common type of colon cancer. The risk of developing a CIN-related CRC is due in part to inherited risk factors. Genome-wide association studies have yielded over 40 single nucleotide polymorphisms (SNPs) associated with CRC risk, but these account for only a subset of risk alleles. Some of this missing heritability may be due to gene-gene interactions. We developed a strategy to identify interacting candidate genes/loci for CRC risk that uses both linkage and RNA-seq data from mouse models in combination with allele-specific imbalance (ASI) studies in human tumors. We applied our strategy to three previously identified CRC susceptibility loci in the mouse that show evidence of genetic interaction: Scc4, Scc5 and Scc13. A total of 525 SNPs from genes showing differential expression in the mouse and/or a previously reported role in cancer were evaluated for allele-specific imbalance in 194 paired human normal/tumor DNAs from CIN-related CRCs. The 103 SNPs showing suggestive evidence of ASI (31 variants with uncorrected p values < 0.05) were genotyped in a validation set of 296 paired DNAs. Two variants in SNX10 (SCC13) showed significant evidence of allelic selection after correction for multiple comparisons. Future studies will evaluate the role of these variants, in combination with interacting genetic partners, in colon cancer risk in mice and humans.

Subject(s)

Allelic Imbalance , Colonic Neoplasms/genetics , Genetic Predisposition to Disease/genetics , Neoplasms, Experimental/genetics , Alleles , Animals , Chromosomal Instability/genetics , Comparative Genomic Hybridization , Female , Genotype , Humans , Linkage Disequilibrium , Mice , Polymorphism, Single Nucleotide , Sequence Analysis, RNA/methods
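One common way to quantify allele-specific imbalance at a heterozygous SNP is to compare the allele ratio in tumor DNA with the allele ratio in the paired normal DNA. The sketch below assumes per-allele signal intensities (or read counts) are available for both samples; the cutoffs of 0.67 and 1.5 are chosen here only for illustration, since the abstract does not state the study's scoring rule or thresholds. Function names are hypothetical.

```python
def asi_ratio(tumor_a1: float, tumor_a2: float,
              normal_a1: float, normal_a2: float) -> float:
    """Tumor allele ratio (a1/a2) divided by the paired normal allele ratio."""
    if min(tumor_a2, normal_a1, normal_a2) <= 0:
        raise ValueError("tumor_a2, normal_a1 and normal_a2 must be positive")
    return (tumor_a1 / tumor_a2) / (normal_a1 / normal_a2)


def has_imbalance(ratio: float, low: float = 0.67, high: float = 1.5) -> bool:
    """Flag imbalance in either direction (relative loss of allele 1 or allele 2).

    The default cutoffs are illustrative assumptions, not the study's values.
    """
    return ratio < low or ratio > high
```

For example, tumor signals of 30 and 10 against normal signals of 20 and 20 give a ratio of 3.0 and are flagged, whereas 21 and 20 against 20 and 20 give 1.05 and are not.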

8.

A Novel Multiple Choice Question Generation Strategy: Alternative Uses for Controlled Vocabulary Thesauri in Biomedical-Sciences Education.

Lopetegui, Marcelo A; Lara, Barbara A; Yen, Po-Yin; Çatalyürek, Ümit V; Payne, Philip R O.

AMIA Annu Symp Proc ; 2015: 861-9, 2015.

Article in English | MEDLINE | ID: mdl-26958222

ABSTRACT

Multiple choice questions play an important role in training and evaluating biomedical science students. However, the resource-intensive nature of question generation limits their open availability, largely restricting their use to evaluation. Although applied-knowledge questions require a complex formulation process, the creation of concrete-knowledge questions (i.e., definitions, associations) could be assisted by informatics methods. We envisioned a novel and simple algorithm that exploits validated knowledge repositories and generates concrete-knowledge questions by leveraging the relationships between concepts. In this manuscript we present the development and validation of a prototype that successfully produced meaningful concrete-knowledge questions, opening new applications for existing knowledge repositories and potentially benefiting students of all biomedical science disciplines.

Subject(s)

Algorithms , Biological Science Disciplines/education , Education, Medical , Educational Measurement/methods , Vocabulary, Controlled , Choice Behavior , Humans , Medical Subject Headings
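As a toy illustration of generating a concrete-knowledge question from concept relationships, the sketch below uses a term's definition as the question stem and draws distractors from sibling terms that share its parent in the thesaurus. This distractor strategy is an assumption made for illustration; the abstract does not say how the prototype builds its questions. The five-entry `THESAURUS` is a hypothetical stand-in for a real controlled vocabulary such as MeSH, and `make_question` is a hypothetical name.

```python
import random

# Hypothetical toy thesaurus: term -> (parent term, short definition).
THESAURUS = {
    "Tachycardia": ("Arrhythmias, Cardiac", "Abnormally rapid heart rate."),
    "Bradycardia": ("Arrhythmias, Cardiac", "Abnormally slow heart rate."),
    "Atrial Fibrillation": ("Arrhythmias, Cardiac", "Irregular, rapid activation of the atria."),
    "Heart Block": ("Arrhythmias, Cardiac", "Impaired conduction of the cardiac impulse."),
    "Hypertension": ("Vascular Diseases", "Persistently elevated arterial blood pressure."),
}


def make_question(thesaurus, concept, n_distractors=3, seed=0):
    """Definition-to-term MCQ: the key is `concept`, distractors are its siblings."""
    parent, definition = thesaurus[concept]
    # Siblings share the concept's parent, so they are plausible but wrong answers.
    siblings = sorted(t for t, (p, _) in thesaurus.items() if p == parent and t != concept)
    if len(siblings) < n_distractors:
        raise ValueError(f"only {len(siblings)} sibling terms available for {concept!r}")
    rng = random.Random(seed)  # seeded so the option order is reproducible
    options = rng.sample(siblings, n_distractors) + [concept]
    rng.shuffle(options)
    return {"stem": f"Which term is defined as: {definition}",
            "options": options, "answer": concept}
```

Drawing distractors from siblings keeps the wrong options in the same semantic category as the key, so the question cannot be answered just by spotting the odd one out.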

9.

Phylogenetic visualization of the spread of H7 influenza A viruses.

Janies, Daniel A; Pomeroy, Laura W; Krueger, Chris; Zhang, Yuqi; Senturk, Izzet F; Kaya, Kamer; Çatalyürek, Ümit V.

Cladistics ; 31(6): 679-691, 2015 Dec.

Article in English | MEDLINE | ID: mdl-34753271

ABSTRACT

Viruses of influenza A subtype H7 can be highly pathogenic and periodically infect humans. For example, there have been numerous outbreaks of H7 in the Americas and Europe since 1996. More recently, a reassortant H7N9 emerged among humans and birds during 2013-2014 in China, Taiwan and Hong Kong. This H7N9 genome consists of genetic segments that assort with H7 and H9 viruses previously circulating in chickens and wild birds in China and ducks in Korea. Epidemic risk modellers have used agricultural, climatic and demographic data to predict that the virus will spread to northern Vietnam via poultry. To shed light on the traffic of H7 viruses in general, we examine genetic segments of influenza that have assorted with many strains of H7 viruses dating back to 1902. We focus on use cases from the United States, Italy and China. We apply a novel metric (betweenness) and an associated phylogenetic visualization technique (transmission networks), and compare these with another technique, route mapping. In contrast to traditional views, our results illustrate that segments that assort with H7 viruses are spread frequently between the Americas and Eurasia. In summary, genetic segments that historically assort with H7 influenza viruses have been spread from China to: Australia, Czech Republic, Denmark, Egypt, Germany, Hong Kong, Italy, Japan, Mongolia, the Netherlands, New Zealand, Pakistan, South Africa, South Korea, Spain, Sweden, the UK, the US, and Vietnam.

10.

mrSNP: software to detect SNP effects on microRNA binding.

Deveci, Mehmet; Catalyürek, Umit V; Toland, Amanda Ewart.

BMC Bioinformatics ; 15: 73, 2014 Mar 15.

Article in English | MEDLINE | ID: mdl-24629096

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are short (19-23 nucleotides) non-coding RNAs that bind to sites in the 3' untranslated regions (3'UTRs) of targeted messenger RNAs (mRNAs). Binding leads to degradation of the transcript or blocked translation, resulting in decreased expression of the targeted gene. Single nucleotide polymorphisms (SNPs) that disrupt normal miRNA binding or introduce new binding sites have been found in 3'UTRs, and some of these have been associated with disease pathogenesis. This makes it important to detect miRNA targets and to predict the possible effects of SNPs on binding sites. In the last decade a number of studies have been conducted to predict the location of miRNA binding sites. However, fewer algorithms have been published to analyze the effects of SNPs on miRNA binding. Moreover, the existing software has some shortcomings, including the requirement for significant manual labor when working with huge lists of SNPs, and algorithms that work only for SNPs present in databases such as dbSNP. These limitations become problematic as next-generation sequencing is leading to large numbers of novel variants in 3'UTRs. RESULT: To overcome these issues, we developed a web server named mrSNP, which predicts the impact of a SNP in a 3'UTR on miRNA binding. The proposed tool reduces manual labor requirements and allows users to input any SNP identified by any SNP-calling program. In testing the performance of mrSNP on SNPs experimentally validated to affect miRNA binding, mrSNP correctly identified 69% (11/16) of the SNPs disrupting binding. CONCLUSIONS: mrSNP is a highly adaptable, well-performing tool for predicting the effect a 3'UTR SNP will have on miRNA binding. This tool has advantages over existing algorithms because it can assess the effect of novel SNPs on miRNA binding without requiring significant hands-on time.

Subject(s)

MicroRNAs/genetics , Sequence Analysis, RNA/methods , Software , 3' Untranslated Regions , Algorithms , Binding Sites/genetics , Humans , MicroRNAs/metabolism , Polymorphism, Single Nucleotide , RNA, Messenger/genetics , RNA, Messenger/metabolism
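The basic question mrSNP addresses, whether a 3'UTR SNP creates or destroys a miRNA binding site, can be illustrated with a simple seed-match test. This is not mrSNP's algorithm. It assumes a canonical 7-mer site complementary to miRNA nucleotides 2-8 and ignores site context, pairing energy and conservation. The miRNA in the usage example is human let-7a (UGAGGUAGUAGGUUGUAUAGUU); the UTR fragments are made up, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def seed_site(mirna: str) -> str:
    """Target-site sequence (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(COMPLEMENT[b] for b in reversed(seed))


def count_sites(utr: str, site: str) -> int:
    """Count (possibly overlapping) exact occurrences of `site` in `utr`."""
    n = len(site)
    return sum(1 for i in range(len(utr) - n + 1) if utr[i:i + n] == site)


def snp_effect(utr: str, pos: int, alt: str, mirna: str) -> str:
    """Compare seed-site counts in the reference and alternate UTR (0-based pos)."""
    utr = utr.upper().replace("T", "U")  # accept DNA or RNA input
    alt = alt.upper().replace("T", "U")
    site = seed_site(mirna)
    alt_utr = utr[:pos] + alt + utr[pos + 1:]
    ref_n, alt_n = count_sites(utr, site), count_sites(alt_utr, site)
    if alt_n < ref_n:
        return "site disrupted"
    if alt_n > ref_n:
        return "site created"
    return "no change"


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
```

With this sketch, `seed_site(LET7A)` gives `"CUACCUC"`, and `snp_effect("AAAACUACCUCAAAA", 6, "G", LET7A)` reports `"site disrupted"` because the A-to-G change breaks that site.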

11.

NUSAP1 influences the DNA damage response by controlling BRCA1 protein levels.

Kotian, Shweta; Banerjee, Tapahsama; Lockhart, Ainsley; Huang, Kun; Catalyurek, Umit V; Parvin, Jeffrey D.

Cancer Biol Ther ; 15(5): 533-43, 2014 May.

Article in English | MEDLINE | ID: mdl-24521615

ABSTRACT

NUSAP1 has been reported to function in mitotic spindle assembly, chromosome segregation, and regulation of cytokinesis. In this study, we find that NUSAP1 has hitherto unknown functions in the key BRCA1-regulated pathways of double-strand DNA break repair and centrosome duplication. Both these pathways are important for maintenance of genomic stability, and any defects in these pathways can cause tumorigenesis. Depletion of NUSAP1 from cells led to the suppression of double-strand DNA break repair via the homologous recombination and single-strand annealing pathways. The presence of NUSAP1 was also found to be important for the control of centrosome numbers. We have found evidence that NUSAP1 plays a role in these processes through regulation of BRCA1 protein levels, and BRCA1 overexpression from a plasmid mitigates the defective phenotypes seen upon NUSAP1 depletion. We found that after NUSAP1 depletion there is a decrease in BRCA1 recruitment to ionizing radiation-induced foci. Results from this study reveal a novel association between BRCA1 and NUSAP1 and suggest a mechanism whereby NUSAP1 is involved in carcinogenesis.

Subject(s)

BRCA1 Protein/metabolism , DNA Damage , DNA Repair , Microtubule-Associated Proteins/metabolism , BRCA1 Protein/genetics , Cell Line, Tumor , Centrosome/metabolism , DNA Damage/radiation effects , DNA, Single-Stranded/metabolism , G2 Phase Cell Cycle Checkpoints , Homologous Recombination , Humans , Microtubule-Associated Proteins/genetics , S Phase Cell Cycle Checkpoints

12.

Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2.

Bokhari, Shahid H; Çatalyürek, Ümit V; Gurcan, Metin N.

Concurr Comput ; 26(18): 2836-2855, 2014 Dec 01.

Article in English | MEDLINE | ID: mdl-25598745

ABSTRACT

Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum-energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of the Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64-bit word. It is thus well suited to the parallelization of graph-theoretic algorithms such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 32000² pixels in size, which is well beyond the largest previously reported in the literature.
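For readers unfamiliar with preflow-push, the following is a plain sequential version of the generic Goldberg-Tarjan push-relabel method on an adjacency-matrix graph, with active vertices processed in FIFO order. It only illustrates the push and relabel operations that a parallel implementation distributes across threads; it makes no attempt at multithreading, global relabeling or the other heuristics a production image-segmentation code would need. The function name and matrix representation are illustrative choices.

```python
from collections import deque


def max_flow(capacity: list[list[int]], s: int, t: int) -> int:
    """Generic push-relabel (preflow-push) maximum flow with FIFO active vertices."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    height = [0] * n
    excess = [0] * n
    height[s] = n  # the source starts at height n

    def residual(u: int, v: int) -> int:
        return capacity[u][v] - flow[u][v]

    def push(u: int, v: int) -> None:
        delta = min(excess[u], residual(u, v))
        flow[u][v] += delta
        flow[v][u] -= delta
        excess[u] -= delta
        excess[v] += delta

    # Initial preflow: saturate every edge leaving the source.
    for v in range(n):
        if capacity[s][v] > 0:
            flow[s][v] = capacity[s][v]
            flow[v][s] = -capacity[s][v]
            excess[v] += capacity[s][v]
            excess[s] -= capacity[s][v]

    active = deque(v for v in range(n) if v not in (s, t) and excess[v] > 0)
    while active:
        u = active.popleft()
        # Discharge u: push along admissible edges, relabel when none remain.
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                if residual(u, v) > 0 and height[u] == height[v] + 1:
                    was_inactive = excess[v] == 0
                    push(u, v)
                    pushed = True
                    if was_inactive and v not in (s, t):
                        active.append(v)
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n) if residual(u, v) > 0)
    return excess[t]
```

On the six-vertex textbook network used in the test (source 0, sink 5), the function returns 23. In an image-segmentation setting, pixels become vertices and the minimum cut separating source and sink gives the foreground/background labeling.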

13.

Metagenomic insights into the carbohydrate-active enzymes carried by the microorganisms adhering to solid digesta in the rumen of cows.

Wang, Lingling; Hatem, Ayat; Catalyurek, Umit V; Morrison, Mark; Yu, Zhongtang.

PLoS One ; 8(11): e78507, 2013.

Article in English | MEDLINE | ID: mdl-24223817

ABSTRACT

The ruminal microbial community is a unique source of enzymes that underpin the conversion of cellulosic biomass. In this study, the microbial consortia adhering to solid digesta in the rumen of Jersey cattle were subjected to an activity-based metagenomic study to explore the genetic diversity of carbohydrolytic enzymes in Jersey cows, with a particular focus on cellulases and xylanases. Pyrosequencing and bioinformatic analyses of 120 carbohydrate-active fosmids identified genes encoding 575 putative Carbohydrate-Active Enzymes (CAZymes) and proteins putatively related to transcriptional regulation, transporters, and signal transduction coupled with polysaccharide degradation and metabolism. Most of these genes shared little similarity with sequences archived in databases. Genes predicted to encode glycoside hydrolases (GH) involved in xylan and cellulose hydrolysis (e.g., GH3, 5, 9, 10, 39 and 43) were well represented. A new subfamily (S-8) of GH5 was identified from contigs assigned to Firmicutes. The GH5 subfamilies also showed a significant phylum-dependent distribution. A number of polysaccharide utilization loci (PULs) were found, and two of them contained genes encoding Sus-like proteins and cellulases that have not been reported in previous metagenomic studies of samples from the rumens of cows or other herbivores. Comparison with the large metagenomic datasets previously reported for other ruminant species (or cattle breeds) and wallabies showed that the rumen microbiome of Jersey cows might contain differing CAZymes. Future studies are needed to further explore how host genetics and diet affect the diversity and distribution of CAZymes and the utilization of plant cell wall materials.

Subject(s)

Bacterial Proteins/genetics , Cellulases/genetics , Cellulose/metabolism , Endo-1,4-beta Xylanases/genetics , Glycoside Hydrolases/genetics , Metagenome , Xylans/metabolism , Animals , Bacterial Proteins/classification , Bacterial Proteins/metabolism , Cattle , Cellulases/classification , Cellulases/metabolism , Digestion/physiology , Endo-1,4-beta Xylanases/classification , Endo-1,4-beta Xylanases/metabolism , Glycoside Hydrolases/classification , Glycoside Hydrolases/metabolism , Microbial Consortia/genetics , Molecular Sequence Annotation , Phylogeny , Rumen/enzymology , Rumen/microbiology , Ruminants/microbiology , Ruminants/physiology

14.

Benchmarking short sequence mapping tools.

Hatem, Ayat; Bozdag, Doruk; Toland, Amanda E; Çatalyürek, Ümit V.

BMC Bioinformatics ; 14: 184, 2013 Jun 07.

Article in English | MEDLINE | ID: mdl-23758764

ABSTRACT

BACKGROUND: The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time-consuming and demands fast and accurate alignment tools. However, the currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked when the performance of a newly developed tool is compared with the state of the art. Therefore, there is a need for an objective evaluation method that covers all of these aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. RESULTS: We applied our benchmarking tests to nine well-known mapping tools, namely Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST), using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be applied to others. CONCLUSION: Mapping is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, no tool outperforms all of the others in all of the tests. Therefore, end users should clearly specify their needs in order to choose the tool that provides the best results.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Benchmarking , Genome
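To make the distinction between indexing the reference and hashing the reads more concrete, here is a toy seed-and-verify mapper that indexes the reference with a k-mer hash table. It does not reproduce any of the benchmarked tools in detail; several of them (for example Bowtie and BWA) use Burrows-Wheeler/FM-index structures rather than a plain hash table. The function names and the `max_mismatches` parameter are hypothetical.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash every k-mer of the reference to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Seed with the read's first k-mer, then verify the whole read by Hamming distance.

    Returns (position, mismatches) pairs. Reads with an error inside the seed are
    missed, one of the accuracy/speed trade-offs a benchmark has to expose.
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits
```

With `ref = "ACGTACGTTTGACCA"` and `k = 4`, the read `"ACGTTAGA"` is seeded at positions 0 and 4 but only verifies at position 4, with one mismatch.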

15.

A comparative analysis of biclustering algorithms for gene expression data.

Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V.

Brief Bioinform ; 14(3): 279-92, 2013 May.

Article in English | MEDLINE | ID: mdl-22772837

ABSTRACT

The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibits similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared with only a small number of other algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they quickly become outdated. In this article we partially address the problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.

Subject(s)

Algorithms , Gene Expression , Animals , Cluster Analysis , Factor Analysis, Statistical , Humans , Models, Theoretical , Oligonucleotide Array Sequence Analysis
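Many biclustering algorithms score a candidate bicluster by how well it fits an assumed model. As one concrete example, the mean squared residue (MSR) introduced by Cheng and Church measures how far a submatrix departs from an additive (shifting) model: it is zero when every row is the same profile shifted by a constant. It is shown here only as an illustration of a bicluster model score; it is not claimed to be the score used by BiBench or by any particular algorithm in the comparison.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix selected by rows x cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what remains after removing row, column and overall effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


DATA = [
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [5, 6, 7, 4],
]
```

Rows 0-2 over columns 0-2 of `DATA` form a perfect shifting pattern, so their MSR is (up to floating-point error) zero; adding column 3 breaks the pattern and raises the score. Because MSR only rewards additive structure, algorithms built on it can miss multiplicative or order-preserving patterns, which is one reason the choice of model matters.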

16.

Computer-aided prognosis of neuroblastoma: detection of mitosis and karyorrhexis cells in digitized histological images.

Sertel, Olcay; Catalyurek, Umit V; Shimada, Hiroyuki; Gurcan, Metin N.

Annu Int Conf IEEE Eng Med Biol Soc ; 2009: 1433-6, 2009.

Article in English | MEDLINE | ID: mdl-19963746

ABSTRACT

Histopathological examination is one of the most important steps in evaluating the prognosis of patients with neuroblastoma (NB). NB is a pediatric tumor of the sympathetic nervous system, and NB tumor histology is currently evaluated according to the International Neuroblastoma Pathology Classification. The number of cells undergoing either mitosis or karyorrhexis (MK) plays an important role in this classification system. However, manual counting of such cells is tedious and subject to considerable inter- and intra-reader variation. A computer-assisted system may allow more precise results, leading to more accurate prognosis in clinical practice. In this study, we propose an image analysis approach that operates on digitized NB histology samples. Based on likelihood functions estimated from samples of manually marked regions, we compute a probability map that indicates how likely each pixel is to belong to an MK cell. Component-wise 2-step thresholding of the generated probability map provides promising results in detecting MK cells, with an average sensitivity of 81.1% and an average of 12.2 false-positive detections.

Subject(s)

Diagnosis, Computer-Assisted/methods , Neuroblastoma/pathology , Cell Death , Child , Diagnosis, Computer-Assisted/statistics & numerical data , Humans , Image Processing, Computer-Assisted , Likelihood Functions , Mitosis , Prognosis , Signal Processing, Computer-Assisted
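One way to picture the pipeline in the abstract above is the Python sketch below. It assumes grayscale intensities, models each class (MK cell vs. background) with a one-dimensional Gaussian fitted to manually marked samples, and converts the two likelihoods into a per-pixel posterior probability map. The "component-wise 2-step thresholding" is read here as a hysteresis-style rule: a low threshold defines connected components, and a high threshold decides which components are kept. That reading, the threshold values, the Gaussian class models, and all function names are assumptions made for illustration, not the authors' published procedure.

```python
import math
from collections import deque
from statistics import mean, pstdev
from typing import List, Sequence, Set, Tuple

Grid = List[List[float]]
Pixel = Tuple[int, int]


def gaussian_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_gaussian(samples: Sequence[float]) -> Tuple[float, float]:
    """Fit a 1-D Gaussian to intensities from manually marked regions (assumed model)."""
    return mean(samples), max(pstdev(samples), 1e-6)  # floor avoids sd == 0


def probability_map(img: Grid, mk_samples: Sequence[float],
                    bg_samples: Sequence[float], prior: float = 0.5) -> Grid:
    """Posterior P(MK | intensity) for every pixel via Bayes' rule."""
    mk, bg = fit_gaussian(mk_samples), fit_gaussian(bg_samples)
    pmap = []
    for row in img:
        prow = []
        for value in row:
            a = prior * gaussian_pdf(value, *mk)
            b = (1.0 - prior) * gaussian_pdf(value, *bg)
            prow.append(a / (a + b) if a + b > 0 else 0.0)
        pmap.append(prow)
    return pmap


def two_step_threshold(pmap: Grid, low: float = 0.5,
                       high: float = 0.9) -> List[Set[Pixel]]:
    """Assumed hysteresis reading: pixels with p >= low form 4-connected
    components; a component is kept only if its peak p reaches high."""
    h, w = len(pmap), len(pmap[0])
    seen: Set[Pixel] = set()
    kept: List[Set[Pixel]] = []
    for i in range(h):
        for j in range(w):
            if (i, j) in seen or pmap[i][j] < low:
                continue
            comp: Set[Pixel] = set()
            queue = deque([(i, j)])
            seen.add((i, j))
            while queue:  # breadth-first flood fill over the low-threshold mask
                y, x = queue.popleft()
                comp.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen
                            and pmap[ny][nx] >= low):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if max(pmap[py][px] for py, px in comp) >= high:
                kept.append(comp)
    return kept
```

The two thresholds separate two decisions: the low one controls how far a detected region extends, while the high one requires at least some strong evidence before a region counts as a detection.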

17.

A knowledge-anchored integrative image search and retrieval system.

Erdal, Selnur; Catalyurek, Umit V; Payne, Philip R O; Saltz, Joel; Kamal, Jyoti; Gurcan, Metin N.

J Digit Imaging ; 22(2): 166-82, 2009 Apr.

Article in English | MEDLINE | ID: mdl-18040742

ABSTRACT

Clinical data that may be used in a secondary capacity to support research activities are regularly stored in three significantly different formats: (1) structured, codified data elements; (2) semi-structured or unstructured narrative text; and (3) multi-modal images. In this manuscript, we will describe the design of a computational system that is intended to support the ontology-anchored query and integration of such data types from multiple source systems. Additional features of the described system include (1) the use of Grid services-based electronic data interchange models to enable the use of our system in multi-site settings and (2) the use of a software framework intended to address both potential security and patient confidentiality concerns that arise when transmitting or otherwise manipulating potentially privileged personal health information. We will frame our discussion within the specific experimental context of the concept-oriented query and integration of correlated structured data, narrative text, and images for cancer research.

Subject(s)

Information Storage and Retrieval/methods , Medical Records Systems, Computerized , Radiology Information Systems , Systems Integration , User-Computer Interface , Databases, Factual , Humans , Software

18.

Large-scale biomedical image analysis in grid environments.

Kumar, Vijay S; Rutt, Benjamin; Kurc, Tahsin; Catalyurek, Umit V; Pan, Tony C; Chow, Sunny; Lamont, Stephan; Martone, Maryann; Saltz, Joel H.

IEEE Trans Inf Technol Biomed ; 12(2): 154-61, 2008 Mar.

Article in English | MEDLINE | ID: mdl-18348945

ABSTRACT

This paper presents the application of a component-based Grid middleware system for processing extremely large images obtained from digital microscopy devices. We have developed parallel, out-of-core techniques for different classes of data processing operations employed on images from confocal microscopy scanners. These techniques are combined into a data preprocessing and analysis pipeline using the component-based middleware system. The experimental results show that: 1) our implementation achieves good performance and can handle very large datasets on high-performance Grid nodes, consisting of computation and/or storage clusters and 2) it can take advantage of Grid nodes connected over high-bandwidth wide-area networks by combining task and data parallelism.

Subject(s)

Database Management Systems , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Internet , Microscopy/methods , Radiology Information Systems , Signal Processing, Computer-Assisted , Information Dissemination/methods
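The out-of-core idea mentioned in the abstract above can be illustrated with a small Python sketch that never loads the whole image into memory: it reads a raw image tile by tile and accumulates a global intensity histogram. The raw row-major 8-bit file layout, the tile size, the histogram operation, and the helper names `read_tiles` and `histogram_out_of_core` are all hypothetical; the paper's component-based Grid middleware is not modeled here.

```python
import os
import tempfile
from typing import Iterator, List, Tuple


def read_tiles(path: str, width: int, height: int,
               tile: int) -> Iterator[Tuple[int, int, List[bytes]]]:
    """Yield (row0, col0, rows) for each tile of a raw, row-major, 8-bit
    grayscale image, reading only that tile's bytes from disk."""
    with open(path, "rb") as f:
        for r0 in range(0, height, tile):
            for c0 in range(0, width, tile):
                tw = min(tile, width - c0)
                rows = []
                for r in range(r0, min(r0 + tile, height)):
                    f.seek(r * width + c0)  # 1 byte per pixel
                    rows.append(f.read(tw))
                yield r0, c0, rows


def histogram_out_of_core(path: str, width: int, height: int,
                          tile: int = 256) -> List[int]:
    """Global 256-bin histogram accumulated one tile at a time."""
    hist = [0] * 256
    for _, _, rows in read_tiles(path, width, height, tile):
        for row in rows:
            for v in row:  # iterating over bytes yields ints 0..255
                hist[v] += 1
    return hist


if __name__ == "__main__":
    w, h = 10, 7
    pixels = bytes((r * w + c) % 256 for r in range(h) for c in range(w))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(pixels)
    try:
        hist = histogram_out_of_core(tmp.name, w, h, tile=4)
        print(sum(hist))  # 70: every pixel counted exactly once
    finally:
        os.remove(tmp.name)
```

Because each tile is processed independently, per-tile histograms could also be computed by separate workers and summed afterwards, which is one simple form of the data parallelism the abstract refers to.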

19.

An XML-based system for synthesis of data from disparate databases.

Kurc, Tahsin; Janies, Daniel A; Johnson, Andrew D; Langella, Stephen; Oster, Scott; Hastings, Shannon; Habib, Farhat; Camerlengo, Terry; Ervin, David; Catalyurek, Umit V; Saltz, Joel H.

J Am Med Inform Assoc ; 13(3): 289-301, 2006.

Article in English | MEDLINE | ID: mdl-16501185

ABSTRACT

Diverse data sets have become key building blocks of translational biomedical research. Data types captured and referenced by sophisticated research studies include high-throughput genomic and proteomic data, laboratory data, data from imagery, and outcome data. In this paper, the authors present the application of an XML-based data management system to support integration of data from disparate data sources and large data sets. This system facilitates management of XML schemas and on-demand creation and management of XML databases that conform to these schemas. They illustrate the use of this system in an application for genotype-phenotype correlation analyses. This application implements a method of phenotype-genotype correlation based on phylogenetic optimization of large data sets of mouse SNPs and phenotypic data. The application workflow requires the management and integration of genomic information and phenotypic data from external data repositories and from the results of phenotype-genotype correlation analyses. The implementation supports a complex workflow that includes large-scale phylogenetic tree optimizations and the application of Maddison's concentrated changes test to large phylogenetic tree data sets. The data management system also allows collaborators to share data in a uniform way and supports complex queries that target these data sets.

Subject(s)

Database Management Systems , Databases, Genetic , Genotype , Information Storage and Retrieval , Phenotype , Animals , Coronary Disease/genetics , Female , Haplotypes , Humans , Phylogeny , Polymorphism, Single Nucleotide , Programming Languages , Systems Integration

20.

A grid-based image archival and analysis system.

Hastings, Shannon; Oster, Scott; Langella, Stephen; Kurc, Tahsin M; Pan, Tony; Catalyurek, Umit V; Saltz, Joel H.

J Am Med Inform Assoc ; 12(3): 286-95, 2005.

Article in English | MEDLINE | ID: mdl-15684129

ABSTRACT

Here the authors present a Grid-aware middleware system, called GridPACS, that enables management and analysis of images at a massive scale, leveraging distributed software components coupled with interconnected computation and storage platforms. The need for this infrastructure is driven by the increasing biomedical role played by complex datasets obtained through a variety of imaging modalities. The GridPACS architecture is designed to support a wide range of biomedical applications encountered in basic and clinical research that make use of large collections of images. Imaging data yield a wealth of metabolic and anatomic information from macroscopic (e.g., radiology) to microscopic (e.g., digitized slides) scale. Although this information can significantly improve understanding of disease pathophysiology as well as the noninvasive diagnosis of disease in patients, the need to process, analyze, and store large amounts of image data presents a great challenge.

Subject(s)

Diagnostic Imaging , Information Storage and Retrieval/methods , Radiology Information Systems , Software , Computer Communication Networks , Databases as Topic , User-Computer Interface
