Search | Virtual Health Library

1.

StrainHub: a phylogenetic tool to construct pathogen transmission networks.

de Bernardi Schneider, Adriano; Ford, Colby T; Hostager, Reilly; Williams, John; Cioce, Michael; Çatalyürek, Ümit V; Wertheim, Joel O; Janies, Daniel.

Bioinformatics ; 36(3): 945-947, 2020 Feb 01.

Article in English | MEDLINE | ID: mdl-31418766

ABSTRACT

SUMMARY: In exploring the epidemiology of infectious diseases, networks have been used to reconstruct contacts among individuals and/or populations. Summarizing networks using pathogen metadata (e.g. host species and place of isolation) and a phylogenetic tree is a nascent, alternative approach. In this paper, we introduce a tool for reconstructing transmission networks in arbitrary space from phylogenetic information and metadata. Our goal is to provide a means of deriving new insights and infection control strategies from the dynamics of pathogen lineages, as captured by networks and centrality metrics. We created a web-based application, called StrainHub, into which a user can input two things: a phylogenetic tree, built from genetic or other data with their preferred tree search method, and characters derived from metadata. StrainHub generates a transmission network based on character state changes in the metadata, such as place or source of isolation, that are mapped onto the phylogenetic tree. The user has the option to calculate centrality metrics on the nodes, including betweenness, closeness, degree and a new metric, the source/hub ratio. The outputs include the network, with metric values on its nodes, and the tree with reconstructed characters. All of these results can be exported for further analysis. AVAILABILITY AND IMPLEMENTATION: strainhub.io and https://github.com/abschneider/StrainHub.

Subjects

Metadata, Humans, Phylogeny
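StrainHub's core step turns character state changes mapped on a tree into a network. The minimal Python sketch below illustrates that step under three assumptions:

- Ancestral states at internal nodes have already been reconstructed. The abstract does not say which reconstruction method StrainHub uses.
- The tree is given as a hypothetical child-to-parent dictionary.
- Each branch whose two endpoints differ in state adds one directed link, from the parent's state to the child's state.

The node names and places are made up. The degree counts are simple illustrations, not StrainHub's own metric code. The source/hub ratio is not defined in the abstract, so it is not reproduced here.

```python
from collections import Counter


def transmission_network(parent, state):
    """Build a weighted, directed network from state changes on a tree.

    parent: hypothetical dict mapping each non-root node to its parent node.
    state:  dict mapping every node (tips and reconstructed ancestors) to a
            character state, e.g. place of isolation.
    Returns a Counter mapping (from_state, to_state) -> number of branches
    on which that change occurs.
    """
    edges = Counter()
    for child, par in parent.items():
        src, dst = state[par], state[child]
        if src != dst:  # a state change on this branch implies a link src -> dst
            edges[(src, dst)] += 1
    return edges


def degrees(edges):
    """Out- and in-degree of each state, counting distinct neighbours."""
    out_deg, in_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return out_deg, in_deg


# Illustrative tree: root R with children A and B; tips t1..t4.
parent = {"A": "R", "B": "R", "t1": "A", "t2": "A", "t3": "B", "t4": "B"}
state = {"R": "north", "A": "north", "B": "south",
         "t1": "north", "t2": "south", "t3": "south", "t4": "east"}

edges = transmission_network(parent, state)
out_deg, in_deg = degrees(edges)
print(dict(edges))  # {('north', 'south'): 2, ('south', 'east'): 1}
```

Edge weights count independent state changes. In this toy tree, "north" to "south" occurs on two separate branches.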

2.

The transcriptome of the rumen ciliate Entodinium caudatum reveals some of its metabolic features.

Wang, Lingling; Abu-Doleh, Anas; Plank, Johanna; Catalyurek, Umit V; Firkins, Jeffrey L; Yu, Zhongtang.

BMC Genomics ; 20(1): 1008, 2019 Dec 21.

Article in English | MEDLINE | ID: mdl-31864285

ABSTRACT

BACKGROUND: Rumen ciliates play important roles in rumen function by digesting and fermenting feed and shaping the rumen microbiome. However, they remain poorly understood because definitive, direct evidence free of influence from prokaryotes (including symbionts) in co-cultures or the rumen is lacking. In this study, we used RNA-Seq to characterize the transcriptome of Entodinium caudatum, the predominant and most representative rumen ciliate species. RESULTS: Of a large number of transcripts, >12,000 were annotated to curated genes in the NR, UniProt, and GO databases. Numerous CAZymes (including lysozyme and chitinase) and peptidases were represented in the transcriptome. This study revealed the ability of E. caudatum to depolymerize starch, hemicellulose, pectin, and the polysaccharides of bacterial and fungal cell walls, and to degrade proteins. Many signaling pathways, including those that have been shown to function in E. caudatum, were represented by many transcripts. The transcriptome also revealed the expression of genes involved in symbiosis, detoxification of reactive oxygen species, and the electron-transport chain. Overall, the transcriptomic evidence is consistent with some of the previous premises about E. caudatum. However, specific genes identified here, such as those encoding lysozyme, peptidases, and other enzymes unique to rumen ciliates, might be targeted to develop specific and effective inhibitors. Such inhibitors could improve nitrogen utilization efficiency by controlling the activity and growth of rumen ciliates. The transcriptomic data will also aid assembly and annotation in future genomic sequencing of E. caudatum. CONCLUSION: As the first transcriptome ever sequenced from a single species of rumen ciliate, it provides direct evidence for the substrate spectrum, fermentation pathways, ability to respond to various biotic and abiotic stimuli, and other physiological and ecological features of E. caudatum. The presence and expression of genes involved in the lysis and degradation of microbial cells highlight the dependence of E. caudatum on engulfing other rumen microbes for its survival and growth. These genes may be explored in future research to develop targeted control of Entodinium species in the rumen. The transcriptome can also facilitate future genomic studies of E. caudatum and other related rumen ciliates.

Subjects

Alveolata/genetics, Alveolata/metabolism, Gene Expression Profiling, Alveolata/cytology, Alveolata/physiology, Animals, Carbohydrate Metabolism/genetics, Intracellular Space/metabolism, Phagocytosis/genetics, RNA, Messenger/genetics, RNA-Seq, Signal Transduction/genetics, Symbiosis/genetics

3.

Tracing Origins of the Salmonella Bareilly Strain Causing a Food-borne Outbreak in the United States.

Hoffmann, Maria; Luo, Yan; Monday, Steven R; Gonzalez-Escalona, Narjol; Ottesen, Andrea R; Muruvanda, Tim; Wang, Charles; Kastanis, George; Keys, Christine; Janies, Daniel; Senturk, Izzet F; Catalyurek, Umit V; Wang, Hua; Hammack, Thomas S; Wolfgang, William J; Schoonmaker-Bopp, Dianna; Chu, Alvina; Myers, Robert; Haendiges, Julie; Evans, Peter S; Meng, Jianghong; Strain, Errol A; Allard, Marc W; Brown, Eric W.

J Infect Dis ; 213(4): 502-8, 2016 Feb 15.

Article in English | MEDLINE | ID: mdl-25995194

ABSTRACT

BACKGROUND: Using a novel combination of whole-genome sequencing (WGS) analysis and geographic metadata, we traced the origins of Salmonella Bareilly isolates collected in 2012 during a widespread food-borne outbreak in the United States associated with scraped tuna imported from India. METHODS: Using next-generation sequencing, we sequenced the complete genomes of 100 Salmonella Bareilly isolates. These were obtained from patients who consumed contaminated product, from natural sources, and from unrelated, historically and geographically disparate foods. Pathogen genomes were linked to geography by projecting the phylogeny onto a virtual globe and producing a transmission network. RESULTS: Phylogenetic analysis of WGS data revealed a common origin for outbreak strains, indicating that patients in Maryland and New York were infected from sources originating at a facility in India. CONCLUSIONS: These data represent the first report fully integrating WGS analysis with geographic mapping and a novel use of transmission networks. Results showed that WGS vastly improves our ability to delimit the scope and source of bacterial food-borne contamination events. Furthermore, these findings reinforce the extraordinary utility that WGS brings to global outbreak investigation. WGS is a greatly enhanced approach to protecting the human food supply chain as well as public health in general.

Subjects

Disease Outbreaks, Foodborne Diseases/epidemiology, Salmonella Infections/epidemiology, Salmonella enterica/classification, Salmonella enterica/isolation & purification, Animals, Foodborne Diseases/microbiology, Genome, Bacterial, Genotype, Humans, India, Molecular Epidemiology, Molecular Typing, Phylogeography, Salmonella Infections/microbiology, Salmonella enterica/genetics, Sequence Analysis, DNA, Tuna/microbiology, United States/epidemiology

4.

Allele-specific imbalance mapping at human orthologs of mouse susceptibility to colon cancer (Scc) loci.

Gerber, Madelyn M; Hampel, Heather; Zhou, Xiao-Ping; Schulz, Nathan P; Suhy, Adam; Deveci, Mehmet; Çatalyürek, Ümit V; Ewart Toland, Amanda.

Int J Cancer ; 137(10): 2323-31, 2015 Nov 15.

Article in English | MEDLINE | ID: mdl-25973956

ABSTRACT

Colorectal cancer (CRC) can be classified into different types. Chromosomally unstable (CIN) colon cancers are thought to be the most common type of colon cancer. The risk of developing a CIN-related CRC is due in part to inherited risk factors. Genome-wide association studies have yielded over 40 single nucleotide polymorphisms (SNPs) associated with CRC risk, but these account for only a subset of risk alleles. Some of this missing heritability may be due to gene-gene interactions. We developed a strategy to identify interacting candidate genes/loci for CRC risk that uses both linkage and RNA-seq data from mouse models in combination with allele-specific imbalance (ASI) studies in human tumors. We applied our strategy to three previously identified CRC susceptibility loci in the mouse that show evidence of genetic interaction: Scc4, Scc5 and Scc13. A total of 525 SNPs were evaluated for ASI in 194 paired human normal/tumor DNAs from CIN-related CRCs. These SNPs came from genes showing differential expression in the mouse and/or a role in cancer previously reported in the literature. One hundred three SNPs showing suggestive evidence of ASI (31 variants with uncorrected p values < 0.05) were genotyped in a validation set of 296 paired DNAs. Two variants in SNX10 (SCC13) showed significant evidence of allelic selection after correction for multiple comparisons. Future studies will evaluate the role of these variants, in combination with interacting genetic partners, in colon cancer risk in mice and humans.

Subjects

Allelic Imbalance, Colonic Neoplasms/genetics, Genetic Predisposition to Disease/genetics, Neoplasms, Experimental/genetics, Alleles, Animals, Chromosomal Instability/genetics, Comparative Genomic Hybridization, Female, Genotype, Humans, Linkage Disequilibrium, Mice, Polymorphism, Single Nucleotide, Sequence Analysis, RNA/methods

5.

A comparative analysis of biclustering algorithms for gene expression data.

Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V.

Brief Bioinform ; 14(3): 279-92, 2013 May.

Article in English | MEDLINE | ID: mdl-22772837

ABSTRACT

The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibits similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared with only a small number of other algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they quickly become outdated. In this article we partially address this problem by evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on three factors: the desired model, whether that model allows overlapping biclusters, and the method's robustness to noise. In addition, we observe that biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.

Subjects

Algorithms, Gene Expression, Animals, Cluster Analysis, Factor Analysis, Statistical, Humans, Models, Theoretical, Oligonucleotide Array Sequence Analysis
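A concrete example of a "bicluster model" helps here. One widely used option is the additive model of Cheng and Church. It scores a candidate submatrix by its mean squared residue, which is 0 for a perfectly additive bicluster: every entry equals a row effect plus a column effect.

The sketch below computes that score in plain Python. Keep in mind:

- It shows one example model only. It is not the scoring used by BiBench or by any particular algorithm compared in the article.
- The function name and the expression matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix matrix[rows][cols].

    A score of 0 means every entry equals row_mean + col_mean - overall_mean,
    i.e. the genes (rows) shift up or down together across the conditions (cols).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n, m = len(rows), len(cols)
    row_mean = [sum(r) / m for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    overall = sum(row_mean) / n
    total = 0.0
    for i in range(n):
        for j in range(m):
            residue = sub[i][j] - row_mean[i] - col_mean[j] + overall
            total += residue * residue
    return total / (n * m)


# Hypothetical expression matrix: 3 genes x 4 conditions.
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 1.0],
]
print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2]))     # additive block: close to 0
print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2, 3]))  # condition 3 breaks the pattern
```

Other bicluster models used by other algorithms would need a different score. Examples are constant-value, multiplicative, or order-preserving patterns.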

6.

Phylogenetic visualization of the spread of H7 influenza A viruses.

Janies, Daniel A; Pomeroy, Laura W; Krueger, Chris; Zhang, Yuqi; Senturk, Izzet F; Kaya, Kamer; Çatalyürek, Ümit V.

Cladistics ; 31(6): 679-691, 2015 Dec.

Article in English | MEDLINE | ID: mdl-34753271

ABSTRACT

Viruses of influenza A subtype H7 can be highly pathogenic and periodically infect humans. For example, there have been numerous outbreaks of H7 in the Americas and Europe since 1996. More recently, a reassortant H7N9 emerged among humans and birds during 2013-2014 in China, Taiwan and Hong Kong. This H7N9 genome consists of genetic segments that assort with H7 and H9 viruses previously circulating in chickens and wild birds in China and ducks in Korea. Epidemic risk modellers have used agricultural, climatic and demographic data to predict that the virus will spread to northern Vietnam via poultry. To shed light on the traffic of H7 viruses in general, we examine genetic segments of influenza that have assorted with many strains of H7 viruses dating back to 1902. We focus on use cases from the United States, Italy and China. We apply a novel metric (betweenness) and an associated phylogenetic visualization technique (transmission networks), and compare these with another technique, route mapping. In contrast to traditional views, our results illustrate that segments that assort with H7 viruses are spread frequently between the Americas and Eurasia. In summary, genetic segments that historically assort with H7 influenza viruses have been spread from China to Australia, the Czech Republic, Denmark, Egypt, Germany, Hong Kong, Italy, Japan, Mongolia, the Netherlands, New Zealand, Pakistan, South Africa, South Korea, Spain, Sweden, the UK, the US, and Vietnam.
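Betweenness, the metric applied in this study and also offered by StrainHub, counts how often a node lies on shortest paths between other nodes. Places with high betweenness therefore act as hubs for transmission.

The sketch below uses Brandes' algorithm for unweighted, directed graphs. Its assumptions:

- The abstract does not state whether the study weights edges or normalizes scores, so this unnormalized version is an assumption.
- The function name and the place names are illustrative only.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness for an unweighted directed graph (Brandes' algorithm).

    graph: dict mapping each node to an iterable of its successor nodes.
    """
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, queue = [], deque([s])
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Illustrative chain: every china -> japan path passes through korea.
print(betweenness({"china": ["korea"], "korea": ["japan"]})["korea"])  # 1.0
```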

7.

mrSNP: software to detect SNP effects on microRNA binding.

Deveci, Mehmet; Catalyürek, Umit V; Toland, Amanda Ewart.

BMC Bioinformatics ; 15: 73, 2014 Mar 15.

Article in English | MEDLINE | ID: mdl-24629096

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are short (19-23 nucleotides) non-coding RNAs that bind to sites in the 3' untranslated regions (3'UTRs) of a targeted messenger RNA (mRNA). Binding leads to degradation of the transcript or blocked translation, resulting in decreased expression of the targeted gene. Single nucleotide polymorphisms (SNPs) that disrupt normal miRNA binding or introduce new binding sites have been found in 3'UTRs, and some of these have been associated with disease pathogenesis. This underscores the importance of detecting miRNA targets and predicting the possible effects of SNPs on binding sites. In the last decade a number of studies have been conducted to predict the location of miRNA binding sites. However, fewer algorithms have been published to analyze the effects of SNPs on miRNA binding. Moreover, existing software has two shortcomings. It requires significant manual labor when working with large lists of SNPs, and its algorithms work only for SNPs present in databases such as dbSNP. These limitations become problematic as next-generation sequencing reveals large numbers of novel variants in 3'UTRs. RESULT: To overcome these issues, we developed a web server named mrSNP, which predicts the impact of a SNP in a 3'UTR on miRNA binding. The proposed tool reduces manual labor requirements and allows users to input any SNP that has been identified by any SNP-calling program. In testing on SNPs experimentally validated to affect miRNA binding, mrSNP correctly identified 69% (11/16) of the SNPs disrupting binding. CONCLUSIONS: mrSNP is a highly adaptable, well-performing tool for predicting the effect a 3'UTR SNP will have on miRNA binding. This tool has advantages over existing algorithms because it can assess the effect of novel SNPs on miRNA binding without requiring significant hands-on time.

Subjects

MicroRNAs/genetics, Sequence Analysis, RNA/methods, Software, 3' Untranslated Regions, Algorithms, Binding Sites/genetics, Humans, MicroRNAs/metabolism, Polymorphism, Single Nucleotide, RNA, Messenger/genetics, RNA, Messenger/metabolism
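The seed-match sketch below illustrates how a single-nucleotide change in a 3'UTR can remove or create a miRNA binding site. It relies on the canonical rule that a miRNA's seed (nucleotides 2-8) pairs by Watson-Crick complementarity with a 7-nt site in the 3'UTR. It then compares the sites found before and after substituting the alternative allele.

This is an illustrative heuristic only:

- It ignores G:U wobble, site context and binding energy.
- It is not the prediction method mrSNP itself uses.
- The function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Return the 7-nt 3'UTR site (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # 0-based slice covering nucleotides 2..8
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def site_positions(utr, site):
    """0-based start positions of exact matches of site in utr."""
    k = len(site)
    return {i for i in range(len(utr) - k + 1) if utr[i:i + k] == site}


def snp_effect(utr, pos, alt, mirna):
    """Compare seed-match sites before and after replacing utr[pos] with alt."""
    site = seed_site(mirna)
    mutated = utr[:pos] + alt + utr[pos + 1:]
    ref_sites = site_positions(utr, site)
    alt_sites = site_positions(mutated, site)
    return {"lost": sorted(ref_sites - alt_sites),
            "gained": sorted(alt_sites - ref_sites)}


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt miRNA sequence
print(seed_site(mirna))                            # CUACCUC
print(snp_effect("AAACUACCUCAAA", 5, "G", mirna))  # {'lost': [3], 'gained': []}
```

Because the check works on any sequence and any substitution, it applies equally to novel variants that are absent from dbSNP.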

8.

Benchmarking short sequence mapping tools.

Hatem, Ayat; Bozdag, Doruk; Toland, Amanda E; Çatalyürek, Ümit V.

BMC Bioinformatics ; 14: 184, 2013 Jun 07.

Article in English | MEDLINE | ID: mdl-23758764

ABSTRACT

BACKGROUND: The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time-consuming and demands the development of fast and accurate alignment tools. However, currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked when comparing the performance of a newly developed tool with the state of the art. Therefore, there is a need for an objective evaluation method that covers all of these aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. RESULTS: We applied our benchmarking tests to nine well-known mapping tools using synthetic data and real RNA-Seq data. The tools are Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST). MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. CONCLUSION: The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, no tool outperforms all of the others in all the tests. Therefore, end users should clearly specify their needs in order to choose the tool that provides the best results.

Subjects

High-Throughput Nucleotide Sequencing/methods, Sequence Alignment/methods, Sequence Analysis, DNA/methods, Software, Algorithms, Benchmarking, Genome
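The abstract distinguishes tools that hash the reads (MAQ, RMAP) from those that index the reference genome. The reference-indexing idea can be sketched with a plain Python dictionary of k-mer positions. Each read is seeded with its first k-mer, and the full read is then verified at each candidate position.

This is a deliberately simplified, exact-match illustration with hypothetical function names. The real tools use far more compact indexes; Bowtie and BWA, for example, are based on the Burrows-Wheeler transform. They also tolerate mismatches and gaps.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read_exact(read, reference, index, k):
    """Return reference positions where read occurs exactly (seed-and-verify)."""
    if len(read) < k:
        raise ValueError("read must be at least k bases long")
    hits = []
    for pos in index.get(read[:k], []):  # candidate positions from the seed k-mer
        if reference[pos:pos + len(read)] == read:  # verify the rest of the read
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGC"  # toy reference sequence
index = build_kmer_index(reference, 4)
print(map_read_exact("ACGTT", reference, index, 4))  # [4]
print(map_read_exact("ACGTA", reference, index, 4))  # [0]
```

The index is built once per reference and reused for every read. This is the trade-off that distinguishes reference-indexing mappers from read-hashing ones.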

9.

KELVIN: a software package for rigorous measurement of statistical evidence in human genetics.

Vieland, Veronica J; Huang, Yungui; Seok, Sang-Cheol; Burian, John; Catalyurek, Umit; O'Connell, Jeffrey; Segre, Alberto; Valentine-Cooper, William.

Hum Hered ; 72(4): 276-88, 2011.

Article in English | MEDLINE | ID: mdl-22189470

ABSTRACT

This paper describes the software package KELVIN, which supports the PPL (posterior probability of linkage) framework for the measurement of statistical evidence in human (or more generally, diploid) genetic studies. In terms of scope, KELVIN supports two-point (trait-marker or marker-marker) and multipoint linkage analysis, based on either sex-averaged or sex-specific genetic maps, with an option to allow for imprinting; trait-marker linkage disequilibrium (LD), or association analysis, in case-control data, trio data, and/or multiplex family data, with options for joint linkage and trait-marker LD or conditional LD given linkage; dichotomous trait, quantitative trait and quantitative trait threshold models; and certain types of gene-gene interactions and covariate effects. Features and data (pedigree) structures can be freely mixed and matched within analyses. The statistical framework is specifically tailored to accumulate evidence in a mathematically rigorous way across multiple data sets or data subsets while allowing for multiple sources of heterogeneity, and KELVIN itself utilizes sophisticated software engineering to provide a powerful and robust platform for studying the genetics of complex disorders.

Subjects

Genetic Linkage, Models, Statistical, Software, Chromosome Mapping, Epistasis, Genetic, Genomic Imprinting, Humans, Linkage Disequilibrium, Models, Genetic, Pedigree, Quantitative Trait Loci

10.

BOA: A partitioned view of genome assembly.

An, Xiaojing; Ghosh, Priyanka; Keppler, Patrick; Kurt, Sureyya Emre; Krishnamoorthy, Sriram; Sadayappan, Ponnuswamy; Rajam, Aravind Sukumaran; Çatalyürek, Ümit V; Kalyanaraman, Ananth.

iScience ; 25(11): 105273, 2022 Nov 18.

Article in English | MEDLINE | ID: mdl-36304115

ABSTRACT

De novo genome assembly is a fundamental problem in computational molecular biology that aims to reconstruct an unknown genome sequence from a set of short DNA sequences (or reads) obtained from the genome. The relative ordering of the reads along the target genome is not known a priori, which is one of the main contributors to the increased complexity of the assembly process. In this article, with the dual objective of improving assembly quality and exposing a high degree of parallelism, we present a partitioning-based approach. Our framework, BOA (bucket-order-assemble), uses bucketing alongside graph- and hypergraph-based partitioning techniques to produce a partial ordering of the reads. This partial ordering enables us to divide the read set into disjoint blocks that can be independently assembled in parallel using any state-of-the-art serial assembler of choice. Experimental results show that BOA improves both overall assembly quality and performance.

11.

A knowledge-anchored integrative image search and retrieval system.

Erdal, Selnur; Catalyurek, Umit V; Payne, Philip R O; Saltz, Joel; Kamal, Jyoti; Gurcan, Metin N.

J Digit Imaging ; 22(2): 166-82, 2009 Apr.

Article in English | MEDLINE | ID: mdl-18040742

ABSTRACT

Clinical data that may be used in a secondary capacity to support research activities are regularly stored in three significantly different formats: (1) structured, codified data elements; (2) semi-structured or unstructured narrative text; and (3) multi-modal images. In this manuscript, we describe the design of a computational system that is intended to support the ontology-anchored query and integration of such data types from multiple source systems. Additional features of the described system include (1) the use of Grid services-based electronic data interchange models to enable the use of our system in multi-site settings and (2) the use of a software framework intended to address both the potential security and the patient confidentiality concerns that arise when transmitting or otherwise manipulating potentially privileged personal health information. We frame our discussion within the specific experimental context of the concept-oriented query and integration of correlated structured data, narrative text, and images for cancer research.

Subjects

Information Storage and Retrieval/methods, Medical Records Systems, Computerized, Radiology Information Systems, Systems Integration, User-Computer Interface, Databases, Factual, Humans, Software

12.

Large-scale biomedical image analysis in grid environments.

Kumar, Vijay S; Rutt, Benjamin; Kurc, Tahsin; Catalyurek, Umit V; Pan, Tony C; Chow, Sunny; Lamont, Stephan; Martone, Maryann; Saltz, Joel H.

IEEE Trans Inf Technol Biomed ; 12(2): 154-61, 2008 Mar.

Article in English | MEDLINE | ID: mdl-18348945

ABSTRACT

This paper presents the application of a component-based Grid middleware system for processing extremely large images obtained from digital microscopy devices. We have developed parallel, out-of-core techniques for different classes of data processing operations employed on images from confocal microscopy scanners. These techniques are combined into a data preprocessing and analysis pipeline using the component-based middleware system. The experimental results show that: 1) our implementation achieves good performance and can handle very large datasets on high-performance Grid nodes, consisting of computation and/or storage clusters and 2) it can take advantage of Grid nodes connected over high-bandwidth wide-area networks by combining task and data parallelism.

Subjects

Database Management Systems, Image Interpretation, Computer-Assisted/methods, Information Storage and Retrieval/methods, Internet, Microscopy/methods, Radiology Information Systems, Signal Processing, Computer-Assisted, Information Dissemination/methods

13.

An XML-based system for synthesis of data from disparate databases.

Kurc, Tahsin; Janies, Daniel A; Johnson, Andrew D; Langella, Stephen; Oster, Scott; Hastings, Shannon; Habib, Farhat; Camerlengo, Terry; Ervin, David; Catalyurek, Umit V; Saltz, Joel H.

J Am Med Inform Assoc ; 13(3): 289-301, 2006.

Article in English | MEDLINE | ID: mdl-16501185

ABSTRACT

Diverse data sets have become key building blocks of translational biomedical research. Data types captured and referenced by sophisticated research studies include high throughput genomic and proteomic data, laboratory data, data from imagery, and outcome data. In this paper, the authors present the application of an XML-based data management system to support integration of data from disparate data sources and large data sets. This system facilitates management of XML schemas and on-demand creation and management of XML databases that conform to these schemas. They illustrate the use of this system in an application for genotype-phenotype correlation analyses. This application implements a method of phenotype-genotype correlation based on phylogenetic optimization of large data sets of mouse SNPs and phenotypic data. The application workflow requires the management and integration of genomic information and phenotypic data from external data repositories and from the results of phenotype-genotype correlation analyses. Our implementation supports the process of carrying out a complex workflow that includes large-scale phylogenetic tree optimizations and application of Maddison's concentrated changes test to large phylogenetic tree data sets. The data management system also allows collaborators to share data in a uniform way and supports complex queries that target data sets.

Subjects

Database Management Systems, Databases, Genetic, Genotype, Information Storage and Retrieval, Phenotype, Animals, Coronary Disease/genetics, Female, Haplotypes, Humans, Phylogeny, Polymorphism, Single Nucleotide, Programming Languages, Systems Integration

14.

Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.

Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdag, Doruk; Kaya, Kamer; Çatalyürek, Ümit V.

Methods Mol Biol ; 1375: 55-74, 2016.

Article in English | MEDLINE | ID: mdl-26626937

ABSTRACT

Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulation is query-based search, since gene co-expression may indicate a shared role in a biological process. Promising query-driven search methods that adapt clustering exist. However, they fail to capture many genes that function in the same biological pathway, for two reasons: microarray datasets are fraught with spurious samples or samples of diverse origin, and pathways might be regulated under only a subset of samples. Biclustering algorithms, by contrast, simultaneously cluster both the items and their features. This makes them useful for analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes interact only under specific circumstances, biclustering may recover relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature on using biclustering to query co-regulated genes. Then we present a novel biclustering approach and evaluate its performance through a thorough experimental analysis.

Subjects

Cluster Analysis, Computational Biology/methods, Gene Expression Profiling/methods, Algorithms, Databases, Genetic, Gene Expression Regulation, Gene Expression Regulation, Neoplastic, Genes, BRCA1, Genes, BRCA2, Genes, p53, Humans

15.

A grid-based image archival and analysis system.

Hastings, Shannon; Oster, Scott; Langella, Stephen; Kurc, Tahsin M; Pan, Tony; Catalyurek, Umit V; Saltz, Joel H.

J Am Med Inform Assoc ; 12(3): 286-95, 2005.

Article in English | MEDLINE | ID: mdl-15684129

ABSTRACT

Here the authors present a Grid-aware middleware system, called GridPACS, that enables management and analysis of images in a massive scale, leveraging distributed software components coupled with interconnected computation and storage platforms. The need for this infrastructure is driven by the increasing biomedical role played by complex datasets obtained through a variety of imaging modalities. The GridPACS architecture is designed to support a wide range of biomedical applications encountered in basic and clinical research, which make use of large collections of images. Imaging data yield a wealth of metabolic and anatomic information from macroscopic (e.g., radiology) to microscopic (e.g., digitized slides) scale. Whereas this information can significantly improve understanding of disease pathophysiology as well as the noninvasive diagnosis of disease in patients, the need to process, analyze, and store large amounts of image data presents a great challenge.

Subjects

Diagnostic Imaging, Information Storage and Retrieval/methods, Radiology Information Systems, Software, Computer Communication Networks, Databases as Topic, User-Computer Interface

16.

Mitigating bias in planning two-colour microarray experiments.

Ferhatosmanoglu, Nilgun; Allen, Theodore T; Catalyurek, Umit V.

Int J Data Min Bioinform ; 13(1): 31-49, 2015.

Article in English | MEDLINE | ID: mdl-26529906

ABSTRACT

Two-colour microarrays are used to study differential gene expression on a large scale. Experimental planning can help reduce the chances of wrong inferences about whether genes are differentially expressed. Previous research on this problem has focused on minimising estimation errors (according to variance-based criteria such as A-optimality) on the basis of optimistic assumptions about the system studied. In this paper, we propose a novel planning criterion to evaluate existing plans for microarray experiments. The proposed criterion is 'Generalised-A Optimality' that is based on realistic assumptions that include bias errors. Using Generalised-A Optimality, the reference-design approach is likely to yield greater estimation accuracy in specific situations in which loop designs had previously seemed superior. However, hybrid designs are likely to offer higher estimation accuracy than reference, loop and interwoven designs having the same number of samples and slides. These findings are supported by data from both simulated and real microarray experiments.

Subjects

Data Mining/methods, Databases, Genetic, Gene Expression Profiling, Gene Expression Regulation, Models, Theoretical, Oligonucleotide Array Sequence Analysis
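The abstract contrasts the proposed criterion with conventional A-optimality. For reference, the standard textbook form of A-optimality is shown below. It assumes a correctly specified linear model $y = X\beta + \varepsilon$ with uncorrelated, equal-variance errors ($\operatorname{Var}(\varepsilon) = \sigma^2 I$), so the estimator is unbiased and only its variance matters. This is presumably the "optimistic" setting the abstract refers to.

Here $X_\xi$ is the design matrix implied by a candidate design $\xi$. In a two-colour experiment it roughly encodes which samples are hybridised together on each slide and with which dye. The Generalised-A criterion additionally accounts for bias errors, but its exact form is not given in the abstract and is not reproduced here.

```latex
\operatorname{Cov}(\hat{\beta}) = \sigma^{2}\,(X^{\top}X)^{-1},
\qquad
\xi_{A}^{*} = \arg\min_{\xi}\ \operatorname{tr}\!\left[(X_{\xi}^{\top}X_{\xi})^{-1}\right]
```

Minimizing the trace minimizes the sum, and hence the average, of the variances of the estimated effects.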

17.

A Novel Multiple Choice Question Generation Strategy: Alternative Uses for Controlled Vocabulary Thesauri in Biomedical-Sciences Education.

Lopetegui, Marcelo A; Lara, Barbara A; Yen, Po-Yin; Çatalyürek, Ümit V; Payne, Philip R O.

AMIA Annu Symp Proc ; 2015: 861-9, 2015.

Article in English | MEDLINE | ID: mdl-26958222

ABSTRACT

Multiple choice questions play an important role in training and evaluating biomedical science students. However, the resource-intensive nature of question generation limits their open availability, largely restricting their use to evaluation purposes. Although applied-knowledge questions require a complex formulation process, the creation of concrete-knowledge questions (i.e., definitions, associations) could be assisted by informatics methods. We envisioned a novel and simple algorithm that exploits validated knowledge repositories and generates concrete-knowledge questions by leveraging relationships between concepts. In this manuscript we present the development and validation of a prototype that successfully produced meaningful concrete-knowledge questions. This opens new applications for existing knowledge repositories and could benefit students of all biomedical science disciplines.

Subjects

Algorithms , Biological Science Disciplines/education , Education, Medical , Educational Measurement/methods , Vocabulary, Controlled , Choice Behavior , Humans , Medical Subject Headings
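The abstract says that concrete-knowledge questions can be generated from the relationships between concepts in a controlled vocabulary. The sketch below shows one plausible way to do this. It is an assumption, not the authors' algorithm. The correct answer is a term narrower than some concept X, and the distractors are narrower terms taken from X's sibling branch, so they are plausible but wrong. The hierarchy is a small hand-written toy, not an excerpt of MeSH. `make_question` and the other names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Toy broader -> narrower hierarchy, hand-written for illustration only
# (loosely modelled on a thesaurus tree; NOT an excerpt of MeSH).
HIERARCHY: Dict[str, List[str]] = {
    "Infections": ["Bacterial Infections", "Virus Diseases"],
    "Bacterial Infections": ["Cholera", "Tetanus", "Tuberculosis"],
    "Virus Diseases": ["Influenza", "Measles", "Rabies"],
}


@dataclass
class Question:
    stem: str
    options: List[str]
    answer: str


def broader_term(term: str) -> Optional[str]:
    """Return the parent of `term` in the hierarchy, or None for a root/unknown term."""
    for parent, children in HIERARCHY.items():
        if term in children:
            return parent
    return None


def make_question(term: str, n_distractors: int = 2) -> Optional[Question]:
    """Build a 'which term falls under X?' item with distractors from a sibling branch."""
    parent = broader_term(term)
    grandparent = broader_term(parent) if parent else None
    if grandparent is None:
        return None
    # Distractors: narrower terms of the parent's siblings (same depth, wrong branch).
    distractors = sorted(
        child
        for sibling in HIERARCHY[grandparent]
        if sibling != parent
        for child in HIERARCHY.get(sibling, [])
    )[:n_distractors]
    if len(distractors) < n_distractors:
        return None
    return Question(
        stem=f"Which of the following is classified under '{parent}'?",
        options=sorted(distractors + [term]),
        answer=term,
    )


q = make_question("Cholera")
print(q.stem)     # Which of the following is classified under 'Bacterial Infections'?
print(q.options)  # ['Cholera', 'Influenza', 'Measles']
```

Distractors and options are sorted so that the output is deterministic. A real generator would more likely sample them at random and validate items against the source vocabulary.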

18.

Syndromic Surveillance of Infectious Diseases meets Molecular Epidemiology in a Workflow and Phylogeographic Application.

Janies, Daniel; Witter, Zachary; Gibson, Christian; Kraft, Thomas; Senturk, Izzet F; Çatalyürek, Ümit.

Stud Health Technol Inform ; 216: 766-70, 2015.

Article in English | MEDLINE | ID: mdl-26262155

ABSTRACT

Traditionally, epidemiologists have counted cases and groups of symptoms. Modeling on these data consists of predicting expansion or contraction in the number of cases over time using epidemic curves or compartment models. Geography is considered a variable when these data are presented in choropleth maps. These approaches have significant drawbacks if the cases counted are not accurately diagnosed. For example, most regional public health authorities count influenza-like illnesses (ILI). A case is designated as ILI if the patient exhibits fever, respiratory symptoms, and perhaps gastrointestinal symptoms. Several molecular epidemiological studies have shown that many pathogens cause these symptoms and that the relative proportions of these pathogens change over time and space. One way to bridge the gap between syndromic and genetic surveillance of infectious diseases is to compare signals of symptoms to pathogens recorded in molecular databases. We present a web-based workflow application that uses chief complaints found in the public Twitter feed as a syndromic surveillance tool and connects outbreak signals in these data to pathogens historically known to circulate in the same area. For the pathogen(s) of interest, we provide GenBank links to metadata and sequences in a workflow for phylogeographic analysis and visualization. The visualizations show the geographic spread of the pathogens and the places that act as hubs for their transport.

Subjects

Communicable Diseases/epidemiology , Communicable Diseases/genetics , Molecular Epidemiology/methods , Phylogeography/methods , Social Media/statistics & numerical data , Workflow , Humans , Natural Language Processing , Population Surveillance/methods , Prevalence , Symptom Assessment/methods , Symptom Assessment/statistics & numerical data
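The workflow described above goes from symptom mentions in public posts, to an outbreak signal, to the pathogens historically seen in the same area. The sketch below is a minimal illustration of that chain under stated assumptions. It is not the application's actual method. It matches posts against a hypothetical ILI keyword list and swaps in a simple aberration rule: flag a day whose count exceeds the mean plus k standard deviations of the preceding days. It then looks up a hypothetical region-to-pathogen table. The keywords, the table, the region name and all function names are illustrative.

```python
import statistics
from typing import Dict, List, Sequence

# Hypothetical keyword list for an influenza-like-illness (ILI) syndrome.
ILI_TERMS = ("fever", "cough", "sore throat", "chills")

# Hypothetical table of pathogens previously recorded per region (e.g. drawn from GenBank metadata).
HISTORICAL_PATHOGENS: Dict[str, List[str]] = {
    "region-A": ["Influenza A virus", "Respiratory syncytial virus", "Human metapneumovirus"],
}


def mentions_ili(text: str) -> bool:
    """Naive case-insensitive substring match of a post against the ILI keyword list."""
    lowered = text.lower()
    return any(term in lowered for term in ILI_TERMS)


def daily_counts(posts_by_day: Sequence[Sequence[str]]) -> List[int]:
    """Number of ILI-matching posts on each day."""
    return [sum(mentions_ili(p) for p in day) for day in posts_by_day]


def flag_days(counts: Sequence[int], baseline: int = 7, k: float = 2.0) -> List[int]:
    """Flag day i when its count exceeds mean + k * SD of the preceding `baseline` days."""
    flagged = []
    for i in range(baseline, len(counts)):
        window = counts[i - baseline:i]
        threshold = statistics.mean(window) + k * statistics.pstdev(window)
        if counts[i] > threshold:
            flagged.append(i)
    return flagged


def candidate_pathogens(region: str) -> List[str]:
    """Pathogens historically recorded in `region` (empty if the region is unknown)."""
    return HISTORICAL_PATHOGENS.get(region, [])


counts = [2, 3, 2, 3, 2, 3, 2, 10]
print(flag_days(counts))  # [7]
print(candidate_pathogens("region-A"))
```

Keyword matching cannot separate the many pathogens that cause ILI, and the abstract makes the same point. That limitation is why the signal is linked to sequence records rather than treated as a diagnosis.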

19.

The virtual microscope.

Catalyürek, Umit; Beynon, Michael D; Chang, Chialin; Kurc, Tahsin; Sussman, Alan; Saltz, Joel.

IEEE Trans Inf Technol Biomed ; 7(4): 230-48, 2003 Dec.

Article in English | MEDLINE | ID: mdl-15000350

ABSTRACT

We present the design and implementation of the Virtual Microscope, a software system employing a client/server architecture to provide a realistic emulation of a high-power light microscope. The system provides a form of completely digital telepathology, allowing simultaneous access to archived digital slide images by multiple clients. The main problem the system targets is storing and processing the extremely large quantities of data required to represent a collection of slides. The Virtual Microscope client software runs on the end user's PC or workstation, while database software for storing, retrieving and processing the microscope image data runs on a parallel computer or on a set of workstations at one or more potentially remote sites. We have designed and implemented two versions of the data server software. One implementation is a customization of a database system framework that is optimized for a tightly coupled parallel machine with attached local disks. The second implementation is component-based and has been designed to accommodate access to and processing of data in a distributed, heterogeneous environment. We have also developed caching client software, implemented in Java, to achieve good response time and portability across different computer platforms. The performance results presented show that the Virtual Microscope system scales well, so that many clients can be adequately serviced by an appropriately configured data server.

Subjects

Computer Simulation , Database Management Systems , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Microscopy/methods , Software , Telepathology/methods , User-Computer Interface , Computer Graphics , Environment , Equipment Design , Equipment Failure Analysis , Image Enhancement/instrumentation , Image Enhancement/methods , Image Interpretation, Computer-Assisted/instrumentation , Microscopy/instrumentation , Software Design , Systems Integration , Telepathology/instrumentation
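The abstract mentions a caching client that keeps response times low while the data server holds very large slide images. The sketch below illustrates one common way to build such a client. It assumes that slides are served as fixed-size tiles and that the client keeps a least-recently-used (LRU) cache. The original client was written in Java, and its actual data layout and caching policy are not described here, so tiling, LRU eviction, the key format and the `fetch` stand-in are all assumptions.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List, Tuple


def tiles_for_region(x: int, y: int, width: int, height: int, tile: int) -> List[Tuple[int, int]]:
    """(col, row) indices of the fixed-size tiles that overlap a rectangular viewport."""
    cols = range(x // tile, (x + width - 1) // tile + 1)
    rows = range(y // tile, (y + height - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]


class TileCache:
    """Client-side LRU cache: recently viewed tiles are served locally, misses go to the server."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], bytes]):
        self.capacity = capacity
        self.fetch = fetch  # hypothetical call that retrieves one tile from the data server
        self._tiles: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self._tiles:
            self._tiles.move_to_end(key)     # mark as most recently used
            return self._tiles[key]
        self.misses += 1
        data = self.fetch(key)
        self._tiles[key] = data
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)  # evict the least recently used tile
        return data


# Tiles keyed by (slide, magnification, col, row); the fetch function is a stand-in.
cache = TileCache(capacity=64, fetch=lambda key: repr(key).encode())
for col, row in tiles_for_region(100, 100, 600, 300, tile=256):
    cache.get(("slide-1", 40, col, row))
```

Panning by a small amount reuses most of the tiles already in the cache, so only the newly exposed tiles reach the server. This is one way a caching client can cut response time for interactive browsing.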

20.

Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2.

Bokhari, Shahid H; Çatalyürek, Ümit V; Gurcan, Metin N.

Concurr Comput ; 26(18): 2836-2855, 2014 Dec 01.

Article in English | MEDLINE | ID: mdl-25598745

ABSTRACT

Image segmentation is an important step in the computerized analysis of digital images. The max-flow/min-cut approach has been successfully used to obtain minimum-energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of the Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64-bit word. It is thus well suited to the parallelization of graph-theoretic algorithms such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 32000² pixels (32000 × 32000) in size, well beyond the largest previously reported in the literature.
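The abstract builds on the Goldberg-Tarjan preflow-push (push-relabel) algorithm. The sketch below is a minimal sequential version that uses a FIFO queue of active vertices. It shows the two core operations: pushing excess flow along admissible residual edges, and relabelling a vertex when no such edge exists. It does not reproduce the multithreaded XMT-2 implementation or its word-level synchronization. For segmentation, pixels would become vertices, with edges to the source and sink terminals. The graphs used here are small toy examples, and `max_flow` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Tuple


def max_flow(n: int, edges: List[Tuple[int, int, int]], s: int, t: int) -> int:
    """Sequential FIFO push-relabel (preflow-push) maximum flow on vertices 0..n-1."""
    residual: List[Dict[int, int]] = [dict() for _ in range(n)]
    for u, v, c in edges:
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)  # reverse edge so flow can be cancelled later

    height = [0] * n
    excess = [0] * n
    height[s] = n
    active = deque()

    # Initial preflow: saturate every edge leaving the source.
    for v, c in list(residual[s].items()):
        if c > 0:
            residual[s][v] -= c
            residual[v][s] += c
            excess[v] += c
            excess[s] -= c
            if v != t:
                active.append(v)

    while active:
        u = active.popleft()
        while excess[u] > 0:  # discharge u completely
            pushed = False
            for v, c in residual[u].items():
                # Admissible edge: residual capacity left and exactly one step downhill.
                if c > 0 and height[u] == height[v] + 1:
                    d = min(excess[u], c)
                    residual[u][v] -= d
                    residual[v][u] += d
                    excess[u] -= d
                    excess[v] += d
                    if v != s and v != t and excess[v] == d:  # v just became active
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest neighbour with residual capacity.
                height[u] = 1 + min(height[v] for v, c in residual[u].items() if c > 0)
    return excess[t]


# A small textbook-style network: source 0, sink 5.
network = [(0, 1, 16), (0, 2, 13), (1, 2, 10), (2, 1, 4), (1, 3, 12),
           (3, 2, 9), (2, 4, 14), (4, 3, 7), (3, 5, 20), (4, 5, 4)]
print(max_flow(6, network, 0, 5))  # 23
```

Push and relabel each touch only one vertex and its neighbours. This locality is what makes the method attractive for fine-grained parallelism on a machine with per-word hardware synchronization.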
