Search | BVS IEC

1.

A graphical, interactive and GPU-enabled workflow to process long-read sequencing data.

Reddy, Shishir; Hung, Ling-Hong; Sala-Torra, Olga; Radich, Jerald P; Yeung, Cecilia C S; Yeung, Ka Yee.

BMC Genomics ; 22(1): 626, 2021 Aug 23.

Article in English | MEDLINE | ID: mdl-34425749

ABSTRACT

BACKGROUND: Long-read sequencing has great promise in enabling portable, rapid molecular-assisted cancer diagnoses. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process raw nanopore reads and support graphical output and interactive visualizations for interpreting results. Another obstacle is that high-performance software tools for long-read sequencing data analyses often leverage graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud. RESULTS: We present a graphical cloud-enabled workflow for fast, interactive analysis of nanopore sequencing data using GPUs. Users customize parameters, monitor execution and visualize results through an accessible graphical interface. The workflow and its components are completely containerized to ensure reproducibility and facilitate installation of the GPU-enabled software. We also provide an Amazon Machine Image (AMI) with all software and drivers pre-installed for GPU computing on the cloud. Most importantly, we demonstrate the potential of applying our software tools to reduce the turnaround time of cancer diagnostics by generating blood cancer (NB4, K562, ME1, MV4;11) cell line Nanopore data using the Flongle adapter. We observe a 29x speedup and a 93x reduction in costs for the rate-limiting basecalling step in the analysis of blood cancer cell line data. CONCLUSIONS: Our interactive and efficient software tools will make analyses of Nanopore data using GPU and cloud computing accessible to biomedical and clinical scientists, thus facilitating the adoption of cost-effective, fast, portable and real-time long-read sequencing.

Subjects

Computational Biology , Software , Reproducibility of Results , Sequence Analysis , Workflow

2.

Holistic optimization of an RNA-seq workflow for multi-threaded environments.

Hung, Ling-Hong; Lloyd, Wes; Agumbe Sridhar, Radhika; Athmalingam Ravishankar, Saranya Devi; Xiong, Yuguang; Sobie, Eric; Yeung, Ka Yee.

Bioinformatics ; 35(20): 4173-4175, 2019 10 15.

Article in English | MEDLINE | ID: mdl-30859176

ABSTRACT

SUMMARY: For many next-generation sequencing pipelines, the most computationally intensive step is the alignment of reads to a reference sequence. As a result, alignment software such as the Burrows-Wheeler Aligner is optimized for speed and is often executed in parallel on the cloud. However, there are other less demanding steps that can also be optimized to significantly increase the speed, especially when using many threads. We demonstrate this using a unique molecular identifier RNA-sequencing pipeline consisting of three steps: split, align, and merge. Optimization of all three steps yields a 40% increase in speed when executed using a single thread. However, when executed using 16 threads, we observe a 4-fold improvement over the original parallel implementation and more than an 8-fold improvement over the original single-threaded implementation. In contrast, optimizing only the alignment step results in just a 13% improvement over the original parallel workflow using 16 threads. AVAILABILITY AND IMPLEMENTATION: Code (MIT license), supporting scripts and Dockerfiles are available at https://github.com/BioDepot/LINCS_RNAseq_cpp and Docker images at https://hub.docker.com/r/biodepot/rnaseq-umi-cpp/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

RNA-Seq , Workflow , High-Throughput Nucleotide Sequencing , RNA Sequence Analysis , Software
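
Note on record 2: the abstract's point that optimizing only the aligner helps little at high thread counts can be illustrated with a toy Amdahl-style timing model. The sketch below is not the authors' code, and every stage timing and parallel fraction in it is a hypothetical value chosen for illustration; it does not reproduce the paper's measurements.

```python
# Toy timing model for a split -> align -> merge pipeline.
# All stage timings and parallel fractions below are hypothetical.

def stage_time(seconds, parallel_fraction, threads):
    """Amdahl-style run time of one stage on `threads` threads."""
    serial = seconds * (1.0 - parallel_fraction)
    parallel = seconds * parallel_fraction / threads
    return serial + parallel


def pipeline_time(stages, threads):
    """Sum stage times; `stages` maps name -> (seconds, parallel_fraction)."""
    return sum(stage_time(s, p, threads) for s, p in stages.values())


# Baseline: only the aligner is multi-threaded.
baseline = {"split": (30.0, 0.0), "align": (100.0, 0.95), "merge": (30.0, 0.0)}
# Variant A: make the aligner twice as fast, leave split and merge alone.
align_only = {"split": (30.0, 0.0), "align": (50.0, 0.95), "merge": (30.0, 0.0)}
# Variant B: additionally halve and parallelize split and merge.
all_steps = {"split": (15.0, 0.9), "align": (50.0, 0.95), "merge": (15.0, 0.9)}

THREADS = 16
t_base = pipeline_time(baseline, THREADS)
speedup_align_only = t_base / pipeline_time(align_only, THREADS)
speedup_all_steps = t_base / pipeline_time(all_steps, THREADS)

if __name__ == "__main__":
    print(f"align only: {speedup_align_only:.2f}x")
    print(f"all steps:  {speedup_all_steps:.2f}x")
```

With these made-up inputs, the serial split and merge stages dominate the 16-thread run time, so speeding up the aligner alone yields only a small gain, while touching all three stages yields a much larger one.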

3.

Identifying Dynamical Time Series Model Parameters from Equilibrium Samples, with Application to Gene Regulatory Networks.

Young, William Chad; Yeung, Ka Yee; Raftery, Adrian E.

Stat Modelling ; 19(4): 444-465, 2019 Aug.

Article in English | MEDLINE | ID: mdl-33824624

ABSTRACT

Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression experiments, which attempt to control the expression of individual genes. We develop a new framework for network inference using samples from the equilibrium distribution of a vector autoregressive (VAR) time-series model which can be applied to steady state gene expression data. We explore the theoretical aspects of our method and apply the method to synthetic gene expression data generated using GeneNetWeaver.
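
Note on record 3: as a worked illustration of what "the equilibrium distribution of a VAR model" means, consider the simplest case. The abstract does not state the model order or noise distribution, so the following assumes a zero-mean first-order VAR with Gaussian noise and a coefficient matrix A whose eigenvalues all lie strictly inside the unit circle; the symbols A, Σ and Γ are introduced here for illustration only.

```latex
\begin{align}
x_t &= A\,x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \Sigma), \\
\Gamma &= A\,\Gamma\,A^{\top} + \Sigma, \\
\operatorname{vec}(\Gamma) &= \left(I - A \otimes A\right)^{-1} \operatorname{vec}(\Sigma).
\end{align}
```

Under these assumptions the equilibrium distribution is N(0, Γ), and steady-state samples inform A only through Γ. Without further constraints Γ does not determine A: for example, A = 0 with Σ = Γ gives the same equilibrium. Recovering dynamics from equilibrium samples therefore needs additional structure.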

4.

Construction of regulatory networks using expression time-series data of a genotyped population.

Yeung, Ka Yee; Dombek, Kenneth M; Lo, Kenneth; Mittler, John E; Zhu, Jun; Schadt, Eric E; Bumgarner, Roger E; Raftery, Adrian E.

Proc Natl Acad Sci U S A ; 108(48): 19436-41, 2011 Nov 29.

Article in English | MEDLINE | ID: mdl-22084118

ABSTRACT

The inference of regulatory and biochemical networks from large-scale genomics data is a basic problem in molecular biology. The goal is to generate testable hypotheses of gene-to-gene influences and subsequently to design bench experiments to confirm these network predictions. Coexpression of genes in large-scale gene-expression data implies coregulation and potential gene-gene interactions, but provides little information about the direction of influences. Here, we use both time-series data and genetics data to infer directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events and genetics data allow us to map DNA variations to variations at the RNA level. We generate microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. We develop a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We show that our inferred network recovers existing and novel regulatory relationships. Following network construction, we generate independent microarray data on selected deletion mutants to prospectively test network predictions. We demonstrate the potential of our network to discover de novo transcription-factor binding sites. Applying our construction method to previously published data demonstrates that our method is competitive with leading network construction algorithms in the literature.

Subjects

Algorithms , Fungal Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genetic Variation , Metabolic Networks and Pathways/genetics , Biological Models , Bayes Theorem , Binding Sites/genetics , Logistic Models , Time Factors , Transcription Factors/genetics , Yeasts

5.

Rapid detection of myeloid neoplasm fusions using single-molecule long-read sequencing.

Sala-Torra, Olga; Reddy, Shishir; Hung, Ling-Hong; Beppu, Lan; Wu, David; Radich, Jerald; Yeung, Ka Yee; Yeung, Cecilia C S.

PLOS Glob Public Health ; 3(9): e0002267, 2023.

Article in English | MEDLINE | ID: mdl-37699001

ABSTRACT

Recurrent gene fusions are common drivers of disease pathophysiology in leukemias. Identifying these structural variants helps stratify disease by risk and assists with therapy choice. Precise molecular diagnosis in low- and middle-income countries (LMICs) is challenging given the complexity of assays and the need for trained technical support and reliable electricity. Current fusion detection methods require a long turnaround time (7-10 days) or advance knowledge of the genes involved in the fusions. Recent technology developments have made sequencing possible without a sophisticated molecular laboratory, potentially making molecular diagnosis accessible to remote areas and low-income settings. We describe a long-read sequencing DNA assay designed with CRISPR guides to select and enrich for recurrent leukemia fusion genes that does not need a priori knowledge of the abnormality present. By applying rapid sequencing technology based on nanopores, we sequenced long pieces of genomic DNA and successfully detected fusion genes in cell lines and primary specimens (e.g., BCR::ABL1, PML::RARA, CBFB::MYH11, KMT2A::AFF1) using cloud-based bioinformatics workflows with novel custom fusion finder software. We detected fusion genes in 100% of cell lines with the expected breakpoints and confirmed the presence or absence of a recurrent fusion gene in 12 of 14 patient cases. With our optimized assay and cloud-based bioinformatics workflow, these assays and analyses could be performed in under 8 hours. The platform's portability, potential for adaptation to lower-cost devices, and integrated cloud analysis make this assay a candidate to be placed in settings such as LMICs to meet the need for rapid bedside molecular diagnostics.

6.

A Randomized Controlled Trial of Precision Nutrition Counseling for Service Members at Risk for Metabolic Syndrome.

McCarthy, Mary S; Colburn, Zachary T; Yeung, Ka Yee; Gillette, Laurel H; Hung, Ling-Hong; Elshaw, Evelyn.

Mil Med ; 188(Suppl 6): 606-613, 2023 11 08.

Article in English | MEDLINE | ID: mdl-37948286

ABSTRACT

INTRODUCTION: Metabolic syndrome (MetS) is a threat to the active component military as it impacts health, readiness, retention, and cost to the Military Health System. The most prevalent risk factors documented in service members' health records are high blood pressure (BP), low high-density lipoprotein cholesterol, and elevated triglycerides. Other risk factors include abdominal obesity and elevated fasting blood glucose. Precision nutrition counseling and wellness software applications have demonstrated positive results for weight management when coupled with high levels of participant engagement and motivation. MATERIALS AND METHODS: In this prospective randomized controlled trial, trained registered dietitians conducted nutrition counseling using results of targeted sequencing, biomarkers, and expert recommendations to reduce the risk for MetS. Upon randomization, the treatment arm initiated six weekly sessions and the control arm received educational pamphlets. An eHealth application captured diet and physical activity. Anthropometrics and BP were measured at baseline, 6 weeks, and 12 weeks, and biomarkers were measured at baseline and 12 weeks. The primary outcome was a change in weight at 12 weeks. Statistical analysis included descriptive statistics and t-tests or analysis of variance with significance set at P < .05. RESULTS: Overall, 138 subjects enrolled from November 2019 to February 2021 between two military bases; 107 completed the study. Demographics were as follows: 66% male, mean age 31 years, 66% married, and 49% Caucasian and non-Hispanic. Weight loss was not significant between groups or sites at 12 weeks. Overall, 27% of subjects met the diagnostic criteria for MetS on enrollment and 17.8% upon study completion. High deleterious variant prevalence was identified for genes with single-nucleotide polymorphisms linked to obesity (40%), cholesterol (38%), and BP (58%). Overall, 65% of subjects had low 25(OH)D upon enrollment; 45% remained insufficient at study completion. The eHealth app had low adherence yet sufficient correlation with a valid reference. CONCLUSIONS: Early signs of progress with weight loss at 6 weeks were not sustained at 12 weeks. DNA-based nutrition counseling was not efficacious for weight loss.

Subjects

Metabolic Syndrome , Humans , Male , Adult , Female , Metabolic Syndrome/epidemiology , Prospective Studies , Obesity , Weight Loss , Cholesterol , Counseling , Biomarkers

7.

Cloud-enabled Biodepot workflow builder integrates image processing using Fiji with reproducible data analysis using Jupyter notebooks.

Hung, Ling-Hong; Straw, Evan; Reddy, Shishir; Schmitz, Robert; Colburn, Zachary; Yeung, Ka Yee.

Sci Rep ; 12(1): 14920, 2022 Sep 02.

Article in English | MEDLINE | ID: mdl-36056115

ABSTRACT

Modern biomedical image analysis workflows contain multiple computational processing tasks, giving rise to problems in reproducibility. In addition, image datasets can span both spatial and temporal dimensions, with additional channels for fluorescence and other data, resulting in datasets that are too large to be processed locally on a laptop. For omics analyses, software containers have been shown to enhance reproducibility, facilitate installation and provide access to scalable computational resources on the cloud. However, most image analyses contain steps that are graphical and interactive, features that are not supported by most omics execution engines. We present the containerized and cloud-enabled Biodepot-workflow-builder platform that supports graphics from software containers and has been extended for image analyses. We demonstrate the potential of our modular approach with multi-step workflows that incorporate the popular and open-source Fiji suite for image processing. One of our examples integrates fully interactive ImageJ macros with Jupyter notebooks. Our second example illustrates how the complicated cloud setup of a computationally intensive process such as stitching 3D digital pathology datasets using BigStitcher can be automated and simplified. In both examples, users can leverage a form-based graphical interface to execute multi-step workflows with a single click, using the provided sample data and preset input parameters. Alternatively, users can interactively modify the image processing steps in the workflow, apply the workflows to their own data, and change the input parameters and macros. By providing interactive graphics support to software containers, our modular platform supports reproducible image analysis workflows, simplified access to cloud resources for analysis of large datasets, and integration across different applications such as Jupyter.

Subjects

Data Analysis , Software , Computational Biology/methods , Computer-Assisted Image Processing/methods , Reproducibility of Results , Workflow

8.

Container Profiler: Profiling resource utilization of containerized big data pipelines.

Hoang, Varik; Hung, Ling-Hong; Perez, David; Deng, Huazeng; Schooley, Raymond; Arumilli, Niharika; Yeung, Ka Yee; Lloyd, Wes.

Gigascience ; 12, 2022 12 28.

Article in English | MEDLINE | ID: mdl-37624874

ABSTRACT

BACKGROUND: This article presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks collecting over 60 Linux operating system metrics at the virtual machine, container, and process levels. The Container Profiler supports performing time-series profiling at a configurable sampling interval to enable continuous monitoring of the resources consumed by containerized tasks and pipelines. RESULTS: To investigate the utility of the Container Profiler, we profile the resource utilization requirements of a multistage bioinformatics analytical pipeline (RNA sequencing using unique molecular identifiers). We examine profiling metrics to assess patterns of CPU, disk, and network resource utilization across the different stages of the pipeline. We also quantify the profiling overhead of our Container Profiler tool to assess the impact of profiling a running pipeline with different levels of profiling granularity, verifying that impacts are negligible. CONCLUSIONS: The Container Profiler provides a useful tool that can be used to continuously monitor the resource consumption of long and complex containerized applications that run locally or on the cloud. This can help identify bottlenecks where more resources are needed to improve performance.

Subjects

Benchmarking , Big Data , Computational Biology , Software , Time Factors
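
Note on record 8: to make "collecting Linux operating system metrics at a sampling interval" concrete, the sketch below computes CPU utilization from two samples of the aggregate `cpu` line in `/proc/stat`. It is an illustration only and is not Container Profiler code; the function names and the sample values are hypothetical, and the demo parses in-memory strings so it runs on any platform.

```python
# Sketch: CPU utilization between two samples of the aggregate "cpu" line
# in /proc/stat. Illustrative only; this is not Container Profiler code.

def parse_cpu_line(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line as ints.

    Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    Guest counters are skipped because they are already included in user/nice.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before, after):
    """Fraction of CPU time spent non-idle between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[3] + deltas[4]  # idle + iowait
    return (total - idle) / total


def sample_proc_stat(path="/proc/stat"):
    """Read one sample on a Linux host (not called in the demo below)."""
    with open(path) as handle:
        return parse_cpu_line(handle.read())


# Two made-up samples, as if taken one sampling interval apart.
SAMPLE_1 = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0\n"
SAMPLE_2 = "cpu  160 0 70 900 70 0 0 0 0 0\ncpu0 160 0 70 900 70 0 0 0 0 0\n"

utilization = cpu_utilization(parse_cpu_line(SAMPLE_1), parse_cpu_line(SAMPLE_2))
```

On a Linux host, calling `sample_proc_stat()` twice with a pause in between would supply real samples; a time-series profile is then just this computation repeated at each interval.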

9.

The derivation of diagnostic markers of chronic myeloid leukemia progression from microarray data.

Oehler, Vivian G; Yeung, Ka Yee; Choi, Yongjae E; Bumgarner, Roger E; Raftery, Adrian E; Radich, Jerald P.

Blood ; 114(15): 3292-8, 2009 Oct 08.

Article in English | MEDLINE | ID: mdl-19654405

ABSTRACT

Currently, limited molecular markers exist that can determine where in the spectrum of chronic myeloid leukemia (CML) progression an individual patient falls at diagnosis. Gene expression profiles can predict disease and prognosis, but most widely used microarray analytical methods yield lengthy gene candidate lists that are difficult to apply clinically. Consequently, we applied a probabilistic method called Bayesian model averaging (BMA) to a large CML microarray dataset. BMA, a supervised method, considers multiple genes simultaneously and identifies small gene sets. BMA identified 6 genes (NOB1, DDX47, IGSF2, LTB4R, SCARB1, and SLC25A3) that discriminated chronic phase (CP) from blast crisis (BC) CML. In CML, phase labels divide disease progression into discrete states. BMA, however, produces posterior probabilities between 0 and 1 and predicts patients in "intermediate" stages. In validation studies of 88 patients, the 6-gene signature discriminated early CP from late CP, accelerated phase, and BC. This distinction between early and late CP is not possible with current classifications, which are based on known duration of disease. BMA is a powerful tool for developing diagnostic tests from microarray data. Because therapeutic outcomes are so closely tied to disease phase, these probabilities can be used to determine a risk-based treatment strategy at diagnosis.

Subjects

Tumor Biomarkers/biosynthesis , Blast Crisis/diagnosis , Blast Crisis/metabolism , Leukemic Gene Expression Regulation , BCR-ABL Positive Chronic Myelogenous Leukemia/diagnosis , BCR-ABL Positive Chronic Myelogenous Leukemia/metabolism , Neoplasm Proteins/biosynthesis , Tumor Biomarkers/genetics , Blast Crisis/therapy , Female , Gene Expression Profiling , Humans , BCR-ABL Positive Chronic Myelogenous Leukemia/therapy , Male , Neoplasm Proteins/genetics , Oligonucleotide Array Sequence Analysis , Predictive Value of Tests , Risk Factors

10.

Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data.

Annest, Amalia; Bumgarner, Roger E; Raftery, Adrian E; Yeung, Ka Yee.

BMC Bioinformatics ; 10: 72, 2009 Feb 26.

Article in English | MEDLINE | ID: mdl-19245714

ABSTRACT

BACKGROUND: Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes. RESULTS: We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139). 
CONCLUSION: The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.

Subjects

Algorithms , Neoplasms/mortality , Oligonucleotide Array Sequence Analysis/methods , Bayes Theorem , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Female , Humans , B-Cell Lymphoma/genetics , B-Cell Lymphoma/mortality , Survival Analysis
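
Note on record 10: the core averaging step of Bayesian model averaging, as described in the abstract (a posterior-weighted average over contending models), can be sketched in a few lines. This is a generic illustration and not the iterative BMA implementation from the paper; the gene names, posterior weights and per-model predictions are invented.

```python
# Sketch of Bayesian model averaging over a handful of candidate models.
# Gene names, posterior weights and predictions are made up for illustration.

def normalized_weights(models):
    """Rescale model posterior probabilities so they sum to one."""
    total = sum(m["posterior"] for m in models)
    if total <= 0:
        raise ValueError("posterior weights must sum to a positive value")
    return [m["posterior"] / total for m in models]


def bma_prediction(models):
    """Posterior-weighted average of the per-model predictions."""
    weights = normalized_weights(models)
    return sum(w * m["prediction"] for w, m in zip(weights, models))


def inclusion_probabilities(models):
    """For each gene, sum the weights of the models that contain it."""
    probabilities = {}
    for weight, model in zip(normalized_weights(models), models):
        for gene in model["genes"]:
            probabilities[gene] = probabilities.get(gene, 0.0) + weight
    return probabilities


MODELS = [
    {"genes": {"geneA", "geneB"}, "posterior": 0.5, "prediction": 0.9},
    {"genes": {"geneA", "geneC"}, "posterior": 0.3, "prediction": 0.6},
    {"genes": {"geneA"}, "posterior": 0.2, "prediction": 0.2},
]

averaged = bma_prediction(MODELS)
inclusion = inclusion_probabilities(MODELS)
```

A gene that appears in every retained model, like the hypothetical `geneA` here, ends up with an inclusion probability of 1, which is the sense in which a gene can be selected "with 100% posterior probability".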

11.

Methods for the inference of biological pathways and networks.

Bumgarner, Roger E; Yeung, Ka Yee.

Methods Mol Biol ; 541: 225-45, 2009.

Article in English | MEDLINE | ID: mdl-19381545

ABSTRACT

In this chapter, we discuss a number of approaches to network inference from large-scale functional genomics data. Our goal is to describe current methods that can be used to infer predictive networks. At present, one of the most effective methods to produce networks with predictive value is the Bayesian network approach. This approach was initially instantiated by Friedman et al. and further refined by Eric Schadt and his research group. The Bayesian network approach has the virtue of identifying predictive relationships between genes from a combination of expression and eQTL data. However, the approach does not provide a mechanistic basis for predictive relationships and is ultimately hampered by an inability to model feedback. A challenge for the future is to produce networks that are both predictive and provide mechanistic understanding. To do so, the methods described in several chapters of this book will need to be integrated. Other chapters of this book describe a number of methods to identify or predict network components such as physical interactions. At the end of this chapter, we speculate that some of the approaches from other chapters could be integrated and used to "annotate" the edges of the Bayesian networks. This would take the Bayesian networks one step closer to providing mechanistic "explanations" for the relationships between the network nodes.

Subjects

Computational Biology/methods , Gene Regulatory Networks/physiology , Metabolic Networks and Pathways/physiology , Signal Transduction/physiology , Animals , Cluster Analysis , Forecasting , Gene Expression Profiling/methods , Humans , Biological Models , Oligonucleotide Array Sequence Analysis/methods , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism

12.

Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data.

Liang, Xiao; Young, William Chad; Hung, Ling-Hong; Raftery, Adrian E; Yeung, Ka Yee.

J Comput Biol ; 26(10): 1113-1129, 2019 10.

Article in English | MEDLINE | ID: mdl-31009236

ABSTRACT

The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks as well as extends some previous Bayesian frameworks both in theory and applications. We apply our method to two different human cell lines, namely skin melanoma cell line A375 and lung cancer cell line A549, to illustrate the capabilities of our method. Our results show that the improvement in performance could vary from cell line to cell line and that we might need to choose different external data sources serving as prior knowledge if we hope to obtain better accuracy for different cell lines.

Subjects

Gene Regulatory Networks , Genomics/methods , A549 Cells , Bayes Theorem , Tumor Cell Line , Neoplastic Gene Expression Regulation , Gene Ontology , Humans , Lung Neoplasms/genetics , Melanoma/genetics , Skin Neoplasms/genetics , Supervised Machine Learning , Transcriptome

13.

Building Containerized Workflows Using the BioDepot-Workflow-Builder.

Hung, Ling-Hong; Hu, Jiaming; Meiss, Trevor; Ingersoll, Alyssa; Lloyd, Wes; Kristiyanto, Daniel; Xiong, Yuguang; Sobie, Eric; Yeung, Ka Yee.

Cell Syst ; 9(5): 508-514.e3, 2019 11 27.

Article in English | MEDLINE | ID: mdl-31521606

ABSTRACT

We present the BioDepot-workflow-builder (Bwb), a software tool that allows users to create and execute reproducible bioinformatics workflows using a drag-and-drop interface. Graphical widgets represent Docker containers executing a modular task. Widgets are linked graphically to build bioinformatics workflows that can be reproducibly deployed across different local and cloud platforms. Each widget contains a form-based user interface to facilitate parameter entry and a console to display intermediate results. Bwb provides tools for rapid customization of widgets, containers, and workflows. Saved workflows can be shared using Bwb's native format or exported as shell scripts.

Subjects

Computational Biology/methods , Workflow , Humans , Software , User-Computer Interface
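
Note on record 13: the idea of linking container-backed widgets into a workflow and exporting it as a shell script can be sketched as a small dependency graph that is flattened into ordered `docker run` commands. This is not Bwb's code or file format; the widget names, image names, commands and the `/data` mount point are all hypothetical.

```python
# Sketch: flatten a graph of container "widgets" into an ordered shell script.
# Not Bwb's format; images, commands and paths are hypothetical.
import shlex
from graphlib import TopologicalSorter

WIDGETS = {
    "split": {"image": "example/split:latest", "command": ["split", "/data/reads.fastq"]},
    "align": {"image": "example/align:latest", "command": ["align", "/data/chunks"]},
    "merge": {"image": "example/merge:latest", "command": ["merge", "/data/aligned"]},
}

# Each widget maps to the set of widgets that must finish before it starts.
DEPENDS_ON = {"split": set(), "align": {"split"}, "merge": {"align"}}


def export_shell_script(widgets, depends_on, data_dir):
    """Emit one `docker run` line per widget, in dependency order."""
    order = TopologicalSorter(depends_on).static_order()
    lines = ["#!/bin/sh", "set -e"]
    for name in order:
        widget = widgets[name]
        argv = ["docker", "run", "--rm", "-v", f"{data_dir}:/data",
                widget["image"], *widget["command"]]
        lines.append(shlex.join(argv))  # quote each argument for the shell
    return "\n".join(lines) + "\n"


SCRIPT = export_shell_script(WIDGETS, DEPENDS_ON, "/tmp/work")

if __name__ == "__main__":
    print(SCRIPT, end="")
```

Sharing one mounted directory between steps is a simplifying assumption; it keeps each container independent while letting later steps read what earlier ones wrote.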

14.

Hot-starting software containers for STAR aligner.

Zhang, Pai; Hung, Ling-Hong; Lloyd, Wes; Yeung, Ka Yee.

Gigascience ; 7(8), 2018 08 01.

Article in English | MEDLINE | ID: mdl-30085034

ABSTRACT

Background: Using software containers has become standard practice to reproducibly deploy and execute biomedical workflows on the cloud. However, some applications that contain time-consuming initialization steps will produce unnecessary costs for repeated executions. Findings: We demonstrate that hot-starting from containers that have been frozen after the application has already begun execution can speed up bioinformatics workflows by avoiding repetitive initialization steps. We use an open-source tool called Checkpoint and Restore in Userspace (CRIU) to save the state of the containers as a collection of checkpoint files on disk after it has read in the indices. The resulting checkpoint files are migrated to the host, and CRIU is used to regenerate the containers in that ready-to-run hot-start state. As a proof-of-concept example, we create a hot-start container for the spliced transcripts alignment to a reference (STAR) aligner and deploy this container to align RNA sequencing data. We compare the performance of the alignment step with and without checkpoints on cloud platforms using local and network disks. Conclusions: We demonstrate that hot-starting Docker containers from snapshots taken after repetitive initialization steps are completed significantly speeds up the execution of the STAR aligner on all experimental platforms, including Amazon Web Services, Microsoft Azure, and local virtual machines. Our method can be potentially employed in other bioinformatics applications in which a checkpoint can be inserted after a repetitive initialization phase.

Subjects

Computational Biology/methods , RNA Splicing , RNA Sequence Analysis/methods , Software , Asthma/drug therapy , Asthma/genetics , Asthma/metabolism , Humans , Smooth Muscle Myocytes/drug effects , Smooth Muscle Myocytes/metabolism

15.

Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.

Almugbel, Reem; Hung, Ling-Hong; Hu, Jiaming; Almutairy, Abeer; Ortogero, Nicole; Tamta, Yashaswi; Yeung, Ka Yee.

J Am Med Inform Assoc ; 25(1): 4-12, 2018 01 01.

Article in English | MEDLINE | ID: mdl-29092073

ABSTRACT

Objective: Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods: We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results: BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion: Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.

Subjects

Computational Biology , Software , Workflow , Biomedical Research , Reproducibility of Results , Software Design

16.

A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection.

Fourati, Slim; Talla, Aarthi; Mahmoudian, Mehrad; Burkhart, Joshua G; Klén, Riku; Henao, Ricardo; Yu, Thomas; Aydin, Zafer; Yeung, Ka Yee; Ahsen, Mehmet Eren; Almugbel, Reem; Jahandideh, Samad; Liang, Xiao; Nordling, Torbjörn E M; Shiga, Motoki; Stanescu, Ana; Vogel, Robert; Pandey, Gaurav; Chiu, Christopher; McClain, Micah T; Woods, Christopher W; Ginsburg, Geoffrey S; Elo, Laura L; Tsalik, Ephraim L; Mangravite, Lara M; Sieberts, Solveig K.

Nat Commun ; 9(1): 4418, 2018 10 24.

Article in English | MEDLINE | ID: mdl-30356117

ABSTRACT

The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveals little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.

Subjects

Gene Expression/genetics , Healthy Volunteers , Heme/metabolism , Humans , Influenza A Virus H1N1 Subtype/immunology , Influenza A Virus H1N1 Subtype/pathogenicity , Influenza A Virus H3N2 Subtype/immunology , Influenza A Virus H3N2 Subtype/pathogenicity , Respiratory Syncytial Viruses/immunology , Respiratory Syncytial Viruses/pathogenicity , Rhinovirus/immunology , Rhinovirus/pathogenicity

17.

GUIdock-VNC: using a graphical desktop sharing system to provide a browser-based interface for containerized software.

Mittal, Varun; Hung, Ling-Hong; Keswani, Jayant; Kristiyanto, Daniel; Lee, Sung Bong; Yeung, Ka Yee.

Gigascience ; 6(4): 1-6, 2017 Apr 01.

Article in English | MEDLINE | ID: mdl-28327936

ABSTRACT

Background: Software container technology such as Docker can be used to package and distribute bioinformatics workflows consisting of multiple software implementations and dependencies. However, Docker is a command line-based tool, and many bioinformatics pipelines consist of components that require a graphical user interface. Results: We present a container tool called GUIdock-VNC that uses a graphical desktop sharing system to provide a browser-based interface for containerized software. GUIdock-VNC uses the Virtual Network Computing protocol to render the graphics within most commonly used browsers. We also present a minimal image builder that can add our proposed graphical desktop sharing system to any Docker package, with the end result that any Docker package can be run using a graphical desktop within a browser. In addition, GUIdock-VNC uses the OAuth2 authentication protocol when deployed on the cloud. Conclusions: As a proof-of-concept, we demonstrated the utility of GUIdock-VNC in gene network inference. We benchmarked our container implementation on various operating systems and showed that our solution creates minimal overhead.

Subjects

Computational Biology/methods , Software , User-Computer Interface , Web Browser , Gene Regulatory Networks , Systems Biology/methods

18.

fastBMA: scalable network inference and transitive reduction.

Hung, Ling-Hong; Shi, Kaiyuan; Wu, Migao; Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee.

Gigascience ; 6(10): 1-10, 2017 Oct 01.

Article in English | MEDLINE | ID: mdl-29020744

ABSTRACT

Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10,000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome-scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (MIT license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).
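
The mapping of transitive reduction to a shortest-path problem mentioned in the abstract can be illustrated with a minimal sketch. This is not fastBMA's code. It assumes that each edge carries a probability, that the cost of an edge is the negative log of that probability, and that a direct edge is dropped when some indirect path has a strictly lower total cost. The function names and the input format are hypothetical.

```python
import heapq
import math


def shortest_indirect_cost(graph, src, dst):
    """Dijkstra from src to dst that never uses the direct src->dst edge."""
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            if node == src and nxt == dst:
                continue  # skip the direct edge under test
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return math.inf  # no indirect path exists


def transitive_reduction(edge_probs):
    """Keep an edge unless an indirect path is strictly cheaper.

    edge_probs maps (source, target) to a probability in (0, 1];
    this input format is an assumption made for illustration.
    """
    graph = {}
    for (u, v), p in edge_probs.items():
        graph.setdefault(u, {})[v] = -math.log(p)
    return {
        (u, v): p
        for (u, v), p in edge_probs.items()
        if shortest_indirect_cost(graph, u, v) >= graph[u][v]
    }


edges = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5}
reduced = transitive_reduction(edges)
```

In this toy network the indirect route A to B to C has a combined probability of 0.81, which is higher than the direct A to C edge at 0.5, so the direct edge is treated as redundant and removed.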

Subjects

Algorithms , Gene Regulatory Networks , Fungal Genome , Human Genome , Bayes Theorem , Gene Expression , Humans , Statistical Models , Saccharomyces cerevisiae

19.

A posterior probability approach for gene regulatory network inference in genetic perturbation data.

Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee.

Math Biosci Eng ; 13(6): 1241-1251, 2016 Dec 01.

Article in English | MEDLINE | ID: mdl-27775378

ABSTRACT

Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach.

Subjects

Gene Regulatory Networks , Biological Models , Systems Biology/methods , Algorithms , Probability

20.

Model-Based Clustering With Data Correction For Removing Artifacts In Gene Expression Data.

Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee.

Ann Appl Stat ; 11(4): 1998-2026, 2016 Feb.

Article in English | MEDLINE | ID: mdl-30740193

ABSTRACT

The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes, and the data for the resulting pairs of genes are deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These include the expression levels of paired genes being flipped or given the same value, and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, as well as improving results from subsequent analysis.
