Search | BVS España Search Portal

1.

A graphical, interactive and GPU-enabled workflow to process long-read sequencing data.

Reddy, Shishir; Hung, Ling-Hong; Sala-Torra, Olga; Radich, Jerald P; Yeung, Cecilia Cs; Yeung, Ka Yee.

BMC Genomics ; 22(1): 626, 2021 Aug 23.

Article in English | MEDLINE | ID: mdl-34425749

ABSTRACT

BACKGROUND: Long-read sequencing has great promise in enabling portable, rapid molecular-assisted cancer diagnoses. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process the raw nanopore reads and support graphical output and interactive visualizations for interpreting results. Another obstacle is that high-performance software tools for long-read sequencing data analyses often leverage graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud. RESULTS: We present a graphical cloud-enabled workflow for fast, interactive analysis of nanopore sequencing data using GPUs. Users customize parameters, monitor execution and visualize results through an accessible graphical interface. The workflow and its components are completely containerized to ensure reproducibility and facilitate installation of the GPU-enabled software. We also provide an Amazon Machine Image (AMI) with all software and drivers pre-installed for GPU computing on the cloud. Most importantly, we demonstrate the potential of applying our software tools to reduce the turnaround time of cancer diagnostics by generating blood cancer (NB4, K562, ME1, MV4;11) cell line Nanopore data using the Flongle adapter. We observe a 29x speedup and a 93x reduction in costs for the rate-limiting basecalling step in the analysis of blood cancer cell line data. CONCLUSIONS: Our interactive and efficient software tools will make analyses of Nanopore data using GPU and cloud computing accessible to biomedical and clinical scientists, thus facilitating the adoption of cost-effective, fast, portable and real-time long-read sequencing.

Subject(s)

Computational Biology; Software; Reproducibility of Results; Sequence Analysis; Workflow

2.

Holistic optimization of an RNA-seq workflow for multi-threaded environments.

Hung, Ling-Hong; Lloyd, Wes; Agumbe Sridhar, Radhika; Athmalingam Ravishankar, Saranya Devi; Xiong, Yuguang; Sobie, Eric; Yeung, Ka Yee.

Bioinformatics ; 35(20): 4173-4175, 2019 10 15.

Article in English | MEDLINE | ID: mdl-30859176

ABSTRACT

SUMMARY: For many next-generation sequencing pipelines, the most computationally intensive step is the alignment of reads to a reference sequence. As a result, alignment software such as the Burrows-Wheeler Aligner is optimized for speed and is often executed in parallel on the cloud. However, there are other less demanding steps that can also be optimized to significantly increase the speed, especially when using many threads. We demonstrate this using a unique molecular identifier RNA-sequencing pipeline consisting of three steps: split, align, and merge. Optimization of all three steps yields a 40% increase in speed when executed using a single thread. However, when executed using 16 threads, we observe a 4-fold improvement over the original parallel implementation and more than an 8-fold improvement over the original single-threaded implementation. In contrast, optimizing only the alignment step results in just a 13% improvement over the original parallel workflow using 16 threads. AVAILABILITY AND IMPLEMENTATION: Code (M.I.T. license), supporting scripts and Dockerfiles are available at https://github.com/BioDepot/LINCS_RNAseq_cpp and Docker images at https://hub.docker.com/r/biodepot/rnaseq-umi-cpp/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

RNA-Seq; Workflow; High-Throughput Nucleotide Sequencing; Sequence Analysis, RNA; Software
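
One way to see why the non-alignment steps matter more as thread counts grow is Amdahl's law: whatever fraction of the runtime does not scale with threads caps the overall speedup. The sketch below is only an illustration of that general principle. The 80/20 split between alignment and the split/merge steps is a hypothetical value chosen for the example, not a measurement from the paper.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the runtime scales with threads."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical workload: alignment is 80% of single-thread runtime and
    # parallelizes perfectly; split and merge (20%) stay serial.
    for threads in (1, 4, 16):
        print(threads, round(amdahl_speedup(0.8, threads), 2))
```

Under these assumed numbers the serial 20% limits the 16-thread speedup to about 4x, so further gains would have to come from the serial steps rather than from a faster aligner.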

3.

fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.

Hung, Ling-Hong; Samudrala, Ram.

Bioinformatics ; 30(12): 1774-6, 2014 Jun 15.

Article in English | MEDLINE | ID: mdl-24532722

ABSTRACT

MOTIVATION: fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. RESULTS: fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. AVAILABILITY AND IMPLEMENTATION: fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster)

Subject(s)

Protein Conformation; Software; Algorithms; Cluster Analysis; Models, Molecular

4.

Accelerated protein structure comparison using TM-score-GPU.

Hung, Ling-Hong; Samudrala, Ram.

Bioinformatics ; 28(16): 2191-2, 2012 Aug 15.

Article in English | MEDLINE | ID: mdl-22718788

ABSTRACT

MOTIVATION: Accurate comparisons of different protein structures play important roles in structural biology, structure prediction and functional annotation. The root-mean-square-deviation (RMSD) after optimal superposition is the predominant measure of similarity due to the ease and speed of computation. However, global RMSD is dependent on the length of the protein and can be dominated by divergent loops that can obscure local regions of similarity. A more sophisticated measure of structure similarity, Template Modeling (TM)-score, avoids these problems, and it is one of the measures used by the community-wide experiments of critical assessment of protein structure prediction to compare predicted models with experimental structures. TM-score calculations are, however, much slower than RMSD calculations. We have therefore implemented a very fast version of TM-score for Graphical Processing Units (TM-score-GPU), using a novel hybrid Kabsch/quaternion method for calculating the optimal superposition and RMSD that is designed for parallel applications. This acceleration allows TM-score to be used efficiently in computationally intensive applications such as clustering of protein models and genome-wide comparisons of structure. RESULTS: TM-score-GPU was applied to six sets of models from Nutritious Rice for the World for a total of 3 million comparisons. TM-score-GPU is 68 times faster on an ATI 5870 GPU, on average, than the original CPU single-threaded implementation on an AMD Phenom II 810 quad-core processor. AVAILABILITY AND IMPLEMENTATION: The complete source, including the GPU code and the hybrid RMSD subroutine, can be downloaded and used without restriction at http://software.compbio.washington.edu/misc/downloads/tmscore/. The implementation is in C++/OpenCL.

Subject(s)

Computational Biology/methods; Protein Conformation; Proteins/chemistry; Software; Algorithms
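
To make the contrast with RMSD concrete, the sketch below evaluates the standard TM-score formula (Zhang and Skolnick) for one fixed superposition, given the distances between aligned residue pairs. It is not the TM-score-GPU code: the real score is the maximum over many trial superpositions, which is the expensive search the abstract describes accelerating. The function names and the 0.5 Å floor on d0 for short chains are assumptions made for this illustration.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) used by TM-score."""
    if target_length <= 15:
        return 0.5  # assumed floor; avoids a fractional power of a negative number
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    `distances` holds the distance (angstroms) between each aligned residue pair.
    Unaligned target residues contribute zero, so the sum is divided by the full
    target length rather than by the number of aligned pairs.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # A perfect match over half of a 100-residue target.
    print(tm_score([0.0] * 50, 100))
```

Because each term is bounded by 1 and decays with distance, a few badly placed loop residues lower the score only slightly, whereas they can dominate a global RMSD.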

5.

Structure prediction of partial-length protein sequences.

Laurenzi, Adrian; Hung, Ling-Hong; Samudrala, Ram.

Int J Mol Sci ; 14(7): 14892-907, 2013 Jul 17.

Article in English | MEDLINE | ID: mdl-23867606

ABSTRACT

Protein structure information is essential to understand protein function. Computational methods to accurately predict protein structure from the sequence have primarily been evaluated on protein sequences representing full-length native proteins. Here, we demonstrate that top-performing structure prediction methods can accurately predict the partial structures of proteins encoded by sequences that contain approximately 50% or more of the full-length protein sequence. We hypothesize that structure prediction may be useful for predicting functions of proteins whose corresponding genes are mapped expressed sequence tags (ESTs) that encode partial-length amino acid sequences. Additionally, we identify a confidence score representing the quality of a predicted structure as a useful means of predicting the likelihood that an arbitrary polypeptide sequence represents a portion of a foldable protein sequence ("foldability"). This work has ramifications for the prediction of protein structure with limited or noisy sequence information, as well as genome annotation.

Subject(s)

Proteins/chemistry; Databases, Protein; Expressed Sequence Tags; Protein Folding; Protein Structure, Tertiary; Proteins/metabolism; Software

6.

Rapid detection of myeloid neoplasm fusions using single-molecule long-read sequencing.

Sala-Torra, Olga; Reddy, Shishir; Hung, Ling-Hong; Beppu, Lan; Wu, David; Radich, Jerald; Yeung, Ka Yee; Yeung, Cecilia C S.

PLOS Glob Public Health ; 3(9): e0002267, 2023.

Article in English | MEDLINE | ID: mdl-37699001

ABSTRACT

Recurrent gene fusions are common drivers of disease pathophysiology in leukemias. Identifying these structural variants helps stratify disease by risk and assists with therapy choice. Precise molecular diagnosis in low-and-middle-income countries (LMIC) is challenging given the complexity of assays, trained technical support, and the availability of reliable electricity. Current fusion detection methods require a long turnaround time (7-10 days) or advance knowledge of the genes involved in the fusions. Recent technology developments have made sequencing possible without a sophisticated molecular laboratory, potentially making molecular diagnosis accessible to remote areas and low-income settings. We describe a long-read sequencing DNA assay designed with CRISPR guides to select and enrich for recurrent leukemia fusion genes, that does not need a priori knowledge of the abnormality present. By applying rapid sequencing technology based on nanopores, we sequenced long pieces of genomic DNA and successfully detected fusion genes in cell lines and primary specimens (e.g., BCR::ABL1, PML::RARA, CBFB::MYH11, KMT2A::AFF1) using cloud-based bioinformatics workflows with novel custom fusion finder software. We detected fusion genes in 100% of cell lines with the expected breakpoints and confirmed the presence or absence of a recurrent fusion gene in 12 of 14 patient cases. With our optimized assay and cloud-based bioinformatics workflow, these assays and analyses could be performed in under 8 hours. The platform's portability, potential for adaptation to lower-cost devices, and integrated cloud analysis make this assay a candidate to be placed in settings like LMIC to bridge the need of bedside rapid molecular diagnostics.

7.

A Randomized Controlled Trial of Precision Nutrition Counseling for Service Members at Risk for Metabolic Syndrome.

McCarthy, Mary S; Colburn, Zachary T; Yeung, Ka Yee; Gillette, Laurel H; Hung, Ling-Hong; Elshaw, Evelyn.

Mil Med ; 188(Suppl 6): 606-613, 2023 11 08.

Article in English | MEDLINE | ID: mdl-37948286

ABSTRACT

INTRODUCTION: Metabolic syndrome (MetS) is a threat to the active component military as it impacts health, readiness, retention, and cost to the Military Health System. The most prevalent risk factors documented in service members' health records are high blood pressure (BP), low high-density lipoprotein cholesterol, and elevated triglycerides. Other risk factors include abdominal obesity and elevated fasting blood glucose. Precision nutrition counseling and wellness software applications have demonstrated positive results for weight management when coupled with high levels of participant engagement and motivation. MATERIALS AND METHODS: In this prospective randomized controlled trial, trained registered dietitians conducted nutrition counseling using results of targeted sequencing, biomarkers, and expert recommendations to reduce the risk for MetS. Upon randomization, the treatment arm initiated six weekly sessions and the control arm received educational pamphlets. An eHealth application captured diet and physical activity. Anthropometrics and BP were measured at baseline, 6 weeks, and 12 weeks, and biomarkers were measured at baseline and 12 weeks. The primary outcome was a change in weight at 12 weeks. Statistical analysis included descriptive statistics and t-tests or analysis of variance with significance set at P < .05. RESULTS: Overall, 138 subjects enrolled from November 2019 to February 2021 between two military bases; 107 completed the study. Demographics were as follows: 66% male, mean age 31 years, 66% married, and 49% Caucasian and non-Hispanic. Weight loss was not significant between groups or sites at 12 weeks. Overall, 27% of subjects met the diagnostic criteria for MetS on enrollment and 17.8% upon study completion. High deleterious variant prevalence was identified for genes with single-nucleotide polymorphisms linked to obesity (40%), cholesterol (38%), and BP (58%). Overall, 65% of subjects had low 25(OH)D upon enrollment; 45% remained insufficient at study completion. eHealth app had low adherence yet sufficient correlation with a valid reference. CONCLUSIONS: Early signs of progress with weight loss at 6 weeks were not sustained at 12 weeks. DNA-based nutrition counseling was not efficacious for weight loss.

Subject(s)

Metabolic Syndrome; Humans; Male; Adult; Female; Metabolic Syndrome/epidemiology; Prospective Studies; Obesity; Weight Loss; Cholesterol; Counseling; Biomarkers

8.

Cloud-enabled Biodepot workflow builder integrates image processing using Fiji with reproducible data analysis using Jupyter notebooks.

Hung, Ling-Hong; Straw, Evan; Reddy, Shishir; Schmitz, Robert; Colburn, Zachary; Yeung, Ka Yee.

Sci Rep ; 12(1): 14920, 2022 Sep 02.

Article in English | MEDLINE | ID: mdl-36056115

ABSTRACT

Modern biomedical image analysis workflows contain multiple computational processing tasks, giving rise to problems in reproducibility. In addition, image datasets can span both spatial and temporal dimensions, with additional channels for fluorescence and other data, resulting in datasets that are too large to be processed locally on a laptop. For omics analyses, software containers have been shown to enhance reproducibility, facilitate installation and provide access to scalable computational resources on the cloud. However, most image analyses contain steps that are graphical and interactive, features that are not supported by most omics execution engines. We present the containerized and cloud-enabled Biodepot-workflow-builder platform that supports graphics from software containers and has been extended for image analyses. We demonstrate the potential of our modular approach with multi-step workflows that incorporate the popular and open-source Fiji suite for image processing. One of our examples integrates fully interactive ImageJ macros with Jupyter notebooks. Our second example illustrates how the complicated cloud setup of a computationally intensive process such as stitching 3D digital pathology datasets using BigStitcher can be automated and simplified. In both examples, users can leverage a form-based graphical interface to execute multi-step workflows with a single click, using the provided sample data and preset input parameters. Alternatively, users can interactively modify the image processing steps in the workflow, apply the workflows to their own data, and change the input parameters and macros. By providing interactive graphics support to software containers, our modular platform supports reproducible image analysis workflows, simplified access to cloud resources for analysis of large datasets, and integration across different applications such as Jupyter.

Subject(s)

Data Analysis; Software; Computational Biology/methods; Image Processing, Computer-Assisted/methods; Reproducibility of Results; Workflow

9.

Container Profiler: Profiling resource utilization of containerized big data pipelines.

Hoang, Varik; Hung, Ling-Hong; Perez, David; Deng, Huazeng; Schooley, Raymond; Arumilli, Niharika; Yeung, Ka Yee; Lloyd, Wes.

Gigascience ; 12, 2022 12 28.

Article in English | MEDLINE | ID: mdl-37624874

ABSTRACT

BACKGROUND: This article presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks collecting over 60 Linux operating system metrics at the virtual machine, container, and process levels. The Container Profiler supports performing time-series profiling at a configurable sampling interval to enable continuous monitoring of the resources consumed by containerized tasks and pipelines. RESULTS: To investigate the utility of the Container Profiler, we profile the resource utilization requirements of a multistage bioinformatics analytical pipeline (RNA sequencing using unique molecular identifiers). We examine profiling metrics to assess patterns of CPU, disk, and network resource utilization across the different stages of the pipeline. We also quantify the profiling overhead of our Container Profiler tool to assess the impact of profiling a running pipeline with different levels of profiling granularity, verifying that impacts are negligible. CONCLUSIONS: The Container Profiler provides a useful tool that can be used to continuously monitor the resource consumption of long and complex containerized applications that run locally or on the cloud. This can help identify bottlenecks where more resources are needed to improve performance.

Subject(s)

Benchmarking; Big Data; Computational Biology; Software; Time Factors

10.

Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data.

Liang, Xiao; Young, William Chad; Hung, Ling-Hong; Raftery, Adrian E; Yeung, Ka Yee.

J Comput Biol ; 26(10): 1113-1129, 2019 10.

Article in English | MEDLINE | ID: mdl-31009236

ABSTRACT

The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks as well as extends some previous Bayesian frameworks both in theory and applications. We apply our method to two different human cell lines, namely skin melanoma cell line A375 and lung cancer cell line A549, to illustrate the capabilities of our method. Our results show that the improvement in performance could vary from cell line to cell line and that we might need to choose different external data sources serving as prior knowledge if we hope to obtain better accuracy for different cell lines.

Subject(s)

Gene Regulatory Networks; Genomics/methods; A549 Cells; Bayes Theorem; Cell Line, Tumor; Gene Expression Regulation, Neoplastic; Gene Ontology; Humans; Lung Neoplasms/genetics; Melanoma/genetics; Skin Neoplasms/genetics; Supervised Machine Learning; Transcriptome

11.

Building Containerized Workflows Using the BioDepot-Workflow-Builder.

Hung, Ling-Hong; Hu, Jiaming; Meiss, Trevor; Ingersoll, Alyssa; Lloyd, Wes; Kristiyanto, Daniel; Xiong, Yuguang; Sobie, Eric; Yeung, Ka Yee.

Cell Syst ; 9(5): 508-514.e3, 2019 11 27.

Article in English | MEDLINE | ID: mdl-31521606

ABSTRACT

We present the BioDepot-workflow-builder (Bwb), a software tool that allows users to create and execute reproducible bioinformatics workflows using a drag-and-drop interface. Graphical widgets represent Docker containers executing a modular task. Widgets are linked graphically to build bioinformatics workflows that can be reproducibly deployed across different local and cloud platforms. Each widget contains a form-based user interface to facilitate parameter entry and a console to display intermediate results. Bwb provides tools for rapid customization of widgets, containers, and workflows. Saved workflows can be shared using Bwb's native format or exported as shell scripts.

Subject(s)

Computational Biology/methods; Workflow; Humans; Software; User-Computer Interface

12.

Scoring functions for de novo protein structure prediction revisited.

Ngan, Shing-Chung; Hung, Ling-Hong; Liu, Tianyun; Samudrala, Ram.

Methods Mol Biol ; 413: 243-81, 2008.

Article in English | MEDLINE | ID: mdl-18075169

ABSTRACT

De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures are generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. We discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction in this chapter.

Subject(s)

Protein Conformation; Algorithms; Animals; Databases, Protein; Humans; Protein Folding; Proteins/chemistry

13.

Hot-starting software containers for STAR aligner.

Zhang, Pai; Hung, Ling-Hong; Lloyd, Wes; Yeung, Ka Yee.

Gigascience ; 7(8), 2018 08 01.

Article in English | MEDLINE | ID: mdl-30085034

ABSTRACT

Background: Using software containers has become standard practice to reproducibly deploy and execute biomedical workflows on the cloud. However, some applications that contain time-consuming initialization steps will produce unnecessary costs for repeated executions. Findings: We demonstrate that hot-starting from containers that have been frozen after the application has already begun execution can speed up bioinformatics workflows by avoiding repetitive initialization steps. We use an open-source tool called Checkpoint and Restore in Userspace (CRIU) to save the state of the containers as a collection of checkpoint files on disk after it has read in the indices. The resulting checkpoint files are migrated to the host, and CRIU is used to regenerate the containers in that ready-to-run hot-start state. As a proof-of-concept example, we create a hot-start container for the spliced transcripts alignment to a reference (STAR) aligner and deploy this container to align RNA sequencing data. We compare the performance of the alignment step with and without checkpoints on cloud platforms using local and network disks. Conclusions: We demonstrate that hot-starting Docker containers from snapshots taken after repetitive initialization steps are completed significantly speeds up the execution of the STAR aligner on all experimental platforms, including Amazon Web Services, Microsoft Azure, and local virtual machines. Our method can be potentially employed in other bioinformatics applications in which a checkpoint can be inserted after a repetitive initialization phase.

Subject(s)

Computational Biology/methods; RNA Splicing; Sequence Analysis, RNA/methods; Software; Asthma/drug therapy; Asthma/genetics; Asthma/metabolism; Humans; Myocytes, Smooth Muscle/drug effects; Myocytes, Smooth Muscle/metabolism

14.

Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.

Almugbel, Reem; Hung, Ling-Hong; Hu, Jiaming; Almutairy, Abeer; Ortogero, Nicole; Tamta, Yashaswi; Yeung, Ka Yee.

J Am Med Inform Assoc ; 25(1): 4-12, 2018 01 01.

Article in English | MEDLINE | ID: mdl-29092073

ABSTRACT

Objective: Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods: We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results: BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion: Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.

Subject(s)

Computational Biology; Software; Workflow; Biomedical Research; Reproducibility of Results; Software Design

15.

PROTINFO: new algorithms for enhanced protein structure predictions.

Hung, Ling-Hong; Ngan, Shing-Chung; Liu, Tianyun; Samudrala, Ram.

Nucleic Acids Res ; 33(Web Server issue): W77-80, 2005 Jul 01.

Article in English | MEDLINE | ID: mdl-15980581

ABSTRACT

We describe new algorithms and modules for protein structure prediction available as part of the PROTINFO web server. The modules, comparative and de novo modelling, have significantly improved back-end algorithms that were rigorously evaluated at the sixth meeting on the Critical Assessment of Protein Structure Prediction methods. We were one of four server groups invited to make an oral presentation (only the best performing groups are asked to do so). These two modules allow a user to submit a protein sequence and return atomic coordinates representing the tertiary structure of that protein. The PROTINFO server is available at http://protinfo.compbio.washington.edu.

Subject(s)

Algorithms; Models, Molecular; Protein Structure, Tertiary; Software; Internet; Structural Homology, Protein

16.

GUIdock-VNC: using a graphical desktop sharing system to provide a browser-based interface for containerized software.

Mittal, Varun; Hung, Ling-Hong; Keswani, Jayant; Kristiyanto, Daniel; Lee, Sung Bong; Yeung, Ka Yee.

Gigascience ; 6(4): 1-6, 2017 04 01.

Article in English | MEDLINE | ID: mdl-28327936

ABSTRACT

Background: Software container technology such as Docker can be used to package and distribute bioinformatics workflows consisting of multiple software implementations and dependencies. However, Docker is a command line-based tool, and many bioinformatics pipelines consist of components that require a graphical user interface. Results: We present a container tool called GUIdock-VNC that uses a graphical desktop sharing system to provide a browser-based interface for containerized software. GUIdock-VNC uses the Virtual Network Computing protocol to render the graphics within most commonly used browsers. We also present a minimal image builder that can add our proposed graphical desktop sharing system to any Docker package, with the end result that any Docker package can be run using a graphical desktop within a browser. In addition, GUIdock-VNC uses the OAuth2 authentication protocol when deployed on the cloud. Conclusions: As a proof-of-concept, we demonstrated the utility of GUIdock-VNC in gene network inference. We benchmarked our container implementation on various operating systems and showed that our solution creates minimal overhead.

Subject(s)

Computational Biology/methods; Software; User-Computer Interface; Web Browser; Gene Regulatory Networks; Systems Biology/methods

17.

fastBMA: scalable network inference and transitive reduction.

Hung, Ling-Hong; Shi, Kaiyuan; Wu, Migao; Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee.

Gigascience ; 6(10): 1-10, 2017 10 01.

Article in English | MEDLINE | ID: mdl-29020744

ABSTRACT

Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome-scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).

Subject(s)

Algorithms , Gene Regulatory Networks , Fungal Genome , Human Genome , Bayes Theorem , Gene Expression , Humans , Statistical Models , Saccharomyces cerevisiae

18.

PROTINFO: Secondary and tertiary protein structure prediction.

Hung, Ling-Hong; Samudrala, Ram.

Nucleic Acids Res ; 31(13): 3296-9, 2003 Jul 01.

Article in English | MEDLINE | ID: mdl-12824311

ABSTRACT

Information about the secondary and tertiary structure of a protein sequence can greatly assist biologists in the generation and testing of hypotheses, as well as design of experiments. The PROTINFO server enables users to submit a protein sequence and request a prediction of the three-dimensional (tertiary) structure based on comparative modeling, fold generation and de novo methods developed by the authors. In addition, users can submit NMR chemical shift data and request protein secondary structure assignment that is based on using neural networks to combine the chemical shifts with secondary structure predictions. The server is available at http://protinfo.compbio.washington.edu.

Subject(s)

Molecular Models , Protein Secondary Structure , Protein Tertiary Structure , Protein Sequence Analysis/methods , Internet , Protein Folding , Proteins/chemistry , Software , Protein Structural Homology , Time Factors , User-Computer Interface

19.

GUIdock: Using Docker Containers with a Common Graphics User Interface to Address the Reproducibility of Research.

Hung, Ling-Hong; Kristiyanto, Daniel; Lee, Sung Bong; Yeung, Ka Yee.

PLoS One ; 11(4): e0152686, 2016.

Article in English | MEDLINE | ID: mdl-27045593

ABSTRACT

Reproducibility is vital in science. For complex computational methods, reproducing results often requires recreating not just the code but also the software and hardware environment. Virtual machines, and container software such as Docker, make it possible to reproduce the exact environment regardless of the underlying hardware and operating system. However, workflows that use Graphical User Interfaces (GUIs) remain difficult to replicate on different host systems as there is no high level graphical software layer common to all platforms. GUIdock allows for the facile distribution of a systems biology application along with its graphics environment. Complex graphics-based workflows, ubiquitous in systems biology, can now be easily exported and reproduced on many different platforms. GUIdock uses Docker, an open source project that provides a container with only the absolutely necessary software dependencies and configures a common X Windows (X11) graphic interface on Linux, Macintosh and Windows platforms. As proof of concept, we present a Docker package that contains a Bioconductor application written in R and C++ called networkBMA for gene network inference. Our package also includes Cytoscape, a Java-based platform with a graphical user interface for visualizing and analyzing gene networks, and the CyNetworkBMA app, a Cytoscape app that allows the use of networkBMA via the user-friendly Cytoscape interface.

Subject(s)

Programming Languages , Software Design , User-Computer Interface

20.

Accurate and automated classification of protein secondary structure with PsiCSI.

Hung, Ling-Hong; Samudrala, Ram.

Protein Sci ; 12(2): 288-95, 2003 Feb.

Article in English | MEDLINE | ID: mdl-12538892

ABSTRACT

PsiCSI is a highly accurate and automated method of assigning secondary structure from NMR data, which is a useful intermediate step in the determination of tertiary structures. The method combines information from chemical shifts and protein sequence using three layers of neural networks. Training and testing were performed on a suite of 92 proteins (9437 residues) with known secondary and tertiary structure. Using a stringent cross-validation procedure in which the target and homologous proteins were removed from the databases used for training the neural networks, an average 89% Q3 accuracy (per residue) was observed. This is an increase of 6.2% and 5.5% (representing 36% and 33% fewer errors) over methods that use chemical shifts (CSI) or sequence information (Psipred) alone. In addition, PsiCSI improves upon the translation of chemical shift information to secondary structure (Q3 = 87.4%) and is able to use sequence information as an effective substitute for sparse NMR data (Q3 = 86.9% without ¹³C shifts and Q3 = 86.8% with only Hα shifts available). Finally, errors made by PsiCSI almost exclusively involve the interchange of helix or strand with coil and not helix with strand (<2.5 occurrences per 10000 residues). The automation, increased accuracy, absence of gross errors, and robustness with regard to sparse data make PsiCSI ideal for high-throughput applications, and should improve the effectiveness of hybrid NMR/de novo structure determination methods. A Web server is available for users to submit data and have the assignment returned.
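
The relationship between the accuracy gains and the "fewer errors" figures quoted in the abstract can be checked with a short calculation. The sketch below assumes the 6.2% and 5.5% gains are percentage points on top of the rounded 89% Q3 accuracy, so the implied baseline accuracies (82.8% and 83.5%) are approximations rather than values stated in the abstract; the function name is hypothetical.

```python
def relative_error_reduction(new_accuracy, gain):
    """Fraction of baseline errors removed, given accuracies in percent.

    new_accuracy: accuracy of the improved method (percent).
    gain: improvement over the baseline in percentage points (assumption).
    """
    old_error = 100.0 - (new_accuracy - gain)  # baseline error rate
    new_error = 100.0 - new_accuracy           # improved error rate
    return (old_error - new_error) / old_error


if __name__ == "__main__":
    for label, gain in [("chemical shifts alone (CSI)", 6.2),
                        ("sequence alone (Psipred)", 5.5)]:
        reduction = relative_error_reduction(89.0, gain)
        print(f"{label}: about {reduction:.0%} fewer errors")
```

Under these assumptions the error rate falls from roughly 17.2% to 11% against CSI and from roughly 16.5% to 11% against Psipred, which reproduces the reported 36% and 33% reductions.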

Subject(s)

Biomolecular Nuclear Magnetic Resonance/methods , Protein Secondary Structure , Proteins/chemistry , Automation/methods , Computer Simulation , Factual Databases , Molecular Models , Neural Networks (Computer) , Reproducibility of Results , Research Design , Sensitivity and Specificity
