Search | BVS Search Portal

Challenges and recommendations to improve the installability and archival stability of omics computational tools.

Mangul, Serghei; Mosqueiro, Thiago; Abdill, Richard J; Duong, Dat; Mitchell, Keith; Sarwal, Varuni; Hill, Brian; Brito, Jaqueline; Littman, Russell Jared; Statz, Benjamin; Lam, Angela Ka-Mei; Dayama, Gargi; Grieneisen, Laura; Martin, Lana S; Flint, Jonathan; Eskin, Eleazar; Blekhman, Ran.

PLoS Biol ; 17(6): e3000333, 2019 06.

Article in English | MEDLINE | ID: mdl-31220077

ABSTRACT

Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through uniform resource locators (URLs) published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed "easy to install," and 28% of the tools failed to be installed at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.

Subject(s)

Computational Biology/methods; Information Dissemination/methods; Information Storage and Retrieval/methods; Biomedical Research; Databases, Factual; Humans; Internet; Software/trends

Population structure in genetic studies: Confounding factors and mixed models.

Sul, Jae Hoon; Martin, Lana S; Eskin, Eleazar.

PLoS Genet ; 14(12): e1007309, 2018 12.

Article in English | MEDLINE | ID: mdl-30589851

ABSTRACT

A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to accurately test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.

Subject(s)

Genetics, Population; Genome-Wide Association Study/methods; Models, Genetic; Animals; Bias; Disease/genetics; Female; Genome-Wide Association Study/statistics & numerical data; Humans; Linear Models; Male; Mice; Models, Statistical; Pedigree; Phenotype; Phylogeny; Polymorphism, Single Nucleotide

Improving the usability and comprehensiveness of microbial databases.

Loeffler, Caitlin; Karlsberg, Aaron; Martin, Lana S; Eskin, Eleazar; Koslicki, David; Mangul, Serghei.

BMC Biol ; 18(1): 37, 2020 04 07.

Article in English | MEDLINE | ID: mdl-32264902

ABSTRACT

Metagenomics studies leverage genomic reference databases to generate discoveries in basic science and translational research. However, current microbial studies use disparate reference databases that lack consistent standards of specimen inclusion, data preparation, taxon labelling and accessibility, hindering their quality and comprehensiveness, and calling for the establishment of recommendations for reference genome database assembly. Here, we analyze existing fungal and bacterial databases and discuss guidelines for the development of a master reference database that promises to improve the quality and quantity of omics research.

Subject(s)

Bacteria/genetics; Databases, Genetic/standards; Fungi/genetics; Metagenomics/standards; Metagenomics/instrumentation

Telescope: an interactive tool for managing large-scale analysis from mobile devices.

Brito, Jaqueline J; Mosqueiro, Thiago; Rotman, Jeremy; Xue, Victor; Chapski, Douglas J; la Hoz, Juan De; Matias, Paulo; Martin, Lana S; Zelikovsky, Alex; Pellegrini, Matteo; Mangul, Serghei.

Gigascience ; 9(1), 2020 01 01.

Article in English | MEDLINE | ID: mdl-31972019

ABSTRACT

BACKGROUND: In today's world of big data, computational analysis has become a key driver of biomedical research. High-performance computational facilities are capable of processing considerable volumes of data, yet often lack an easy-to-use interface to guide the user in supervising and adjusting bioinformatics analysis via a tablet or smartphone. RESULTS: To address this gap we proposed Telescope, a novel tool that interfaces with high-performance computational clusters to deliver an intuitive user interface for controlling and monitoring bioinformatics analyses in real-time. By leveraging last generation technology now ubiquitous to most researchers (such as smartphones), Telescope delivers a friendly user experience and manages connectivity and encryption under the hood. CONCLUSIONS: Telescope helps to mitigate the digital divide between wet and computational laboratories in contemporary biology. By delivering convenience and ease of use through a user experience not relying on expertise with computational clusters, Telescope can help researchers close the feedback loop between bioinformatics and experimental work with minimal impact on the performance of computational tools. Telescope is freely available at https://github.com/Mangul-Lab-USC/telescope.

Subject(s)

Computational Biology/methods; Data Mining/methods; Software; Big Data; User-Computer Interface

Benchmarking of computational error-correction methods for next-generation sequencing data.

Mitchell, Keith; Brito, Jaqueline J; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L; Wu, Nicholas C; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei.

Genome Biol ; 21(1): 71, 2020 03 17.

Article in English | MEDLINE | ID: mdl-32183840

ABSTRACT

BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.

Subject(s)

Algorithms; High-Throughput Nucleotide Sequencing; Benchmarking; Computational Biology/methods; Humans; Receptors, Antigen, T-Cell/genetics; Viruses/genetics; Whole Genome Sequencing

Improving the usability and archival stability of bioinformatics software.

Mangul, Serghei; Martin, Lana S; Eskin, Eleazar; Blekhman, Ran.

Genome Biol ; 20(1): 47, 2019 02 27.

Article in English | MEDLINE | ID: mdl-30813962

ABSTRACT

Implementation of bioinformatics software involves numerous unique challenges; a rigorous standardized approach is needed to examine software tools prior to their publication.

Subject(s)

Computational Biology/standards; Software/standards; Archives

Systematic benchmarking of omics computational tools.

Mangul, Serghei; Martin, Lana S; Hill, Brian L; Lam, Angela Ka-Mei; Distler, Margaret G; Zelikovsky, Alex; Eskin, Eleazar; Flint, Jonathan.

Nat Commun ; 10(1): 1393, 2019 03 27.

Article in English | MEDLINE | ID: mdl-30918265

ABSTRACT

Computational omics methods packaged as software have become essential to modern biological research. The increasing dependence of scientists on these powerful software tools creates a need for systematic assessment of these methods, known as benchmarking. Adopting a standardized benchmarking practice could help researchers who use omics data to better leverage recent technological innovations. Our review summarizes benchmarking practices from 25 recent studies and discusses the challenges, advantages, and limitations of benchmarking across various domains of biology. We also propose principles that can make computational biology benchmarking studies more sustainable and reproducible, ultimately increasing the transparency of biomedical data and results.

Subject(s)

Benchmarking; Computational Biology; Genomics; Software; Humans; Metabolomics

Addressing the Digital Divide in Contemporary Biology: Lessons from Teaching UNIX.

Mangul, Serghei; Martin, Lana S; Hoffmann, Alexander; Pellegrini, Matteo; Eskin, Eleazar.

Trends Biotechnol ; 35(10): 901-903, 2017 10.

Article in English | MEDLINE | ID: mdl-28720283

ABSTRACT

Life and medical science researchers increasingly rely on applications that lack a graphical interface. Scientists who are not trained in computer science face an enormous challenge analyzing high-throughput data. We present a training model for use of command-line tools when the learner has little to no prior knowledge of UNIX.

Subject(s)

Computer Literacy; Electronic Data Processing; Genomics/education; Programming Languages; User-Computer Interface; Genomics/methods
