Search | BVS IEC

Challenges and recommendations to improve the installability and archival stability of omics computational tools.

Mangul, Serghei; Mosqueiro, Thiago; Abdill, Richard J; Duong, Dat; Mitchell, Keith; Sarwal, Varuni; Hill, Brian; Brito, Jaqueline; Littman, Russell Jared; Statz, Benjamin; Lam, Angela Ka-Mei; Dayama, Gargi; Grieneisen, Laura; Martin, Lana S; Flint, Jonathan; Eskin, Eleazar; Blekhman, Ran.

PLoS Biol ; 17(6): e3000333, 2019 06.

Article in English | MEDLINE | ID: mdl-31220077

ABSTRACT

Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through uniform resource locators (URLs) published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed "easy to install," and 28% of the tools failed to be installed at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.

Subjects

Computational Biology/methods , Information Dissemination/methods , Information Storage and Retrieval/methods , Biomedical Research , Databases, Factual , Humans , Internet , Software/trends

Population structure in genetic studies: Confounding factors and mixed models.

Sul, Jae Hoon; Martin, Lana S; Eskin, Eleazar.

PLoS Genet ; 14(12): e1007309, 2018 12.

Article in English | MEDLINE | ID: mdl-30589851

ABSTRACT

A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to accurately test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.

Subjects

Genetics, Population , Genome-Wide Association Study/methods , Models, Genetic , Animals , Bias , Disease/genetics , Female , Genome-Wide Association Study/statistics & numerical data , Humans , Linear Models , Male , Mice , Models, Statistical , Pedigree , Phenotype , Phylogeny , Polymorphism, Single Nucleotide

Improving the usability and comprehensiveness of microbial databases.

Loeffler, Caitlin; Karlsberg, Aaron; Martin, Lana S; Eskin, Eleazar; Koslicki, David; Mangul, Serghei.

BMC Biol ; 18(1): 37, 2020 04 07.

Article in English | MEDLINE | ID: mdl-32264902

ABSTRACT

Metagenomics studies leverage genomic reference databases to generate discoveries in basic science and translational research. However, current microbial studies use disparate reference databases that lack consistent standards of specimen inclusion, data preparation, taxon labelling and accessibility, hindering their quality and comprehensiveness, and calling for the establishment of recommendations for reference genome database assembly. Here, we analyze existing fungal and bacterial databases and discuss guidelines for the development of a master reference database that promises to improve the quality and quantity of omics research.

Subjects

Bacteria/genetics , Databases, Genetic/standards , Fungi/genetics , Metagenomics/standards , Metagenomics/instrumentation

Telescope: an interactive tool for managing large-scale analysis from mobile devices.

Brito, Jaqueline J; Mosqueiro, Thiago; Rotman, Jeremy; Xue, Victor; Chapski, Douglas J; la Hoz, Juan De; Matias, Paulo; Martin, Lana S; Zelikovsky, Alex; Pellegrini, Matteo; Mangul, Serghei.

Gigascience ; 9(1), 2020 01 01.

Article in English | MEDLINE | ID: mdl-31972019

ABSTRACT

BACKGROUND: In today's world of big data, computational analysis has become a key driver of biomedical research. High-performance computational facilities are capable of processing considerable volumes of data, yet often lack an easy-to-use interface to guide the user in supervising and adjusting bioinformatics analysis via a tablet or smartphone. RESULTS: To address this gap we proposed Telescope, a novel tool that interfaces with high-performance computational clusters to deliver an intuitive user interface for controlling and monitoring bioinformatics analyses in real time. By leveraging latest-generation technology now ubiquitous among researchers (such as smartphones), Telescope delivers a friendly user experience and manages connectivity and encryption under the hood. CONCLUSIONS: Telescope helps to mitigate the digital divide between wet and computational laboratories in contemporary biology. By delivering convenience and ease of use through a user experience that does not rely on expertise with computational clusters, Telescope can help researchers close the feedback loop between bioinformatics and experimental work with minimal impact on the performance of computational tools. Telescope is freely available at https://github.com/Mangul-Lab-USC/telescope.

Subjects

Computational Biology/methods , Data Mining/methods , Software , Big Data , User-Computer Interface

Benchmarking of computational error-correction methods for next-generation sequencing data.

Mitchell, Keith; Brito, Jaqueline J; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L; Wu, Nicholas C; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei.

Genome Biol ; 21(1): 71, 2020 03 17.

Article in English | MEDLINE | ID: mdl-32183840

ABSTRACT

BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.

Subjects

Algorithms , High-Throughput Nucleotide Sequencing , Benchmarking , Computational Biology/methods , Humans , Receptors, Antigen, T-Cell/genetics , Viruses/genetics , Whole Genome Sequencing

Improving the usability and archival stability of bioinformatics software.

Mangul, Serghei; Martin, Lana S; Eskin, Eleazar; Blekhman, Ran.

Genome Biol ; 20(1): 47, 2019 02 27.

Article in English | MEDLINE | ID: mdl-30813962

ABSTRACT

Implementation of bioinformatics software involves numerous unique challenges; a rigorous standardized approach is needed to examine software tools prior to their publication.

Subjects

Computational Biology/standards , Software/standards , Archives

Systematic benchmarking of omics computational tools.

Mangul, Serghei; Martin, Lana S; Hill, Brian L; Lam, Angela Ka-Mei; Distler, Margaret G; Zelikovsky, Alex; Eskin, Eleazar; Flint, Jonathan.

Nat Commun ; 10(1): 1393, 2019 03 27.

Article in English | MEDLINE | ID: mdl-30918265

ABSTRACT

Computational omics methods packaged as software have become essential to modern biological research. The increasing dependence of scientists on these powerful software tools creates a need for systematic assessment of these methods, known as benchmarking. Adopting a standardized benchmarking practice could help researchers who use omics data to better leverage recent technological innovations. Our review summarizes benchmarking practices from 25 recent studies and discusses the challenges, advantages, and limitations of benchmarking across various domains of biology. We also propose principles that can make computational biology benchmarking studies more sustainable and reproducible, ultimately increasing the transparency of biomedical data and results.

Subjects

Benchmarking , Computational Biology , Genomics , Software , Humans , Metabolomics

Addressing the Digital Divide in Contemporary Biology: Lessons from Teaching UNIX.

Mangul, Serghei; Martin, Lana S; Hoffmann, Alexander; Pellegrini, Matteo; Eskin, Eleazar.

Trends Biotechnol ; 35(10): 901-903, 2017 10.

Article in English | MEDLINE | ID: mdl-28720283

ABSTRACT

Life and medical science researchers increasingly rely on applications that lack a graphical interface. Scientists who are not trained in computer science face an enormous challenge analyzing high-throughput data. We present a training model for use of command-line tools when the learner has little to no prior knowledge of UNIX.

Subjects

Computer Literacy , Electronic Data Processing , Genomics/education , Programming Languages , User-Computer Interface , Genomics/methods
