1.
Bioinformatics ; 39(39 Suppl 1): i11-i20, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387150

ABSTRACT

MOTIVATION: The reproducibility crisis has highlighted the importance of improving the way bioinformatics data analyses are implemented, executed, and shared. To address this, various tools such as content versioning systems, workflow management systems, and software environment management systems have been developed. While these tools are becoming more widely used, there is still much work to be done to increase their adoption. The most effective way to ensure reproducibility becomes a standard part of most bioinformatics data analysis projects is to integrate it into the curriculum of bioinformatics Master's programs. RESULTS: In this article, we present the Reprohackathon, a Master's course that we have been running for the last 3 years at Université Paris-Saclay (France), and that has been attended by a total of 123 students. The course is divided into two parts. The first part includes lessons on the challenges related to reproducibility, content versioning systems, container management, and workflow systems. In the second part, students work on a data analysis project for 3-4 months, reanalyzing data from a previously published study. The Reprohackathon has taught us many valuable lessons, such as the fact that implementing reproducible analyses is a complex and challenging task that requires significant effort. However, providing in-depth teaching of the concepts and the tools during a Master's degree program greatly improves students' understanding and abilities in this area.


Subjects
Computational Biology , Curriculum , Humans , Reproducibility of Results , Data Analysis , Software
2.
Comput Struct Biotechnol J ; 21: 2075-2085, 2023.
Article in English | MEDLINE | ID: mdl-36968012

ABSTRACT

Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them. We then highlight the benefits of using scientific workflow systems to get modular, reproducible and reusable bioinformatics data analysis pipelines. We finally discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.

3.
J Clin Epidemiol ; 149: 36-44, 2022 09.
Article in English | MEDLINE | ID: mdl-35636590

ABSTRACT

OBJECTIVES: To visualize the evolution of all registered COVID-19 vaccine trials. STUDY DESIGN AND SETTING: As part of the living mapping of the COVID-NMA initiative, we identify all COVID-19 vaccine trials biweekly and automatically extract data from the EU clinical trials registry, ClinicalTrials.gov, IRCT and the World Health Organization International Clinical Trials Registry Platform. Data are curated and enriched by epidemiologists. We used the phylomemy reconstruction process to visualize the temporal evolution of COVID-19 vaccine trial descriptions. We analyzed the textual contents of 1,794 trial descriptions (last search in October 2021) and explored their collective structure along with their semantic dynamics. RESULTS: The structures highlighted by the phylomemy reconstruction processes synthesize the complexity of the knowledge produced by the research community. The reconstructed phylomemy clearly retrieves the five major COVID-19 vaccine platforms in the form of complete branches. The branch interactions reflect the exploration of a new approach to vaccine implementation, moving from homologous prime vaccination to heterologous prime vaccination. Phylomemies also clearly identify shifts in research questions, from vaccine efficacy to booster efficacy. CONCLUSION: This new method provides important insights for global coordination between research teams, especially in crisis situations such as the COVID-19 pandemic.


Subjects
COVID-19 Vaccines , COVID-19 , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines/therapeutic use , Pandemics/prevention & control , SARS-CoV-2 , Vaccination/methods , Clinical Trials as Topic
4.
J Clin Epidemiol ; 130: 107-116, 2021 02.
Article in English | MEDLINE | ID: mdl-33096223

ABSTRACT

OBJECTIVES: Researchers worldwide are actively engaging in research activities to search for preventive and therapeutic interventions against coronavirus disease 2019 (COVID-19). Our aim was to describe the planning of randomized controlled trials (RCTs) in terms of timing related to the course of the COVID-19 epidemic and research question evaluated. STUDY DESIGN AND SETTING: We performed a living mapping of RCTs registered in the WHO International Clinical Trials Registry Platform. We systematically searched the platform every week for all RCTs evaluating preventive interventions and treatments for COVID-19 and created a publicly available interactive mapping tool at https://covid-nma.com to visualize all trials registered. RESULTS: By August 12, 2020, 1,568 trials for COVID-19 were registered worldwide. Overall, the median ([Q1-Q3]; range) delay between the first case recorded in each country and the first RCT registered was 47 days ([33-67]; 15-163). For the 9 countries with the highest number of trials registered, most trials were registered after the peak of the epidemic (from 100% of trials in Italy to 38% in the United States). Most trials evaluated treatments (1,333 trials; 85%); only 223 (14%) evaluated preventive strategies and 12 evaluated postacute-period interventions. A total of 254 trials were planned to assess different regimens of hydroxychloroquine with an expected sample size of 110,883 patients. CONCLUSION: This living mapping analysis showed that COVID-19 trials have relatively small sample sizes with some redundancy in research questions. Most trials were registered after the first peak of the pandemic had passed.
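
The delay statistic reported above (median, [Q1-Q3], and range of days between a country's first recorded case and its first registered RCT) can be illustrated with a minimal sketch. The country labels and dates below are hypothetical toy values, not data from the study, and the quartile convention (`method="inclusive"`) is an assumption; the abstract does not say which convention the authors used.

```python
from datetime import date
from statistics import median, quantiles

# Hypothetical toy records: (country label, first recorded case, first registered RCT).
# These dates are invented for illustration only.
RECORDS = [
    ("A", date(2020, 1, 1), date(2020, 1, 11)),
    ("B", date(2020, 1, 5), date(2020, 1, 25)),
    ("C", date(2020, 2, 1), date(2020, 3, 2)),
    ("D", date(2020, 2, 10), date(2020, 3, 21)),
    ("E", date(2020, 3, 1), date(2020, 4, 20)),
]


def delay_summary(records):
    """Summarize delays (in days) between first case and first registered RCT."""
    delays = sorted((first_rct - first_case).days for _, first_case, first_rct in records)
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(delays, n=4, method="inclusive")
    return {
        "median": median(delays),
        "q1": q1,
        "q3": q3,
        "min": delays[0],
        "max": delays[-1],
    }


summary = delay_summary(RECORDS)
print(
    f"median {summary['median']} days "
    f"([{summary['q1']:.0f}-{summary['q3']:.0f}]; {summary['min']}-{summary['max']})"
)
```

Other quartile conventions can give slightly different Q1 and Q3 values on small samples, so the convention is worth stating alongside any reported interval.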


Subjects
COVID-19 Drug Treatment , Hydroxychloroquine/therapeutic use , Pandemics/prevention & control , COVID-19/prevention & control , Epidemiologic Research Design , Female , Geographic Mapping , Humans , Internet , Italy , Male , Randomized Controlled Trials as Topic , Sample Size , United States
5.
Nucleic Acids Res ; 47(W1): W260-W265, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31028399

ABSTRACT

Phylogeny.fr, created in 2008, has been designed to facilitate the execution of phylogenetic workflows, and is nowadays widely used. However, since its development, user needs have evolved, new tools and workflows have been published, and the number of jobs has increased dramatically, thus promoting new practices, which motivated its refactoring. We developed NGPhylogeny.fr to be more flexible in terms of tools and workflows, easily installable, and more scalable. It integrates numerous tools in their latest version (e.g. TNT, FastME, MrBayes, etc.) as well as new ones designed in the last ten years (e.g. PhyML, SMS, FastTree, trimAl, BOOSTER, etc.). These tools cover a large range of usage (sequence searching, multiple sequence alignment, model selection, tree inference and tree drawing) and a large panel of standard methods (distance, parsimony, maximum likelihood and Bayesian). They are integrated in workflows, which have been already configured ('One click'), can be customized ('Advanced'), or are built from scratch ('A la carte'). Workflows are managed and run by an underlying Galaxy workflow system, which makes workflows more scalable in terms of number of jobs and size of data. NGPhylogeny.fr is deployable on any server or personal computer, and is freely accessible at https://ngphylogeny.fr.


Subjects
Factual Databases , Internet , Phylogeny , Software
6.
Front Physiol ; 9: 680, 2018.
Article in English | MEDLINE | ID: mdl-29971009

ABSTRACT

Analysing models of biological networks typically relies on workflows in which different software tools with sensitive parameters are chained together, many times with additional manual steps. The accessibility and reproducibility of such workflows is challenging, as publications often overlook analysis details, and because some of these tools may be difficult to install, and/or have a steep learning curve. The CoLoMoTo Interactive Notebook provides a unified environment to edit, execute, share, and reproduce analyses of qualitative models of biological networks. This framework combines the power of different technologies to ensure repeatability and to reduce users' learning curve of these technologies. The framework is distributed as a Docker image with the tools ready to be run without any installation step besides Docker, and is available on Linux, macOS, and Microsoft Windows. The embedded computational workflows are edited with a Jupyter web interface, enabling the inclusion of textual annotations, along with the explicit code to execute, as well as the visualization of the results. The resulting notebook files can then be shared and re-executed in the same environment. To date, the CoLoMoTo Interactive Notebook provides access to the software tools GINsim, BioLQM, Pint, MaBoSS, and Cell Collective, for the modeling and analysis of Boolean and multi-valued networks. More tools will be included in the future. We developed a Python interface for each of these tools to offer a seamless integration in the Jupyter web interface and ease the chaining of complementary analyses.

7.
BMC Bioinformatics ; 15 Suppl 1: S12, 2014.
Article in English | MEDLINE | ID: mdl-24564760

ABSTRACT

BACKGROUND: Scientific workflow management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who can use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, where nodes represent bioinformatics tasks and links represent the dataflow. The complexity of such graph structures is increasing over time, with possible impacts on scientific workflow reuse. In this work, we propose effective methods for workflow design, with a focus on the Taverna model. We argue that one of the contributing factors for the difficulties in reuse is the presence of "anti-patterns", a term broadly used in program design to indicate the use of idiomatic forms that lead to over-complicated design. The main contribution of this work is a method for automatically detecting such anti-patterns and replacing them with different patterns that reduce the workflow's overall structural complexity. Rewriting workflows in this way will be beneficial both in terms of user experience (easier design and maintenance) and in terms of operational efficiency (easier to manage, and sometimes to exploit the latent parallelism amongst the tasks). RESULTS: We have conducted a thorough study of the workflow structures available in Taverna, with the aim of finding workflow fragments whose structure could be made simpler without altering the workflow semantics. We provide four contributions. Firstly, we identify a set of anti-patterns that contribute to the structural workflow complexity. Secondly, we design a series of refactoring transformations to replace each anti-pattern by a new semantically-equivalent pattern with less redundancy and simplified structure. Thirdly, we introduce a distilling algorithm that takes in a workflow and produces a distilled semantically-equivalent workflow. Lastly, we provide an implementation of our refactoring approach that we evaluate on both the public Taverna workflows and on a private collection of workflows from the BioVel project. CONCLUSION: We have designed and implemented an approach to improving workflow structure by way of rewriting that preserves workflow semantics. Future work includes considering our refactoring approach during the phase of workflow design and proposing guidelines for designing distilled workflows.


Subjects
Algorithms , User-Computer Interface , Workflow , Computational Biology/methods
8.
Bioinformatics ; 27(8): 1187-9, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21349868

ABSTRACT

MOTIVATION: High-throughput technologies provide fundamental information concerning thousands of genes. Many current research laboratories use one or more of these technologies daily and end up with lists of genes. Assessing the originality of the results obtained includes being aware of the number of publications available concerning individual or multiple genes and accessing information about these publications. Faced with the exponential growth of publications available and the number of genes involved in a study, this task is becoming particularly difficult to achieve. RESULTS: We introduce GeneValorization, a web-based tool that gives a clear and handy overview of the bibliography available corresponding to the user input formed by (i) a gene list (expressed by gene names or ids from EntrezGene) and (ii) a context of study (expressed by keywords). From this input, GeneValorization provides a matrix containing the number of publications with co-occurrences of gene names and keywords. Graphics are automatically generated to assess the relative importance of genes within various contexts. Links to publications and other databases offering information on genes and keywords are also available. To illustrate how helpful GeneValorization is, we will consider the gene list of the OncotypeDX prognostic marker test. AVAILABILITY: http://bioguide-project.net/gv CONTACT: cohen@lri.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
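
The gene-by-keyword matrix described above can be sketched as follows. This is only an illustration of the data structure, not GeneValorization's implementation: the function name, the in-memory toy corpus, and the case-insensitive substring matching are all assumptions, and a real tool would query a literature database and handle gene synonyms.

```python
from typing import Dict, List


def cooccurrence_matrix(
    genes: List[str], keywords: List[str], abstracts: List[str]
) -> Dict[str, Dict[str, int]]:
    """Count, for each (gene, keyword) pair, the abstracts that mention both."""
    lowered = [text.lower() for text in abstracts]
    matrix: Dict[str, Dict[str, int]] = {}
    for gene in genes:
        row: Dict[str, int] = {}
        for keyword in keywords:
            # Naive case-insensitive substring match (an assumption of this sketch).
            row[keyword] = sum(
                1
                for text in lowered
                if gene.lower() in text and keyword.lower() in text
            )
        matrix[gene] = row
    return matrix


# Hypothetical toy corpus, invented for illustration only.
ABSTRACTS = [
    "ESR1 expression predicts prognosis in breast cancer.",
    "ESR1 mutations in metastatic breast cancer.",
    "ERBB2 amplification and breast cancer therapy.",
    "ESR1 and ERBB2 in apoptosis signalling.",
]
GENES = ["ESR1", "ERBB2"]
KEYWORDS = ["breast cancer", "apoptosis"]

matrix = cooccurrence_matrix(GENES, KEYWORDS, ABSTRACTS)
for gene, row in matrix.items():
    print(gene, row)
```

Each row corresponds to a gene and each column to a keyword, so comparing counts along a row gives a rough view of how much literature links that gene to each context.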


Subjects
Genes , Software , Neoplasm Genes , Humans , Internet , Publications , User-Computer Interface
10.
Bioinformatics ; 23(10): 1301-3, 2007 May 15.
Article in English | MEDLINE | ID: mdl-17344233

ABSTRACT

Biologists are frequently faced with the problem of integrating information from multiple heterogeneous sources with their own experimental data. Given the large number of public sources, it is difficult to choose which sources to integrate without assistance. When doing this manually, biologists differ in their preferences concerning the sources to be queried as well as the strategies, i.e. the querying process they follow for navigating through the sources. In response to these findings, we have developed BioGuide to help scientists search for relevant data within external sources while taking their preferences and strategies into account. In this article, we present BioGuideSRS, a user-friendly system which automatically retrieves instances of data by using BioGuide on top of the sequence retrieval system (SRS). BioGuideSRS is an Applet that can be run from its web page on any system with Java 5.0. AVAILABILITY: http://www.bioguide-project.net.


Subjects
Database Management Systems , Information Storage and Retrieval , Software , User-Computer Interface , Genetic Databases , Internet
11.
J Bioinform Comput Biol ; 4(5): 1069-95, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17099942

ABSTRACT

Fueled by novel technologies capable of producing massive amounts of data for a single experiment, scientists are faced with an explosion of information which must be rapidly analyzed and combined with other data to form hypotheses and create knowledge. Today, numerous biological questions can be answered without entering a wet lab. Scientific protocols designed to answer these questions can be run entirely on a computer. Biological resources are often complementary, focused on different objects and reflecting various experts' points of view. Exploiting the richness and diversity of these resources is crucial for scientists. However, with the increase of resources, scientists have to face the problem of selecting sources and tools when interpreting their data. In this paper, we analyze the way in which biologists express and implement scientific protocols, and we identify the requirements for a system which can guide scientists in constructing protocols to answer new biological questions. We present two such systems, BioNavigation and BioGuide, dedicated to helping scientists select resources by following suitable paths within the growing network of interconnected biological resources.


Subjects
Cell Physiological Phenomena , Database Management Systems , Factual Databases , Gene Expression Regulation/physiology , Information Storage and Retrieval/methods , Biological Models , Signal Transduction/physiology , Research Design , Science/methods , Software
12.
Pac Symp Biocomput ; : 116-27, 2006.
Article in English | MEDLINE | ID: mdl-17094233

ABSTRACT

As the number, richness and diversity of biological sources grow, scientists are increasingly confronted with the problem of selecting appropriate sources and tools. To address this problem, we have designed BioGuide, a user-centric framework that helps scientists choose sources and tools according to their preferences and strategy, by specifying queries through a user-friendly visual interface. In this paper, we provide a complete RDF representation of BioGuide and introduce XPR (eXtensible Path language for RDF), an extension of FSL that is expressive enough to model all BioGuide queries. BioGuide queries modeled as XPR expressions can then be saved, compared, evaluated and exchanged through the Web between users and applications.


Subjects
Computational Biology , Factual Databases , Computer Simulation , Programming Languages , User-Computer Interface
13.
Proteomics ; 6(20): 5445-66, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16991192

ABSTRACT

The present review attempts to cover the most recent initiatives directed towards representing, storing, displaying and processing protein-related data suited to undertake "comparative proteomics" studies. Data interpretation is brought into focus. Efforts invested into analysing and interpreting experimental data increasingly express the need for adding meaning. This trend is perceptible in work dedicated to determining ontologies, modelling interaction networks, etc. In parallel, technical advances in computer science are spurred by the development of the Web and the growing need to channel and understand massive volumes of data. Biology benefits from these advances as an application of choice for many generic solutions. Some examples of bioinformatics solutions are discussed and directions for on-going and future work conclude the review.


Subjects
Computational Biology/methods , Proteomics/methods , Algorithms , Automation , Liquid Chromatography , Computational Biology/instrumentation , Statistical Data Interpretation , Two-Dimensional Gel Electrophoresis/methods , Computer-Assisted Image Processing , Internet , Mass Spectrometry , Peptides/chemistry , Programming Languages , Proteomics/instrumentation , Quality Control , Software