1.
BMC Bioinformatics ; 22(1): 178, 2021 Apr 06.
Article in English | MEDLINE | ID: mdl-33823788

ABSTRACT

BACKGROUND: The hallmarks of cancer provide a highly cited and well-used conceptual framework for describing the processes involved in cancer cell development and tumourigenesis. However, methods for translating these high-level concepts into data-level associations between hallmarks and genes (for high-throughput analysis) vary widely between studies. Examining the different strategies used to associate and map cancer hallmarks reveals significant differences, but also consensus. RESULTS: Here we present the results of a comparative analysis of cancer hallmark mapping strategies, based on Gene Ontology and biological pathway annotation, from different studies. By analysing the semantic similarity between annotations, and the resulting gene set overlap, we identify emerging consensus knowledge. In addition, we analyse the differences between hallmark and gene set associations using Weighted Gene Co-expression Network Analysis and enrichment analysis. CONCLUSIONS: Reaching a community-wide consensus on how to identify cancer hallmark activity from research data would enable more systematic data integration and comparison between studies. These results highlight the current state of the consensus and offer a starting point for further convergence. In addition, we show how a lack of consensus can lead to large differences in the biological interpretation of downstream analyses, and we discuss the challenges of annotating changing and accumulating biological data using intermediate knowledge resources that are themselves changing over time.
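Comparisons of this kind often reduce to set arithmetic over the mapped gene sets. As an illustrative sketch (the hallmark name and gene sets below are invented placeholders, not data from this study), the Jaccard index is one simple measure of gene set overlap between two mapping strategies:

```python
# Jaccard overlap between the gene sets two studies assign to one hallmark.

def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical mapping strategies for the same hallmark.
study_a = {"sustained proliferative signaling": {"EGFR", "KRAS", "MYC", "CCND1"}}
study_b = {"sustained proliferative signaling": {"EGFR", "KRAS", "PIK3CA"}}

for hallmark in study_a.keys() & study_b.keys():
    print(f"{hallmark}: Jaccard = {jaccard(study_a[hallmark], study_b[hallmark]):.2f}")
    # 2 shared genes / 5 genes in the union = 0.40
```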


Subjects
Gene Ontology; Neoplasms; Semantics; Consensus; Humans; Molecular Sequence Annotation; Neoplasms/diagnosis; Neoplasms/genetics
2.
PLoS Biol ; 15(6): e2001414, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662064

ABSTRACT

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
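A recurring lesson in this area concerns resolvability. Below is a minimal sketch, assuming the compact-identifier convention (prefix:accession) and the identifiers.org resolver; the regex is a deliberate simplification of the real prefix registry:

```python
# Validate a compact identifier and map it to a persistent, resolvable URL.
import re

COMPACT_ID = re.compile(r"^(?P<prefix>[A-Za-z0-9._]+):(?P<accession>[A-Za-z0-9._-]+)$")

def to_resolvable_url(curie: str) -> str:
    """Turn a compact identifier like 'GO:0006915' into a persistent URL."""
    match = COMPACT_ID.match(curie.strip())
    if match is None:
        raise ValueError(f"not a compact identifier: {curie!r}")
    return f"https://identifiers.org/{match['prefix']}:{match['accession']}"

print(to_resolvable_url("GO:0006915"))  # https://identifiers.org/GO:0006915
```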


Subjects
Biological Science Disciplines/methods; Computational Biology/methods; Data Mining/methods; Software Design; Software; Biological Science Disciplines/statistics & numerical data; Biological Science Disciplines/trends; Computational Biology/trends; Data Mining/statistics & numerical data; Data Mining/trends; Databases, Factual/statistics & numerical data; Databases, Factual/trends; Forecasting; Humans; Internet
3.
Nucleic Acids Res ; 45(D1): D404-D407, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27899646

ABSTRACT

The FAIRDOMHub is a repository for publishing FAIR (Findable, Accessible, Interoperable and Reusable) Data, Operating procedures and Models (https://fairdomhub.org/) for the Systems Biology community. It is a web-accessible repository for storing and sharing systems biology research assets. It enables researchers to organize, share and publish data, models and protocols, to interlink them in the context of the systems biology investigations that produced them, and to interrogate them via APIs. By using the FAIRDOMHub, researchers can achieve more effective exchange with geographically distributed collaborators during projects, ensure results are sustained and preserved, and generate reproducible publications that adhere to the FAIR guiding principles of data stewardship.
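As a sketch of the API access mentioned above, the snippet below assumes the JSON:API-style endpoints of the underlying SEEK software; the path and response shape are assumptions to verify against the current FAIRDOMHub API documentation:

```python
# List a few publicly visible model entries from FAIRDOMHub (assumed endpoint).
import requests

BASE = "https://fairdomhub.org"

def list_public_models(limit: int = 5):
    """Return (id, title) pairs for publicly visible model entries."""
    resp = requests.get(f"{BASE}/models",
                        headers={"Accept": "application/vnd.api+json"})
    resp.raise_for_status()
    entries = resp.json().get("data", [])[:limit]
    return [(e["id"], e["attributes"].get("title")) for e in entries]

for model_id, title in list_public_models():
    print(model_id, title)
```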


Subjects
Databases, Factual; Systems Biology/methods; Carbon/metabolism; Data Curation; Information Dissemination; Metabolic Networks and Pathways; Research
5.
Nucleic Acids Res ; 41(Web Server issue): W557-61, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23640334

ABSTRACT

The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments) using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence-gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.
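Taverna composes such pipelines graphically rather than in code, but as a rough code-level analogy, the sketch below chains two hypothetical REST services (both URLs are placeholders) so that the output of one step feeds the next:

```python
# Toy two-step pipeline: annotate a sequence, then text-mine the result.
import requests

def run_pipeline(sequence: str) -> dict:
    # Step 1: hypothetical annotation service.
    ann = requests.post("https://example.org/annotate", json={"sequence": sequence})
    ann.raise_for_status()
    # Step 2: hypothetical text-mining service consuming step 1's output.
    mined = requests.post("https://example.org/textmine", json=ann.json())
    mined.raise_for_status()
    return mined.json()
```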


Subjects
Computational Biology; Software; Data Mining; Gene Expression Profiling; Internet; Phylogeny; Proteomics; Search Engine; Workflow
6.
Orphanet J Rare Dis ; 18(1): 218, 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37501188

ABSTRACT

BACKGROUND: In biomedicine, machine learning (ML) has proven beneficial for the prognosis and diagnosis of different diseases, including cancer and neurodegenerative disorders. For rare diseases, however, the requirement for large datasets often prevents this approach. Huntington's disease (HD) is a rare neurodegenerative disorder caused by a CAG repeat expansion in the coding region of the huntingtin gene. The world's largest observational study for HD, Enroll-HD, describes over 21,000 participants. As such, Enroll-HD is amenable to ML methods. In this study, we pre-processed and imputed Enroll-HD with ML methods to maximise the inclusion of participants and variables. With this dataset we developed models to improve the prediction of the age at onset (AAO) and compared them to the well-established Langbehn formula. In addition, we used recurrent neural networks (RNNs) to demonstrate the utility of ML methods for longitudinal datasets, assessing driving capabilities by learning from previous participant assessments. RESULTS: Simple pre-processing imputed around 42% of missing values in Enroll-HD, and imputing with ML allowed 167 variables to be retained. We found that multiple ML models were able to outperform the Langbehn formula. The best ML model (light gradient boosting machine) improved the prognosis of AAO compared to the Langbehn formula by 9.2%, based on root mean squared error in the test set. In addition, our ML model provides more accurate prognoses over a wider CAG repeat range than the Langbehn formula. Driving capability was predicted with an accuracy of 85.2%. The resulting pre-processing workflow and code to train the ML models are available to be used for related HD predictions at: https://github.com/JasperO98/hdml/tree/main . CONCLUSIONS: Our pre-processing workflow made it possible to resolve the missing values and include most participants and variables in Enroll-HD. We show the added value of an ML approach, which improved AAO predictions and allowed for the development of an advisory model that can assist clinicians and participants in estimating future driving capability.
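As an illustration of the model comparison described (not the authors' code, which is linked above), the sketch below fits a gradient boosting regressor on synthetic data and compares test-set RMSE against the commonly cited Langbehn parametric form; the coefficients and feature set are illustrative and should be checked against the original publications:

```python
# Compare a LightGBM AAO model against a Langbehn-style parametric baseline.
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
cag = rng.integers(40, 56, n)             # CAG repeat length (illustrative range)
age_now = rng.uniform(20, 70, n)          # an example extra covariate
aao = 21.54 + np.exp(9.556 - 0.1460 * cag) + rng.normal(0, 5, n)  # synthetic AAO

X = np.column_stack([cag, age_now])
X_tr, X_te, y_tr, y_te = train_test_split(X, aao, test_size=0.2, random_state=0)

model = LGBMRegressor(n_estimators=200).fit(X_tr, y_tr)
rmse_ml = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))

# Langbehn-style baseline (as commonly cited) evaluated on the same split.
baseline = 21.54 + np.exp(9.556 - 0.1460 * X_te[:, 0])
rmse_formula = np.sqrt(mean_squared_error(y_te, baseline))
print(f"ML RMSE: {rmse_ml:.2f}  formula RMSE: {rmse_formula:.2f}")
```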


Subjects
Huntington Disease; Humans; Huntington Disease/diagnosis; Huntington Disease/genetics; Prognosis; Age of Onset; Machine Learning
7.
F1000Res ; 11, 2022.
Article in English | MEDLINE | ID: mdl-36742342

ABSTRACT

In this white paper, we describe the founding of a new ELIXIR Community - the Systems Biology Community - and its proposed future contributions to both ELIXIR and the broader community of systems biologists in Europe and worldwide. The Community believes that the infrastructure aspects of systems biology - databases, (modelling) tools and standards development, as well as training and access to cloud infrastructure - are not only appropriate components of the ELIXIR infrastructure, but will prove key components of ELIXIR's future support of advanced biological applications and personalised medicine. By way of a series of meetings, the Community identified seven key areas for its future activities, reflecting both future needs and previous and current activities within ELIXIR Platforms and Communities. These are: overcoming barriers to the wider uptake of systems biology; linking new and existing data to systems biology models; interoperability of systems biology resources; further development and embedding of systems medicine; provisioning of modelling as a service; building and coordinating capacity building and training resources; and supporting industrial embedding of systems biology. A set of objectives for the Community has been identified under four main headline areas: Standardisation and Interoperability, Technology, Capacity Building and Training, and Industrial Embedding. These are grouped into short-term (3-year), mid-term (6-year) and long-term (10-year) objectives.


Subjects
Systems Biology; Europe; Databases, Factual
9.
F1000Res ; 10: 897, 2021.
Article in English | MEDLINE | ID: mdl-34804501

ABSTRACT

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.


Subjects
Biological Science Disciplines; Computational Biology; Benchmarking; Software; Workflow
10.
Nucleic Acids Res ; 35(16): 5625-33, 2007.
Article in English | MEDLINE | ID: mdl-17709344

ABSTRACT

It is increasingly common to combine microarray and quantitative trait loci (QTL) data to aid the search for candidate genes responsible for phenotypic variation. Workflows provide a means of systematically processing these large datasets and also represent a framework for the re-use and explicit declaration of experimental methods. In this article, we highlight the issues facing the manual analysis of microarray and QTL data for the discovery of candidate genes underlying complex phenotypes. We show how automated approaches provide a systematic means to investigate genotype-phenotype correlations. This methodology was applied to a use case of resistance to African trypanosomiasis in the mouse. Pathways represented in the results identified Daxx as one of the candidate genes within the Tir1 QTL region. Subsequent re-sequencing of Daxx in susceptible mouse strains identified a single amino-acid deletion in the Daxx-p53 protein-binding region. This supports recent experimental evidence that apoptosis could be playing a role in the trypanosomiasis resistance phenotype. Workflows developed in this investigation, including a guide to loading and executing them with example data, are available at http://workflows.mygrid.org.uk/repository/myGrid/PaulFisher/.
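The core filtering step of such a workflow amounts to set intersection: keep the differentially expressed genes that fall inside the QTL interval. A minimal sketch with placeholder coordinates and gene lists:

```python
# Intersect differentially expressed genes with genes inside a QTL region.

def genes_in_qtl(annotations: dict, chrom: str, start: int, end: int) -> set:
    """annotations maps symbol -> (chromosome, position); keep genes in the QTL."""
    return {g for g, (c, pos) in annotations.items()
            if c == chrom and start <= pos <= end}

# Hypothetical (chromosome, Mb position) annotations and DE gene list.
annotations = {"Daxx": ("17", 34), "Tnf": ("17", 35), "Actb": ("5", 142)}
differentially_expressed = {"Daxx", "Actb"}

candidates = genes_in_qtl(annotations, "17", 30, 40) & differentially_expressed
print(candidates)  # {'Daxx'}
```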


Subjects
Gene Expression Profiling; Genetic Predisposition to Disease; Quantitative Trait Loci; Trypanosomiasis, African/genetics; Animals; Base Sequence; Carrier Proteins/genetics; Co-Repressor Proteins; Genotype; Immunity, Innate/genetics; Intracellular Signaling Peptides and Proteins/genetics; Mice; Molecular Chaperones; Molecular Sequence Data; Nuclear Proteins/genetics; Oligonucleotide Array Sequence Analysis; Phenotype; Sequence Alignment; Software; Trypanosomiasis, African/metabolism
12.
Interface Focus ; 6(2): 20150094, 2016 Apr 06.
Article in English | MEDLINE | ID: mdl-27051513

ABSTRACT

The goal of developing therapies and dosage regimes for characterized subgroups of the general population can be facilitated by the use of simulation models able to incorporate information about inter-individual variability in drug disposition (pharmacokinetics), toxicity and response effect (pharmacodynamics). Such observed variability can have multiple causes at various scales, ranging from gross anatomical differences to differences in genome sequence. Relevant data for many of these aspects, particularly those related to molecular assays (known as '-omics'), are available in online resources, but identification and assignment to appropriate model variables and parameters is a significant bottleneck in the model development process. Through its efforts to standardize annotation, with a consequent increase in data usability, the human physiome project has a vital role in improving productivity in model development and, thus, the development of personalized therapy regimes. Here, we review the current status of personalized medicine in clinical practice, outline some of the challenges that must be overcome in order to expand its applicability, and discuss the relevance of personalized medicine to the more widespread challenges being faced in drug discovery and development. We then review (i) some of the key data resources available for use in model development and (ii) potential areas where advances made within the physiome modelling community could contribute to physiologically based pharmacokinetic (PBPK) and PBPK/pharmacodynamic modelling in support of personalized drug development. We conclude by proposing a roadmap to further guide the physiome community in its ongoing efforts to improve data usability and integration with modelling efforts in support of personalized medicine development.
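As a toy illustration of how inter-individual variability enters such simulations, the sketch below uses a deliberately simple one-compartment IV-bolus model (not a physiologically based one) with log-normally distributed clearance; all parameter values are invented:

```python
# One-compartment PK with inter-individual variability in clearance.
import numpy as np

def concentration(t, dose, volume, clearance):
    """C(t) = (dose/volume) * exp(-(CL/V) * t) for an IV bolus."""
    k_el = clearance / volume               # elimination rate constant (1/h)
    return (dose / volume) * np.exp(-k_el * t)

rng = np.random.default_rng(1)
t = np.linspace(0, 24, 49)                  # time grid in hours
cl_pop = rng.lognormal(mean=np.log(5.0), sigma=0.3, size=100)  # L/h per subject

curves = np.array([concentration(t, dose=100.0, volume=40.0, clearance=cl)
                   for cl in cl_pop])
print(curves.mean(axis=0)[:3])              # population-mean concentrations
```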

13.
Sci Rep ; 6: 19386, 2016 Jan 19.
Article in English | MEDLINE | ID: mdl-26783251

ABSTRACT

In vitro cardiac differentiation of human pluripotent stem cells (hPSCs) closely recapitulates in vivo embryonic heart development and therefore provides an excellent model for studying human cardiac development. We recently generated the dual cardiac fluorescent reporter MESP1(mCherry/w)NKX2-5(eGFP/w) line in human embryonic stem cells (hESCs), allowing the visualization of pre-cardiac MESP1+ mesoderm and its further commitment towards the cardiac lineage, marked by activation of the cardiac transcription factor NKX2-5. Here, we performed a comprehensive whole-genome transcriptome analysis of MESP1-mCherry derived cardiac-committed cells. In addition to previously described cardiac-inducing signalling pathways, we identified novel transcriptional and signalling networks, indicated by transient activation and interactive network analysis. Furthermore, we found highly dynamic regulation of extracellular matrix components, suggesting the importance of creating a versatile niche that adjusts to the various stages of cardiac differentiation. Finally, we identified cell surface markers for cardiac progenitors, such as the Leucine-rich repeat-containing G-protein coupled receptor 4 (LGR4), belonging to the same subfamily as LGR5 and LGR6, established tissue/cancer stem cell markers. We provide a comprehensive gene expression analysis of cardiac derivatives from pre-cardiac MESP1-progenitors that will contribute to a better understanding of the key regulators, pathways and markers involved in human cardiac differentiation and development.


Subjects
Basic Helix-Loop-Helix Transcription Factors/genetics; Embryonic Stem Cells/cytology; Embryonic Stem Cells/metabolism; Gene Expression Regulation, Developmental; Heart/embryology; Mesoderm/cytology; Organogenesis/genetics; Basic Helix-Loop-Helix Transcription Factors/metabolism; Biomarkers; Cell Differentiation/genetics; Cell Line; Cluster Analysis; Computational Biology/methods; Extracellular Matrix Proteins/metabolism; Gene Expression Profiling; Gene Ontology; Gene Regulatory Networks; Humans; Myocytes, Cardiac/cytology; Myocytes, Cardiac/metabolism; Signal Transduction; Transcriptome
14.
BMC Syst Biol ; 9: 33, 2015 Jul 11.
Article in English | MEDLINE | ID: mdl-26160520

ABSTRACT

BACKGROUND: Systems biology research typically involves the integration and analysis of heterogeneous data types in order to model and predict biological processes. Researchers therefore require tools and resources to facilitate the sharing and integration of data, and for linking of data to systems biology models. There are a large number of public repositories for storing biological data of a particular type, for example transcriptomics or proteomics, and there are several model repositories. However, this silo-type storage of data and models is not conducive to systems biology investigations. Interdependencies between multiple omics datasets, and between datasets and models, are essential. Researchers require an environment that will allow the management and sharing of heterogeneous data and models in the context of the experiments which created them. RESULTS: The SEEK is a suite of tools to support the management, sharing and exploration of data and models in systems biology. The SEEK platform provides an access-controlled, web-based environment for scientists to share and exchange data and models for day-to-day collaboration and for public dissemination. A plug-in architecture allows the linking of experiments, their protocols, data, models and results in a configurable system that is available 'off the shelf'. Tools to run model simulations, plot experimental data and assist with data annotation and standardisation combine to produce a collection of resources that support analysis as well as sharing. Underlying semantic web resources additionally extract and serve SEEK metadata in RDF (Resource Description Framework). SEEK RDF enables rich semantic queries, both within SEEK and between related resources in the web of Linked Open Data. CONCLUSION: The SEEK platform has been adopted by many systems biology consortia across Europe. It is a data management environment that has a low barrier to uptake and provides rich resources for collaboration. This paper provides an update on the functions and features of the SEEK software, and describes the use of the SEEK in the SysMO consortium (Systems biology for Micro-organisms) and the VLN (Virtual Liver Network), two large systems biology initiatives with different research aims and different scientific communities.
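A minimal sketch of the kind of semantic query this enables, using rdflib over a hypothetical RDF export from a SEEK instance; the file name is a placeholder and the Dublin Core title predicate is an assumption to check against SEEK's actual vocabulary:

```python
# SPARQL over a (hypothetical) SEEK RDF export.
from rdflib import Graph

g = Graph()
g.parse("seek_export.rdf")  # placeholder for an RDF dump from a SEEK instance

query = """
SELECT ?asset ?title WHERE {
    ?asset <http://purl.org/dc/terms/title> ?title .
}
LIMIT 10
"""
for asset, title in g.query(query):
    print(asset, title)
```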


Subjects
Database Management Systems; Models, Biological; Systems Biology; Carbon/metabolism; Internet; Sulfolobus/metabolism; User-Computer Interface
15.
J Biomed Semantics ; 5(1): 41, 2014.
Article in English | MEDLINE | ID: mdl-25276335

ABSTRACT

BACKGROUND: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide the necessary metadata for a scientist to understand and recreate the results of an experiment. To support this, we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study in which we analysed human metabolite variation by workflows. RESULTS: We present the application of the workflow-centric RO model to our bioinformatics case study. Three workflows were produced following recently defined best practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?" and "which particular conclusions were drawn from a particular workflow?". CONCLUSIONS: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. AVAILABILITY: The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.
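Since the RO model builds its aggregation on OAI-ORE, the shape of such queries can be sketched with a few hand-made triples; the resources below are placeholders, not the actual myExperiment pack:

```python
# Query what a Research Object aggregates, via the ORE 'aggregates' term.
from rdflib import Graph, Namespace, URIRef

ORE = Namespace("http://www.openarchives.org/ore/terms/")
g = Graph()

ro = URIRef("http://example.org/ro/1")
for name in ("workflow.t2flow", "inputs.csv", "conclusions.txt"):
    g.add((ro, ORE.aggregates, URIRef(f"http://example.org/ro/1/{name}")))

# "Which resources does this Research Object aggregate?"
for _, _, resource in g.triples((ro, ORE.aggregates, None)):
    print(resource)
```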

17.
Nat Genet ; 44(2): 121-6, 2012 Jan 27.
Article in English | MEDLINE | ID: mdl-22281772

ABSTRACT

To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.
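The Investigation-Study-Assay hierarchy itself can be pictured as a simple nesting; the sketch below uses simplified placeholder fields, not the full ISA-Tab/ISA-JSON specification:

```python
# Minimal ISA-style nesting: one investigation, one study, one assay.
investigation = {
    "identifier": "INV-001",
    "studies": [{
        "identifier": "STUDY-001",
        "description": "Example study (placeholder)",
        "assays": [{
            "identifier": "ASSAY-001",
            "measurement_type": "transcription profiling",
            "data_files": ["raw_data.zip"],
        }],
    }],
}

print(investigation["studies"][0]["assays"][0]["measurement_type"])
```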


Subjects
Biomedical Research/standards; Information Storage and Retrieval/standards