1.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38836701

ABSTRACT

Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. Due to the potential sensitivity of such data, however, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale and real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, while federated models facilitate scaling up and legal compliance, as the data typically reside on the data generator's premises, allowing for better control of how data are shared. This comparative study thus offers guidance on the selection of the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts.
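
The contrast between the two sharing models can be illustrated with a minimal sketch. It is not drawn from any of the five use cases: the `Site` class, its `local_summary` method and the choice of a simple mean are assumptions made purely for illustration. In the centralized variant the records are pooled in one place; in the federated variant each site releases only an aggregate and the raw values stay on its premises.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """A hypothetical data holder (for example, one hospital)."""
    name: str
    _values: List[float]

    def local_summary(self) -> Tuple[float, int]:
        # Only an aggregate (sum, count) is released, never the records.
        return sum(self._values), len(self._values)


def centralized_mean(sites: List[Site]) -> float:
    # Centralized model: every record is copied into one pooled store.
    pooled = [v for site in sites for v in site._values]
    return sum(pooled) / len(pooled)


def federated_mean(sites: List[Site]) -> float:
    # Federated model: the coordinator combines per-site summaries only.
    total, count = 0.0, 0
    for site in sites:
        site_sum, site_n = site.local_summary()
        total += site_sum
        count += site_n
    return total / count
```

For a statistic as simple as a mean, both routes give the same answer; the difference lies in which party ever sees the individual records.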


Subject(s)
Biological Science Disciplines , Information Dissemination , Humans , Medical Informatics/methods
2.
Plant Cell Physiol ; 58(1): e4, 2017 01 01.
Article in English | MEDLINE | ID: mdl-28013278

ABSTRACT

ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation, Plant/genetics , Arabidopsis Proteins/metabolism , Computational Biology/methods , Gene Ontology , Genomics/methods , Information Storage and Retrieval/methods , Internet , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Reproducibility of Results , Sequence Analysis, RNA
3.
Nucleic Acids Res ; 43(Database issue): D1003-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25414324

ABSTRACT

The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release 'modules' that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts 'science apps,' developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community.


Subject(s)
Arabidopsis/genetics , Databases, Genetic , Genome, Plant , Data Mining , Internet , Software
4.
Nucleic Acids Res ; 42(Web Server issue): W468-72, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24753429

ABSTRACT

InterMine (www.intermine.org) is a biological data warehousing system providing extensive automatically generated and configurable RESTful web services that underpin the web interface and can be re-used in many other applications: to find and filter data; export it in a flexible and structured way; to upload, use, manipulate and analyze lists; to provide services for flexible retrieval of sequence segments, and for other statistical and analysis tools. Here we describe these features and discuss how they can be used separately or in combinations to support integrative and comparative analysis.
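
As a rough illustration of how such a web service might be called from a script, the sketch below builds a query document and a request URL without contacting any server. The XML element and attribute names, the `/query/results` path, the `query` and `format` parameters and the `https://example.org/mine/service` root are all assumptions made for the example; the documentation of the specific InterMine instance should be consulted for the actual interface.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def build_path_query(views, constraints, model="genomic"):
    """Return a query as an XML string (element/attribute names are assumed)."""
    query = ET.Element("query", {"model": model, "view": " ".join(views)})
    for path, op, value in constraints:
        # Each constraint restricts one path in the data model.
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


def build_results_url(service_root, query_xml, fmt="json"):
    """Compose a GET URL for an assumed query-results endpoint."""
    params = urlencode({"query": query_xml, "format": fmt})
    return service_root.rstrip("/") + "/query/results?" + params


# Hypothetical example: ask for two gene fields, filtered by symbol.
query_xml = build_path_query(
    ["Gene.symbol", "Gene.primaryIdentifier"],
    [("Gene.symbol", "=", "zen")],
)
url = build_results_url("https://example.org/mine/service", query_xml)
```

Keeping query construction separate from the HTTP call means the same query string could be reused for different output formats or endpoints.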


Subject(s)
Databases, Factual , Software , Animals , Chromosomes/chemistry , Humans , Internet , Mice , Sequence Analysis, DNA , User-Computer Interface
5.
Genesis ; 53(8): 547-60, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26097192

ABSTRACT

InterMine is a data integration warehouse and analysis software system developed for large and complex biological data sets. Designed for integrative analysis, it can be accessed through a user-friendly web interface. For bioinformaticians, extensive web services as well as programming interfaces for most common scripting languages support access to all features. The web interface includes a useful identifier look-up system, and both simple and sophisticated search options. Interactive results tables enable exploration, and data can be filtered, summarized, and browsed. A set of graphical analysis tools provide a rich environment for data exploration including statistical enrichment of sets of genes or other entities. InterMine databases have been developed for the major model organisms, budding yeast, nematode worm, fruit fly, zebrafish, mouse, and rat together with a newly developed human database. Here, we describe how this has facilitated interoperation and development of cross-organism analysis tools and reports. InterMine as a data exploration and analysis tool is also described. All the InterMine-based systems described in this article are resources freely available to the scientific community.


Subject(s)
Databases, Factual , Software , Animals , Computational Biology/methods , Databases, Genetic , Genomics , Humans , Internet , Systems Integration , User-Computer Interface
6.
Nucleic Acids Res ; 40(Database issue): D1082-8, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22080565

ABSTRACT

In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.


Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Drosophila melanogaster/genetics , Animals , Gene Expression , Genome, Helminth , Genome, Insect , Genomics , Internet , User-Computer Interface
7.
BMC Genomics ; 14: 494, 2013 Jul 22.
Article in English | MEDLINE | ID: mdl-23875683

ABSTRACT

BACKGROUND: Funded by the National Institutes of Health (NIH), the aim of the Model Organism ENCyclopedia of DNA Elements (modENCODE) project is to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for both model organisms C. elegans (worm) and D. melanogaster (fly). With a total size of just under 10 terabytes of data collected and released to the public, one of the challenges faced by researchers is to extract biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data has already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or combine modENCODE data with other data sets. Unfortunately this can be a time consuming and logistically challenging proposition. RESULTS: In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience of use, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies. CONCLUSIONS: Using these resources provides a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to use more of their time using modENCODE data, and less time moving it around.


Subject(s)
Chromatin Immunoprecipitation , Software
8.
Bioinformatics ; 28(23): 3163-5, 2012 Dec 01.
Article in English | MEDLINE | ID: mdl-23023984

ABSTRACT

SUMMARY: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of 'widgets' performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages. AVAILABILITY: Freely available from http://www.intermine.org under the LGPL license. CONTACT: g.micklem@gen.cam.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Database Management Systems , Databases, Factual , Algorithms , Data Mining , Genomics , Internet , Programming Languages
9.
Database (Oxford) ; 2022, 2022 07 12.
Article in English | MEDLINE | ID: mdl-35820040

ABSTRACT

HumanMine (www.humanmine.org) is an integrated database of human genomics and proteomics data that provides a powerful interface to support sophisticated exploration and analysis of data compiled from experimental, computational and curated data sources. Built using the InterMine data integration platform, HumanMine includes genes, proteins, pathways, expression levels, single nucleotide polymorphisms (SNPs), diseases and more, integrated into a single searchable database. HumanMine promotes integrative analysis, a powerful approach in modern biology that allows many sources of evidence to be analysed together. The data can be accessed through a user-friendly web interface as well as a powerful, scriptable web service application programming interface (API) to allow programmatic access to data. The web interface includes a useful identifier resolution system, sophisticated query options and interactive results tables that enable powerful exploration of data, including data summaries, filtering, browsing and export. A set of graphical analysis tools provides a rich environment for data exploration including statistical enrichment of sets of genes or other biological entities. HumanMine can be used for integrative multistaged analysis that can lead to new insights and uncover previously unknown relationships. Database URL: https://www.humanmine.org.


Subject(s)
Genome, Human , Information Storage and Retrieval , Databases, Factual , Humans , Proteomics
10.
BMC Struct Biol ; 4: 3, 2004 Feb 27.
Article in English | MEDLINE | ID: mdl-15113423

ABSTRACT

BACKGROUND: Many characterised proteins contain metal ions, small organic molecules or modified residues. In contrast, the huge amount of data generated by genome projects consists exclusively of sequences with almost no annotation. One of the goals of the structural genomics initiative is to provide representative three-dimensional (3-D) structures for as many protein/domain folds as possible to allow successful homology modelling. However, important functional features such as metal co-ordination or a type of prosthetic group are not always conserved in homologous proteins. So far, the problem of correct annotation of bioinorganic proteins has been largely ignored by the bioinformatics community and information on bioinorganic centres obtained by methods other than crystallography or NMR is only available in literature databases. RESULTS: COMe (Co-Ordination of Metals) represents the ontology for bioinorganic and other small molecule centres in complex proteins. COMe consists of three types of entities: 'bioinorganic motif' (BIM), 'molecule' (MOL), and 'complex proteins' (PRX), with each entity being assigned a unique identifier. A BIM consists of at least one centre (metal atom, inorganic cluster, organic molecule) and two or more endogenous and/or exogenous ligands. BIMs are represented as one-dimensional (1-D) strings and 2-D diagrams. A MOL entity represents a 'small molecule' which, when in complex with one or more polypeptides, forms a functional protein. The PRX entities refer to the functional proteins as well as to separate protein domains and subunits. The complex proteins in COMe are subdivided into three categories: (i) metalloproteins, (ii) organic prosthetic group proteins and (iii) modified amino acid proteins. The data are currently stored in both XML format and a relational database and are available at http://www.ebi.ac.uk/come/. 
CONCLUSION: COMe provides the classification of proteins according to their 'bioinorganic' features and thus is orthogonal to other classification schemes, such as those based on sequence similarity, 3-D fold, enzyme activity, or biological process. The hierarchical organisation of the controlled vocabulary allows both for annotation and querying at different levels of granularity.
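
The entity rules stated in the abstract (three entity types, each with a unique identifier, and a BIM requiring at least one centre and two or more ligands) can be sketched as a small data structure. This is an illustrative sketch only, not COMe's actual schema: the class names, field names and the identifier format `BIM0001` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class EntityType(Enum):
    BIM = "bioinorganic motif"
    MOL = "molecule"
    PRX = "complex protein"


@dataclass
class BioinorganicMotif:
    identifier: str        # unique identifier; the format used here is hypothetical
    centres: List[str]     # metal atoms, inorganic clusters or organic molecules
    ligands: List[str]     # endogenous and/or exogenous ligands
    entity_type: EntityType = field(default=EntityType.BIM, init=False)

    def __post_init__(self):
        # Enforce the cardinality rules given in the abstract.
        if len(self.centres) < 1:
            raise ValueError("a BIM needs at least one centre")
        if len(self.ligands) < 2:
            raise ValueError("a BIM needs two or more ligands")


# Hypothetical example: an iron centre with two ligands.
motif = BioinorganicMotif("BIM0001", ["Fe"], ["Cys", "His"])
```

Checking the constraints at construction time means an invalid motif cannot be created in the first place, which is one simple way to keep annotations consistent.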


Subject(s)
Databases, Protein/standards , Metals/chemistry , Proteins/chemistry , Proteins/classification , Binding Sites , Computational Biology , Internet , Ligands , Terminology as Topic , User-Computer Interface
11.
C R Biol ; 326(10-11): 1075-8, 2003.
Article in English | MEDLINE | ID: mdl-14744115

ABSTRACT

ArrayExpress is a public repository for microarray-based gene expression data, resulting from the implementation of the MAGE object model to ensure accurate data structuring and the MIAME standard, which defines the annotation requirements. ArrayExpress accepts data as MAGE-ML files for direct submissions or data from MIAMExpress, the MIAME compliant web-based annotation and submission tool of EBI. A team of curators supports the submission process, providing assistance in data annotation. Data retrieval is performed through a dedicated web interface. Relevant results may be exported to ExpressionProfiler, the EBI based expression analysis tool available online (http://www.ebi.ac.uk/arrayexpress).


Subject(s)
Computational Biology , Databases, Genetic , Gene Expression , Oligonucleotide Array Sequence Analysis
12.
Database (Oxford) ; 2011: bar023, 2011.
Article in English | MEDLINE | ID: mdl-21856757

ABSTRACT

The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org.


Subject(s)
Databases, Genetic , Genome , Genomics/methods , Internet , Software , Animals , Caenorhabditis elegans/genetics , DNA/genetics , Drosophila melanogaster/genetics , Humans
13.
Science ; 330(6012): 1775-87, 2010 Dec 24.
Article in English | MEDLINE | ID: mdl-21177976

ABSTRACT

We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.


Subject(s)
Caenorhabditis elegans/genetics , Chromosomes , Gene Expression Profiling , Gene Expression Regulation , Genome, Helminth , Molecular Sequence Annotation , Animals , Caenorhabditis elegans/growth & development , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Chromatin/genetics , Chromatin/metabolism , Chromatin/ultrastructure , Chromosomes/genetics , Chromosomes/metabolism , Chromosomes/ultrastructure , Computational Biology/methods , Conserved Sequence , Evolution, Molecular , Gene Regulatory Networks , Genes, Helminth , Genomics/methods , Histones/metabolism , Models, Genetic , RNA, Helminth/genetics , RNA, Helminth/metabolism , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Regulatory Sequences, Nucleic Acid , Transcription Factors/genetics , Transcription Factors/metabolism
14.
Bioinformatics ; 21(8): 1495-501, 2005 Apr 15.
Article in English | MEDLINE | ID: mdl-15564302

ABSTRACT

MOTIVATION: The lack of microarray data management systems and databases is still one of the major problems faced by many life sciences laboratories. While developing the public repository for microarray data ArrayExpress we had to find novel solutions to many non-trivial software engineering problems. Our experience will be both relevant and useful for most bioinformaticians involved in developing information systems for a wide range of high-throughput technologies. RESULTS: ArrayExpress has been online since February 2002, growing exponentially to well over 10,000 hybridizations (as of September 2004). It has been demonstrated that our chosen design and implementation works for databases aimed at storage, access and sharing of high-throughput data. AVAILABILITY: The ArrayExpress database is available at http://www.ebi.ac.uk/arrayexpress/. The software is open source. CONTACT: ugis@ebi.ac.uk.


Subject(s)
Database Management Systems , Databases, Genetic , Gene Expression Profiling/methods , Information Storage and Retrieval/methods , Oligonucleotide Array Sequence Analysis/methods , Proteins/genetics , Proteins/metabolism , Software , Algorithms , Information Dissemination/methods
15.
Plant Physiol ; 139(2): 632-6, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16219923

ABSTRACT

ArrayExpress is a public microarray repository founded on the Minimum Information About a Microarray Experiment (MIAME) principles that stores MIAME-compliant gene expression data. Plant-based data sets represent approximately one-quarter of the experiments in ArrayExpress. The majority are based on Arabidopsis (Arabidopsis thaliana); however, there are other data sets based on Triticum aestivum, Hordeum vulgare, and Populus subsp. AtMIAMExpress is an open-source Web-based software application for the submission of Arabidopsis-based microarray data to ArrayExpress. AtMIAMExpress exports data in MAGE-ML format for upload to any MAGE-ML-compliant application, such as J-Express and ArrayExpress. It was designed as a tool for users with minimal bioinformatics expertise, has comprehensive help and user support, and represents a simple solution to meeting the MIAME guidelines for the Arabidopsis community. Plant data are queryable both in ArrayExpress and in the Data Warehouse databases, which support queries based on gene-centric and sample-centric annotation. The AtMIAMExpress submission tool is available at http://www.ebi.ac.uk/at-miamexpress/. The software is open source and is available from http://sourceforge.net/projects/miamexpress/. For information, contact miamexpress@ebi.ac.uk.


Subject(s)
Arabidopsis/genetics , Databases, Genetic , Academies and Institutes , Computational Biology , Europe , Gene Expression Profiling , Internet , Oligonucleotide Array Sequence Analysis , Software , Triticum/genetics