RESUMEN
BioStudies (www.ebi.ac.uk/biostudies) is a new public database that organizes data from biological studies. Typically, but not exclusively, a study is associated with a publication. BioStudies offers a simple way to describe the study structure, and provides flexible data deposition tools and data access interfaces. The actual data can be stored either in BioStudies or remotely, or both. BioStudies imports supplementary data from Europe PMC, and is a resource for authors and publishers for packaging data during the manuscript preparation process. It also can support data management needs of collaborative projects. The growth in multiomics experiments and other multi-faceted approaches to life sciences research mean that studies result in a diversity of data outputs in multiple locations. BioStudies presents a solution to ensuring that all these data and the associated publication(s) can be found coherently in the longer term.
Asunto(s)
Disciplinas de las Ciencias Biológicas , Bases de Datos Factuales , Animales , Humanos , Internet , Programas InformáticosRESUMEN
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
Asunto(s)
Variación Genética/genética , Genoma Humano/genética , Secuenciación de Nucleótidos de Alto Rendimiento , Análisis de Secuencia de ARN , Transcriptoma/genética , Alelos , Línea Celular Transformada , Exones/genética , Perfilación de la Expresión Génica , Humanos , Polimorfismo de Nucleótido Simple/genética , Sitios de Carácter Cuantitativo/genética , ARN Mensajero/análisis , ARN Mensajero/genéticaRESUMEN
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42,000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.
Asunto(s)
Bases de Datos Genéticas , Perfilación de la Expresión Génica , Análisis de Secuencia por Matrices de Oligonucleótidos , Genómica , Secuenciación de Nucleótidos de Alto Rendimiento , Internet , Programas InformáticosRESUMEN
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
Asunto(s)
Bases de Datos Genéticas , Genómica , Análisis por Micromatrices , Bases de Datos Genéticas/estadística & datos numéricos , Bases de Datos Genéticas/tendencias , Secuenciación de Nucleótidos de Alto Rendimiento , Internet , Programas Informáticos , Interfaz Usuario-ComputadorRESUMEN
Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species, which contains expression data measured for 19,014 biological conditions in 136,551 assays from 5598 independent studies.
Asunto(s)
Bases de Datos Genéticas , Perfilación de la Expresión Génica , Análisis de Secuencia por Matrices de Oligonucleótidos , Atlas como Asunto , Genómica , Humanos , MicroARNs/metabolismo , Anotación de Secuencia Molecular , Análisis de Secuencia de ARN , Interfaz Usuario-ComputadorRESUMEN
SUMMARY: We present an R based pipeline, ArrayExpressHTS, for pre-processing, expression estimation and data quality assessment of high-throughput sequencing transcriptional profiling (RNA-seq) datasets. The pipeline starts from raw sequence files and produces standard Bioconductor R objects containing gene or transcript measurements for downstream analysis along with web reports for data quality assessment. It may be run locally on a user's own computer or remotely on a distributed R-cloud farm at the European Bioinformatics Institute. It can be used to analyse user's own datasets or public RNA-seq datasets from the ArrayExpress Archive. AVAILABILITY: The R package is available at www.ebi.ac.uk/tools/rcloud with online documentation at www.ebi.ac.uk/Tools/rwiki/, also available as supplementary material.