Results 1 - 20 of 29
1.
Nucleic Acids Res; 50(D1): D387-D390, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850094

ABSTRACT

The Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis. Here we note changes in storage designed to increase access and highlight analyses that augment metadata with taxonomic insight to help users select data. In addition, we present three unanticipated applications of taxonomic analysis.
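For readers who want to inspect SRA metadata programmatically, the sketch below uses NCBI's public E-utilities (esearch and esummary against db=sra); the query term is only an illustrative placeholder, and the JSON field names should be checked against the current E-utilities documentation.

# Sketch: look up SRA run-set metadata with NCBI E-utilities (esearch + esummary).
# The search term is an illustrative placeholder; adapt it to the data of interest.
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_sra(term, retmax=5):
    """Return SRA UIDs matching a free-text query."""
    params = urllib.parse.urlencode(
        {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"})
    with urllib.request.urlopen(f"{EUTILS}/esearch.fcgi?{params}") as resp:
        return json.load(resp)["esearchresult"]["idlist"]

def esummary_sra(uids):
    """Fetch brief metadata summaries for the given SRA UIDs."""
    params = urllib.parse.urlencode(
        {"db": "sra", "id": ",".join(uids), "retmode": "json"})
    with urllib.request.urlopen(f"{EUTILS}/esummary.fcgi?{params}") as resp:
        return json.load(resp)

uids = esearch_sra("SARS-CoV-2[Organism]")
print(esummary_sra(uids))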


Subject(s)
Bacteria/genetics; Databases, Genetic; Metadata/statistics & numerical data; Software; Viruses/genetics; Bacteria/classification; Base Sequence; High-Throughput Nucleotide Sequencing; Internet; Phylogeny; Reproducibility of Results; SARS-CoV-2/genetics; Sequence Analysis, RNA; Viruses/classification
2.
Nucleic Acids Res; 50(D1): D380-D386, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34570235

ABSTRACT

Single-cell bisulfite sequencing methods are widely used to assess epigenomic heterogeneity in cell states. Over the past few years, large amounts of data have been generated, facilitating a deeper understanding of the epigenetic regulation of many key biological processes, including early embryonic development, cell differentiation and tumor progression. There is an urgent need for a functional resource platform built on this massive amount of data. Here, we present scMethBank, the first open-access and comprehensive database dedicated to the collection, integration, analysis and visualization of single-cell DNA methylation data and metadata. The current release of scMethBank includes processed single-cell bisulfite sequencing data and curated metadata for 8328 samples derived from 15 public single-cell datasets, involving two species (human and mouse), 29 cell types and two diseases. In summary, scMethBank aims to help researchers interested in cell heterogeneity explore and use whole-genome methylation data at the single-cell level by providing browse, search, visualization and download functions together with user-friendly online tools. The database is accessible at: https://ngdc.cncb.ac.cn/methbank/scm/.
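As a rough illustration of how curated sample metadata downloaded from such a resource can be filtered locally, the sketch below loads a metadata table with pandas; the file name and column names are hypothetical placeholders, not scMethBank's actual schema.

# Sketch: filter a downloaded single-cell methylation metadata table.
# "scm_metadata.tsv" and the column names are hypothetical placeholders.
import pandas as pd

meta = pd.read_csv("scm_metadata.tsv", sep="\t")

# Keep human samples from a cell type of interest.
subset = meta[(meta["species"] == "Homo sapiens") & (meta["cell_type"] == "oocyte")]

# Count samples per dataset to gauge coverage before downloading methylation calls.
print(subset.groupby("dataset_id").size().sort_values(ascending=False))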


Subject(s)
DNA Methylation; Databases, Genetic; Epigenesis, Genetic; Genome; Metadata/statistics & numerical data; Software; Animals; Chromosome Mapping; Datasets as Topic; Humans; Internet; Mice; Molecular Sequence Annotation; Single-Cell Analysis; Whole Genome Sequencing
3.
Nucleic Acids Res; 50(D1): D543-D552, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34723319

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. Submissions to PRIDE Archive (the archival component of PRIDE) averaged around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the MAGE-TAB for proteomics file format has been developed to improve sample metadata annotation. Additionally, the PRIDE Peptidome resource provides access to aggregated peptide/protein evidence across PRIDE Archive. Furthermore, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data through other added-value resources such as UniProt, Ensembl and Expression Atlas.
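The Universal Spectrum Identifiers mentioned above are multipart, colon-delimited strings; the sketch below builds and parses such identifiers following the general mzspec pattern (collection, MS run, index type, index number, optional interpretation), with placeholder accession and run values.

# Sketch: build and split a Universal Spectrum Identifier (USI) string.
# Assumed general pattern:
#   mzspec:<collection>:<msRun>:<indexType>:<indexNumber>[:<interpretation>]
# The accession and run name below are illustrative placeholders.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Usi:
    collection: str                       # e.g. a ProteomeXchange dataset accession
    ms_run: str                           # raw file / run name without extension
    index_type: str                       # typically "scan"
    index_number: int
    interpretation: Optional[str] = None  # peptidoform/charge, if any

    def __str__(self) -> str:
        parts = ["mzspec", self.collection, self.ms_run,
                 self.index_type, str(self.index_number)]
        if self.interpretation:
            parts.append(self.interpretation)
        return ":".join(parts)

def parse_usi(text: str) -> Usi:
    fields = text.split(":")
    if fields[0] != "mzspec" or len(fields) < 5:
        raise ValueError(f"not a USI: {text!r}")
    interpretation = ":".join(fields[5:]) or None
    return Usi(fields[1], fields[2], fields[3], int(fields[4]), interpretation)

example = Usi("PXD000001", "example_run", "scan", 10951, "PEPTIDER/2")
print(str(example))
print(parse_usi(str(example)))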


Subject(s)
Databases, Protein; Metadata/statistics & numerical data; Molecular Sequence Annotation/statistics & numerical data; Peptides/chemistry; Proteins/chemistry; Software; Amino Acid Sequence; Bibliometrics; Datasets as Topic; Humans; Information Storage and Retrieval; Internet; Mass Spectrometry; Peptides/genetics; Peptides/metabolism; Proteins/genetics; Proteins/metabolism; Proteomics/instrumentation; Proteomics/methods; Sequence Alignment
4.
Nucleic Acids Res; 50(D1): D980-D987, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791407

ABSTRACT

The European Genome-phenome Archive (EGA - https://ega-archive.org/) is a resource for the long-term, secure archiving of all types of potentially identifiable genetic, phenotypic, and clinical data resulting from biomedical research projects. Its mission is to foster the reuse of hosted data, enable reproducibility, and accelerate biomedical and translational research in line with the FAIR principles. Launched in 2008, the EGA has grown quickly and currently archives over 4,500 studies from nearly one thousand institutions. The EGA operates a distributed data access model in which requests are made to the data controller, not to the EGA; the submitter therefore retains control over who has access to the data and under which conditions. Given the size and value of the data hosted, the EGA is constantly improving its value chain, that is, how the EGA can contribute to enhancing the value of human health data by facilitating its submission, discovery, access, and distribution, as well as by leading the design and implementation of the standards and methods necessary to deliver that value chain. The EGA has become a key GA4GH Driver Project, leading multiple development efforts and implementing new standards and tools, and has been designated an ELIXIR Core Data Resource.


Subject(s)
Confidentiality/legislation & jurisprudence; Genome, Human; Information Dissemination/methods; Phenomics/organization & administration; Translational Research, Biomedical/methods; Datasets as Topic; Genotype; History, 20th Century; History, 21st Century; Humans; Information Dissemination/ethics; Metadata/ethics; Metadata/statistics & numerical data; Phenomics/history; Phenotype
7.
Cancer Res; 81(23): 5810-5812, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34853038

ABSTRACT

Profound advances in computational methods, including artificial intelligence (AI), present the opportunity to use the exponentially growing volume and complexity of available cancer measurements for data-driven personalized care. While exciting, this opportunity has highlighted the disconnect between the promise of compute and the supply of high-quality data. The current paradigm of ad hoc aggregation and curation of data needs to be replaced with a "metadata supply chain" that provides robust data in context, with known provenance (that is, lineage) and comprehensive data governance, so that the promise of AI technology can be realized to its full potential in clinical practice.


Subject(s)
Artificial Intelligence; Metadata/statistics & numerical data; Neoplasms/diagnosis; Neoplasms/therapy; Humans
10.
Nat Biomed Eng; 5(6): 533-545, 2021 06.
Article in English | MEDLINE | ID: mdl-34131321

ABSTRACT

Regular screening for the early detection of common chronic diseases might benefit from the use of deep-learning approaches, particularly in resource-poor or remote settings. Here we show that deep-learning models can be used to identify chronic kidney disease and type 2 diabetes solely from fundus images or in combination with clinical metadata (age, sex, height, weight, body-mass index and blood pressure) with areas under the receiver operating characteristic curve of 0.85-0.93. The models were trained and validated with a total of 115,344 retinal fundus photographs from 57,672 patients and can also be used to predict estimated glomerular filtration rates and blood-glucose levels, with mean absolute errors of 11.1-13.4 ml min⁻¹ per 1.73 m² and 0.65-1.1 mmol l⁻¹, and to stratify patients according to disease-progression risk. We evaluated the generalizability of the models for the identification of chronic kidney disease and type 2 diabetes with population-based external validation cohorts and via a prospective study with fundus images captured with smartphones, and assessed the feasibility of predicting disease progression in a longitudinal cohort.
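The abstract does not specify the network design, but a common way to combine fundus images with tabular clinical metadata is to concatenate pooled CNN features with the metadata vector before a prediction head; the PyTorch sketch below shows that generic fusion pattern and is not the authors' published architecture.

# Sketch: fuse CNN image features with tabular clinical metadata
# (age, sex, height, weight, BMI, blood pressure -> a small numeric vector).
# Generic fusion pattern only; requires torchvision >= 0.13 for weights=None.
import torch
import torch.nn as nn
from torchvision import models

class FundusMetadataNet(nn.Module):
    def __init__(self, n_meta: int = 7, n_outputs: int = 1):
        super().__init__()
        backbone = models.resnet18(weights=None)   # any CNN backbone would do
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # keep pooled image features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(feat_dim + n_meta, 128),
            nn.ReLU(),
            nn.Linear(128, n_outputs),             # e.g. disease probability or eGFR
        )

    def forward(self, image: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        img_feat = self.backbone(image)
        return self.head(torch.cat([img_feat, meta], dim=1))

model = FundusMetadataNet()
out = model(torch.randn(2, 3, 224, 224), torch.randn(2, 7))
print(out.shape)  # torch.Size([2, 1])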


Subject(s)
Deep Learning; Diabetes Mellitus, Type 2/diagnostic imaging; Image Interpretation, Computer-Assisted/statistics & numerical data; Photography/statistics & numerical data; Renal Insufficiency, Chronic/diagnostic imaging; Retina/diagnostic imaging; Area Under Curve; Blood Glucose/metabolism; Body Height; Body Mass Index; Body Weight; Diabetes Mellitus, Type 2/metabolism; Diabetes Mellitus, Type 2/pathology; Disease Progression; Female; Fundus Oculi; Glomerular Filtration Rate; Humans; Male; Metadata/statistics & numerical data; Middle Aged; Neural Networks, Computer; Photography/methods; Prospective Studies; ROC Curve; Renal Insufficiency, Chronic/metabolism; Renal Insufficiency, Chronic/pathology; Retina/metabolism; Retina/pathology
12.
Can Assoc Radiol J; 72(4): 694-700, 2021 Nov.
Article in English | MEDLINE | ID: mdl-32412312

ABSTRACT

PURPOSE: To determine whether computed tomography radiation dose data could be captured electronically across hospitals to derive regional diagnostic reference levels for quality improvement. METHODS: Data on consecutive computed tomography examinations from 8 hospitals were collected automatically in a central database (Repository) from April 2017 to September 2017. The most frequently performed examinations were used to determine the standard protocols for each hospital. Diagnostic reference levels across hospitals were derived from the statistical distributions of 2 radiation dose metrics. These values were compared between hospitals, within and between hospitals by scanner, and against national Health Canada achievable doses and diagnostic reference levels. RESULTS: Three master protocol groups, Head, Abdomen-Pelvis, and Chest-Abdomen-Pelvis, accounted for 43% of all valid studies (N = 40 277). For the Repository, 11 of 12 mean values and 75th percentile diagnostic reference levels were below the Health Canada mean and 75th percentile values, and one was the same as the Health Canada value. Mean radiation dose by protocol varied by as much as 97% between hospitals. There was no consistent pattern in the difference between mean doses of large and small hospitals. CONCLUSION: This electronic data acquisition process could be used to continually update achievable doses for frequently used computed tomography examinations in Ontario and eliminate the need for nationwide manual surveys. Results compared across institutions will allow hospitals to maintain achievable doses and lower patient exposure.
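Diagnostic reference levels of this kind are commonly taken as percentiles (typically the 75th) of a dose metric's distribution per protocol; the pandas sketch below shows that calculation on a hypothetical pooled dose table whose file and column names are placeholders.

# Sketch: derive median and 75th-percentile dose levels per protocol and hospital.
# The input file and its columns (hospital, protocol, ctdi_vol, dlp) are
# hypothetical placeholders for a pooled dose-registry extract.
import pandas as pd

doses = pd.read_csv("ct_dose_registry.csv")

# Repository-wide reference values per master protocol group.
repo_drl = doses.groupby("protocol")[["ctdi_vol", "dlp"]].quantile([0.5, 0.75])

# Per-hospital 75th percentiles, e.g. to compare against national reference levels.
hosp_drl = (doses.groupby(["protocol", "hospital"])[["ctdi_vol", "dlp"]]
            .quantile(0.75)
            .unstack("hospital"))

print(repo_drl)
print(hosp_drl)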


Subject(s)
Diagnostic Reference Levels; Medical Informatics/methods; Metadata/statistics & numerical data; Tomography, X-Ray Computed/statistics & numerical data; Humans; Ontario
13.
Nucleic Acids Res; 49(D1): D121-D124, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33166387

ABSTRACT

The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been the core infrastructure for collecting and providing nucleotide sequence data and metadata for >30 years. Three partner organizations - the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, in Bethesda, Maryland, USA - have been collaboratively maintaining the INSDC for the benefit not only of science but of communities of all types worldwide.


Subject(s)
Databases, Nucleic Acid; Metadata/statistics & numerical data; Nucleotides/genetics; Sequence Analysis, DNA/statistics & numerical data; Sequence Analysis, RNA/statistics & numerical data; Academies and Institutes; Base Sequence; Europe; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; International Cooperation; Japan; Nucleotides/metabolism; United States
14.
BMC Cancer; 20(1): 486, 2020 May 29.
Article in English | MEDLINE | ID: mdl-32471384

ABSTRACT

BACKGROUND: Thousands of research articles on neuroblastoma have been published over the past few decades; however, the heterogeneity and variable quality of scholarly data may challenge scientists or clinicians trying to survey all of the available information. Hence, holistic measurement and analysis of neuroblastoma-related literature with the help of sophisticated mathematical tools could provide deep insights into global research performance and the collaborative structure of the neuroblastoma scientific community. In this scientometric study, we aim to determine the extent of the scientific output related to neuroblastoma research between 1980 and 2018. METHODS: We applied novel scientometric tools, including the Bibliometrix R package, biblioshiny, VOSviewer, and CiteSpace IV, for comprehensive science mapping analysis of extensive bibliographic metadata retrieved from the Web of Science™ Core Collection database. RESULTS: We demonstrate the enormous proliferation of neuroblastoma research during the last 38 years, comprising 12,435 documents published in 1828 academic journals by 36,908 authors from 86 different countries. These documents received a total of 316,017 citations, with an average of 28.35 ± 7.7 citations per document. We determine the proportion of highly cited and never cited papers, and of "occasional" and prolific authors and journals. Further, we show that 12 (13.9%) of 86 countries were responsible for 80.4% of the neuroblastoma-related research output. CONCLUSIONS: These findings are crucial for researchers, clinicians, journal editors, and others working in neuroblastoma research to understand the strengths and potential gaps in the current literature and to plan future investments in data collection and science policy. This first scientometric study of global neuroblastoma research performance provides valuable insight into the scientific landscape, co-authorship network architecture, international collaboration, and interaction within the neuroblastoma community.
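For the kinds of summary statistics reported here (country shares of output, citation distributions, never-cited papers), a minimal local calculation over exported bibliographic records could look like the sketch below; the input file and column names are hypothetical placeholders rather than an actual Web of Science export format.

# Sketch: simple scientometric summaries from an exported record table.
# "records.csv" and its columns (country, citations) are hypothetical placeholders.
import pandas as pd

records = pd.read_csv("records.csv")

# Cumulative share of publications contributed by each country, to find the
# small set of countries producing most of the output.
share = records["country"].value_counts(normalize=True).cumsum()
print(share.head(12))  # cumulative share covered by the top 12 countries

# Proportion of never-cited papers and the overall citation distribution.
print((records["citations"] == 0).mean())
print(records["citations"].describe())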


Subject(s)
Bibliometrics; Biomedical Research/statistics & numerical data; Metadata/statistics & numerical data; Neuroblastoma; Child; Databases, Factual/statistics & numerical data; Humans
15.
Nucleic Acids Res; 48(4): e23, 2020 02 28.
Article in English | MEDLINE | ID: mdl-31956905

ABSTRACT

The diverse and growing omics data in public domains provide researchers with a tremendous opportunity to extract hidden, yet undiscovered, knowledge. However, the vast majority of archived data remain unused. Here, we present MetaOmGraph (MOG), a free, open-source, standalone software for exploratory analysis of massive datasets. Researchers, without coding, can interactively visualize and evaluate data in the context of its metadata, honing in on groups of samples or genes based on attributes such as expression values, statistical associations, metadata terms and ontology annotations. Interaction with data is easy via interactive visualizations such as line charts, box plots, scatter plots, histograms and volcano plots. Statistical analyses include co-expression analysis, differential expression analysis and differential correlation analysis, with significance tests. Researchers can send data subsets to R for additional analyses. Multithreading and indexing enable efficient big data analysis. A researcher can create new MOG projects from any numerical data or explore an existing MOG project. MOG projects, with their history of explorations, can be saved and shared. We illustrate MOG with case studies of large curated datasets: human cancer RNA-Seq, in which we identify novel putative biomarker genes in different tumors, and microarray and metabolomics data from Arabidopsis thaliana. MOG executable and code: http://metnetweb.gdcb.iastate.edu/ and https://github.com/urmi-21/MetaOmGraph/.
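As an illustration of the kind of differential correlation analysis MOG exposes interactively, the sketch below performs a comparable comparison directly with pandas on a genes-by-samples expression matrix; the file names, group labels and gene are hypothetical placeholders, and this is not MOG's own API.

# Sketch: differential correlation of one gene against all others between two
# sample groups, computed directly on a genes-by-samples expression matrix.
# File names, group labels and the gene of interest are placeholders.
import pandas as pd

expr = pd.read_csv("expression.tsv", sep="\t", index_col=0)            # genes x samples
groups = pd.read_csv("sample_groups.tsv", sep="\t", index_col=0)["group"]

tumor = expr.loc[:, groups == "tumor"]
normal = expr.loc[:, groups == "normal"]

gene = "TP53"  # placeholder gene of interest
corr_tumor = tumor.T.corrwith(tumor.loc[gene])
corr_normal = normal.T.corrwith(normal.loc[gene])

# Genes whose correlation with the chosen gene changes most between conditions.
delta = (corr_tumor - corr_normal).abs().sort_values(ascending=False)
print(delta.head(10))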


Subject(s)
Big Data; Gene Expression Profiling/statistics & numerical data; Gene Expression Regulation/genetics; Software; Data Analysis; Data Interpretation, Statistical; Humans; Metadata/statistics & numerical data
16.
Clin Res Cardiol; 109(7): 810-818, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31686209

ABSTRACT

AIMS: We aimed to develop a structured study protocol using the bibliographic web application science performance evaluation (SciPE) to perform comprehensive scientometric analyses. METHODS AND RESULTS: Metadata related to publications derived from online databases were processed and visualized by transferring the information to an undirected multipartite graph with distinct partitioned sets of nodes. In addition, institution-specific data were normalized and merged, allowing precise geocoordinate positioning for heatmapping and valid identification. As a result, verified, processed data regarding articles, institutions, journals, authors' gender, nations and subject categories can be obtained. We recommend including the total number of publications, citations, the population, research institutions, gross domestic product, and the country-specific modified Hirsch index, and forming corresponding ratios (e.g., population/publication). Our approach also includes bioinformatic methods such as heatmapping based on exact geocoordinates, simple chord diagrams, and the central use of specific ratios presented with plain visualization techniques. CONCLUSION: This protocol allows contemporaneous scientometric analyses to be conducted precisely, based on bioinformatic and meta-analytical techniques, making it possible to evaluate and contextualize scientific efforts. Data presentation with the described visualization techniques is essential for transparent and consistent analyses of research output across different nations and topics. Research performance can then be discussed in a synopsis of all findings.
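A minimal sketch of the country-level normalization recommended above (e.g., population per publication) is shown below; the input table and column names are hypothetical placeholders, and the calculation is independent of the SciPE application itself.

# Sketch: country-level ratios for normalizing publication output.
# "country_indicators.csv" and its columns are hypothetical placeholders.
import pandas as pd

countries = pd.read_csv("country_indicators.csv")  # country, publications, citations, population, gdp

countries["population_per_publication"] = countries["population"] / countries["publications"]
countries["citations_per_publication"] = countries["citations"] / countries["publications"]
countries["publications_per_gdp"] = countries["publications"] / countries["gdp"]

print(countries.sort_values("citations_per_publication", ascending=False).head(10))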


Subject(s)
Bibliometrics; Biomedical Research/statistics & numerical data; International Cooperation; Metadata/statistics & numerical data; Humans
17.
Soc Stud Sci; 49(5): 732-757, 2019 10.
Article in English | MEDLINE | ID: mdl-31354073

ABSTRACT

'Metadata' has received a fraction of the attention that 'data' has received in sociological studies of scientific research. This neglect of 'metadata' diverts attention from a number of critical aspects of scientific work processes, including documentary work, accountability relations, and collaboration routines. Metadata processes and products are essential components of the work needed to practically accomplish day-to-day scientific research tasks, and are central to ensuring that research findings and products meet externally driven standards or requirements. This article is an attempt to open up the discussion on and conceptualization of metadata within the sociology of science and the sociology of data. It presents ethnographic research on metadata creation within everyday scientific practice, focusing on how researchers document, describe, annotate, organize and manage their data, both for their own use and for the use of researchers outside of their project. In particular, this article argues that the role and significance of metadata within scientific research contexts are intimately tied to the nature of evidence and accountability within particular social situations. Studying metadata can (1) provide insight into the production of evidence, that is, how something we might call 'data' becomes able to serve an evidentiary role, and (2) provide a mechanism for revealing what people in research contexts are held accountable for, and what they achieve accountability with.


Subject(s)
Metadata/statistics & numerical data; Research Design
19.
PLoS One; 14(6): e0218789, 2019.
Article in English | MEDLINE | ID: mdl-31233549

ABSTRACT

The aim of Jscatter is the processing of experimental data and physical models, with a focus on enabling users to develop or modify their own models and use them in experimental data evaluation. The basic structures dataArray and dataList contain matrix-like data of different sizes, including attributes to store the corresponding metadata. The attributes are used in fit routines as parameters, allowing multidimensional, attribute-dependent fitting. Several modules provide models applied mainly in neutron and X-ray scattering, covering small-angle scattering (form factors and structure factors) and inelastic neutron scattering. The intention is to provide an environment with fit routines, data handling routines (based on NumPy arrays) and a model library that allows the user to focus on user-written models for data analysis, with the benefit of convenient documentation of scientific data evaluation in a scripting environment.
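To illustrate the idea of attribute-dependent fitting (metadata attributes entering a model fit shared across datasets), the sketch below uses plain NumPy/SciPy rather than Jscatter's own dataArray/dataList API; the model, the temperature attribute and the parameter names are invented for the example.

# Sketch: attribute-dependent fitting with plain NumPy/SciPy, illustrating how a
# per-dataset metadata attribute (here, temperature) can enter a shared model fit.
# This is a conceptual example, not Jscatter's own API.
import numpy as np
from scipy.optimize import curve_fit

# Two "datasets", each with a metadata attribute attached.
datasets = [
    {"q": np.linspace(0.1, 1.0, 50), "temperature": 293.0},
    {"q": np.linspace(0.1, 1.0, 50), "temperature": 313.0},
]

def model(q, scale, d0, alpha, temperature):
    # Simple decay whose rate depends on the temperature attribute.
    return scale * np.exp(-q * d0 * (1.0 + alpha * (temperature - 293.0)))

rng = np.random.default_rng(0)
for d in datasets:  # simulate noisy intensities for this sketch
    d["I"] = model(d["q"], 1.0, 3.0, 0.01, d["temperature"]) + rng.normal(0.0, 0.01, d["q"].size)

# Concatenate the data; each point carries the temperature of its dataset.
q_all = np.concatenate([d["q"] for d in datasets])
I_all = np.concatenate([d["I"] for d in datasets])
t_all = np.concatenate([np.full(d["q"].size, d["temperature"]) for d in datasets])

def fit_func(q, scale, d0, alpha):
    return model(q, scale, d0, alpha, t_all)

popt, _ = curve_fit(fit_func, q_all, I_all, p0=[1.0, 1.0, 0.0])
print(dict(zip(["scale", "d0", "alpha"], popt)))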


Subject(s)
Data Interpretation, Statistical; Software; Algorithms; Dynamic Light Scattering/statistics & numerical data; Metadata/statistics & numerical data; Models, Statistical; Neutron Diffraction/statistics & numerical data; Scattering, Small Angle; X-Ray Diffraction/statistics & numerical data
20.
J Digit Imaging; 32(5): 870-879, 2019 10.
Article in English | MEDLINE | ID: mdl-31201587

ABSTRACT

In recent decades, the number of medical imaging studies and the amount of associated metadata have been increasing rapidly. Although medical imaging studies are mostly used to support diagnosis and treatment, many recent initiatives also advocate their use in clinical research scenarios and for improving the business practices of medical institutions. However, the continuous production of medical imaging studies, coupled with the tremendous amount of associated data, makes real-time analysis of medical imaging repositories difficult with conventional tools and methodologies. Those archives contain not only the image data itself but also a wide range of valuable metadata describing all the stakeholders involved in the examination. Exploiting these data will increase the efficiency and quality of medical practice. In major centers, this represents a big data scenario in which Business Intelligence (BI) and Data Analytics (DA) are rare and implemented through data warehousing approaches. This article proposes an Extract, Transform, Load (ETL) framework for medical imaging repositories able to feed a BI application in real time. The solution was designed to provide the environment needed to conduct research on top of live institutional repositories without requiring the creation of a data warehouse. It features an extensible dashboard with customizable charts and reports, and an intuitive web-based interface that supports novel data mining techniques, namely a variety of data cleansing tools, filters, and clustering functions. The user is therefore not required to master the programming skills commonly expected of data analysts and scientists, such as Python and R.
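As a sketch of the "extract" step such a framework performs, the code below reads selected DICOM header metadata into tabular rows with pydicom; the directory path and the chosen tags are illustrative placeholders rather than the framework's actual pipeline.

# Sketch: extract selected DICOM header metadata into tabular rows.
# The directory path and tag selection are illustrative; pixel data is skipped
# because only the metadata feeds the BI layer.
from pathlib import Path
import pandas as pd
import pydicom

def extract_metadata(dicom_dir: str) -> pd.DataFrame:
    rows = []
    for path in Path(dicom_dir).rglob("*.dcm"):
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        rows.append({
            "study_uid": ds.get("StudyInstanceUID"),
            "modality": ds.get("Modality"),
            "study_date": ds.get("StudyDate"),
            "institution": ds.get("InstitutionName"),
            "body_part": ds.get("BodyPartExamined"),
        })
    return pd.DataFrame(rows)

# The resulting frame can then be cleaned ("transform") and pushed to the
# BI application's store ("load"), e.g. with DataFrame.to_sql.
print(extract_metadata("/path/to/archive").head())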


Subject(s)
Data Mining/methods; Data Warehousing/methods; Metadata/statistics & numerical data; Radiology Information Systems/organization & administration; Radiology Information Systems/statistics & numerical data; Data Mining/statistics & numerical data; Data Warehousing/statistics & numerical data; Humans