Search | BVS Ecuador Search Portal

Automated annotation of scientific texts for ML-based keyphrase extraction and validation.

Amusat, Oluwamayowa O; Hegde, Harshad; Mungall, Christopher J; Giannakou, Anna; Byers, Neil P; Gunter, Dan; Fagnan, Kjiersten; Ramakrishnan, Lavanya.

Database (Oxford) ; 2024, 2024 Sep 27.

Article in English | MEDLINE | ID: mdl-39331731

ABSTRACT

Advanced omics technologies and facilities generate a wealth of valuable data daily; however, the data often lack the essential metadata required for researchers to find, curate, and search them effectively. The lack of metadata poses a significant challenge in the utilization of these data sets. Machine learning (ML)-based metadata extraction techniques have emerged as a potentially viable approach to automatically annotating scientific data sets with the metadata necessary for enabling effective search. Text labeling, usually performed manually, plays a crucial role in validating machine-extracted metadata. However, manual labeling is time-consuming and not always feasible; thus, there is a need to develop automated text labeling techniques in order to accelerate the process of scientific innovation. This need is particularly urgent in fields such as environmental genomics and microbiome science, which have historically received less attention in terms of metadata curation and creation of gold-standard text mining data sets. In this paper, we present two novel automated text labeling approaches for the validation of ML-generated metadata for unlabeled texts, with specific applications in environmental genomics. Our techniques show the potential of two new ways to leverage existing information that is only available for select documents within a corpus to validate ML models, which can then be used to describe the remaining documents in the corpus. The first technique exploits relationships between different types of data sources related to the same research study, such as publications and proposals. The second technique takes advantage of domain-specific controlled vocabularies or ontologies. In this paper, we detail applying these approaches in the context of environmental genomics research for ML-generated metadata validation. 
Our results show that the proposed label assignment approaches can generate both generic and highly specific text labels for the unlabeled texts, with up to 44% of the labels matching those suggested by an ML keyword extraction algorithm.
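The vocabulary-based labeling idea and the "percentage of labels matching ML keywords" figure can be illustrated with a toy sketch. This is not the authors' implementation: the vocabulary, the whole-word regex matching, the definition of a "match" as an exact, case-insensitive string match, and the names `VOCABULARY`, `assign_labels`, and `match_rate` are all hypothetical assumptions made for illustration.

```python
import re

# Hypothetical toy controlled vocabulary: preferred term -> surface forms.
# A real pipeline would load these from a domain ontology; these entries
# are illustrative assumptions, not taken from the paper.
VOCABULARY: dict[str, set[str]] = {
    "soil": {"soil", "soils"},
    "metagenome": {"metagenome", "metagenomes"},
    "rhizosphere": {"rhizosphere"},
    "marine sediment": {"marine sediment", "marine sediments"},
}


def assign_labels(text: str, vocabulary: dict[str, set[str]]) -> set[str]:
    """Label a text with each preferred term whose surface form appears in it."""
    normalized = text.lower()
    labels: set[str] = set()
    for term, forms in vocabulary.items():
        # Whole-word match, so "soil" does not fire inside "soiled".
        if any(re.search(r"\b" + re.escape(form) + r"\b", normalized) for form in forms):
            labels.add(term)
    return labels


def match_rate(labels: set[str], keywords: list[str]) -> float:
    """Fraction of assigned labels that also occur among ML-extracted keywords.

    "Match" is assumed here to mean an exact, case-insensitive string match.
    """
    if not labels:
        return 0.0
    extracted = {k.strip().lower() for k in keywords}
    return len(labels & extracted) / len(labels)


if __name__ == "__main__":
    abstract = "We sequenced the rhizosphere metagenome of agricultural soils."
    labels = assign_labels(abstract, VOCABULARY)
    # Hypothetical output of an ML keyword extractor for the same text.
    keywords = ["Metagenome", "crop yield", "sequencing"]
    print(sorted(labels))  # ['metagenome', 'rhizosphere', 'soil']
    print(f"{match_rate(labels, keywords):.0%}")  # 33%
```

Whole-word regular expressions keep the sketch dependency-free; a production labeler would more likely draw synonyms from an ontology and use a proper tokenizer, and could use fuzzier matching than exact string equality.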

Subject(s)

Data Curation, Data Mining, Machine Learning, Data Curation/methods, Data Mining/methods, Metadata

Challenges in Bioinformatics Workflows for Processing Microbiome Omics Data at Scale.

Hu, Bin; Canon, Shane; Eloe-Fadrosh, Emiley A; Babinski, Michal; Corilo, Yuri; Davenport, Karen; Duncan, William D; Fagnan, Kjiersten; Flynn, Mark; Foster, Brian; Hays, David; Huntemann, Marcel; Jackson, Elais K Player; Kelliher, Julia; Li, Po-E; Lo, Chien-Chi; Mans, Douglas; McCue, Lee Ann; Mouncey, Nigel; Mungall, Christopher J; Piehowski, Paul D; Purvine, Samuel O; Smith, Montana; Varghese, Neha Jacob; Winston, Donald; Xu, Yan; Chain, Patrick S G.

Front Bioinform ; 1: 826370, 2021.

Article in English | MEDLINE | ID: mdl-36303775

ABSTRACT

The nascent field of microbiome science is transitioning from a descriptive approach of cataloging taxa and functions present in an environment to applying multi-omics methods to investigate microbiome dynamics and function. A large number of new tools and algorithms have been designed and used for very specific purposes on samples collected by individual investigators or groups. While these developments have been quite instructive, the ability to compare microbiome data generated by many groups of researchers is impeded by the lack of standardized application of bioinformatics methods. Additionally, there are few examples of broad bioinformatics workflows that can process metagenome, metatranscriptome, metaproteome and metabolomic data at scale, and no central hub that allows processing, or provides varied omics data that are findable, accessible, interoperable and reusable (FAIR). Here, we review some of the challenges that exist in analyzing omics data within the microbiome research sphere, and provide context on how the National Microbiome Data Collaborative has adopted a standardized and open access approach to address such challenges.

Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities.

Vangay, Pajau; Burgin, Josephine; Johnston, Anjanette; Beck, Kristen L; Berrios, Daniel C; Blumberg, Kai; Canon, Shane; Chain, Patrick; Chandonia, John-Marc; Christianson, Danielle; Costes, Sylvain V; Damerow, Joan; Duncan, William D; Dundore-Arias, Jose Pablo; Fagnan, Kjiersten; Galazka, Jonathan M; Gibbons, Sean M; Hays, David; Hervey, Judson; Hu, Bin; Hurwitz, Bonnie L; Jaiswal, Pankaj; Joachimiak, Marcin P; Kinkel, Linda; Ladau, Joshua; Martin, Stanton L; McCue, Lee Ann; Miller, Kayd; Mouncey, Nigel; Mungall, Chris; Pafilis, Evangelos; Reddy, T B K; Richardson, Lorna; Roux, Simon; Schriml, Lynn M; Shaffer, Justin P; Sundaramurthi, Jagadish Chandrabose; Thompson, Luke R; Timme, Ruth E; Zheng, Jie; Wood-Charlson, Elisha M; Eloe-Fadrosh, Emiley A.

mSystems ; 6(1), 2021 Feb 23.

Article in English | MEDLINE | ID: mdl-33622857

ABSTRACT

Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. Important contributions have been made in the development of community-driven metadata standards; however, these standards have not been uniformly embraced by the microbiome research community. To understand how these standards are being adopted, or the barriers to adoption, across research domains, institutions, and funding agencies, the National Microbiome Data Collaborative (NMDC) hosted a workshop in October 2019. This report provides a summary of discussions that took place throughout the workshop, as well as outcomes of the working groups initiated at the workshop.

Correction for Vangay et al., "Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities".

Vangay, Pajau; Burgin, Josephine; Johnston, Anjanette; Beck, Kristen L; Berrios, Daniel C; Blumberg, Kai; Canon, Shane; Chain, Patrick; Chandonia, John-Marc; Christianson, Danielle; Costes, Sylvain V; Damerow, Joan; Duncan, William D; Dundore-Arias, Jose Pablo; Fagnan, Kjiersten; Galazka, Jonathan M; Gibbons, Sean M; Hays, David; Hervey, Judson; Hu, Bin; Hurwitz, Bonnie L; Jaiswal, Pankaj; Joachimiak, Marcin P; Kinkel, Linda; Ladau, Joshua; Martin, Stanton L; McCue, Lee Ann; Miller, Kayd; Mouncey, Nigel; Mungall, Chris; Pafilis, Evangelos; Reddy, T B K; Richardson, Lorna; Roux, Simon; Schriml, Lynn M; Shaffer, Justin P; Sundaramurthi, Jagadish Chandrabose; Thompson, Luke R; Timme, Ruth E; Zheng, Jie; Wood-Charlson, Elisha M; Eloe-Fadrosh, Emiley A.

mSystems ; 6(3), 2021 May 04.

Article in English | MEDLINE | ID: mdl-33947809

The National Microbiome Data Collaborative: enabling microbiome science.

Wood-Charlson, Elisha M; Auberry, Deanna; Blanco, Hannah; Borkum, Mark I; Corilo, Yuri E; Davenport, Karen W; Deshpande, Shweta; Devarakonda, Ranjeet; Drake, Meghan; Duncan, William D; Flynn, Mark C; Hays, David; Hu, Bin; Huntemann, Marcel; Li, Po-E; Lipton, Mary; Lo, Chien-Chi; Millard, David; Miller, Kayd; Piehowski, Paul D; Purvine, Samuel; Reddy, T B K; Shakya, Migun; Sundaramurthi, Jagadish Chandrabose; Vangay, Pajau; Wei, Yaxing; Wilson, Bruce E; Canon, Shane; Chain, Patrick S G; Fagnan, Kjiersten; Martin, Stanton; McCue, Lee Ann; Mungall, Christopher J; Mouncey, Nigel J; Maxon, Mary E; Eloe-Fadrosh, Emiley A.

Nat Rev Microbiol ; 18(6): 313-314, 2020 Jun.

Article in English | MEDLINE | ID: mdl-32350400

Subject(s)

Microbiota, Data Science, Humans, Intersectoral Collaboration
