ABSTRACT
Open and practical exchange, dissemination, and reuse of specimens and data have become a fundamental requirement for life sciences research. The quality of the data obtained, and thus of the findings and knowledge derived from them, is significantly influenced by the quality of the samples, the experimental methods, and the data analysis. Therefore, comprehensive and precise documentation of the pre-analytical conditions, the analytical procedures, and the data processing is essential for assessing the validity of research results. With the increasing importance of exchanging, reusing, and sharing data and samples, procedures are required that enable cross-organizational documentation, traceability, and non-repudiation. At present, provenance information on samples and data is mostly sparse, incomplete, or incoherent. Because there is no uniform framework, this information is usually available only within the originating organization and is not interoperable. At the same time, the collection and sharing of biological and environmental specimens increasingly require definition and documentation of benefit sharing and compliance with regulatory requirements, beyond purely scientific needs. In this publication, we present an ongoing standardization effort to provide trustworthy, machine-actionable documentation of the lineage of data and specimens. We invite experts from the biotechnology and biomedical fields to contribute further to the standard.
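The abstract does not describe the data model of the proposed standard. As a minimal sketch of what machine-actionable, tamper-evident lineage can look like, assuming a simple hash chain (our illustration, not the standard's design), each provenance record stores the SHA-256 digest of its predecessor, so any later edit to an earlier record is detectable. The record fields and the `append`/`verify` helpers are hypothetical names. Tamper evidence alone does not give non-repudiation; that would additionally require each actor to digitally sign its records, which is not shown.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """One step in a hypothetical lineage chain (collection, processing, analysis...)."""
    actor: str       # organization responsible for this step
    activity: str    # what was done to the specimen or data
    entity: str      # identifier of the resulting specimen or dataset
    prev_hash: str   # digest of the preceding record ("" for the first record)
    digest: str = field(init=False)

    def __post_init__(self) -> None:
        # Canonical JSON (sorted keys) so identical content always hashes identically.
        payload = json.dumps(
            {"actor": self.actor, "activity": self.activity,
             "entity": self.entity, "prev_hash": self.prev_hash},
            sort_keys=True,
        ).encode("utf-8")
        self.digest = hashlib.sha256(payload).hexdigest()


def append(chain: list[ProvenanceRecord], actor: str, activity: str, entity: str) -> None:
    """Add a step that points back to the digest of the current last step."""
    prev = chain[-1].digest if chain else ""
    chain.append(ProvenanceRecord(actor, activity, entity, prev))


def verify(chain: list[ProvenanceRecord]) -> bool:
    """Recompute every digest and check the back-links; any later edit breaks the chain."""
    prev = ""
    for rec in chain:
        expected = ProvenanceRecord(rec.actor, rec.activity, rec.entity, prev).digest
        if rec.prev_hash != prev or rec.digest != expected:
            return False
        prev = rec.digest
    return True


# Hypothetical example chain spanning two organizations
chain: list[ProvenanceRecord] = []
append(chain, "Biobank A", "collected tissue sample", "sample-001")
append(chain, "Lab B", "extracted DNA", "dna-001")
append(chain, "Lab B", "sequenced DNA", "reads-001")
print(verify(chain))                  # True
chain[1].activity = "extracted RNA"   # simulate an undocumented change
print(verify(chain))                  # False
```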
ABSTRACT
BACKGROUND: Acute complications arising from hypoglycemia and hyperglycemia peak as young adults with type 1 diabetes (T1D) take control of their own care. Continuous glucose monitoring (CGM) devices provide real-time glucose readings, enabling users to manage their glucose control proactively. Machine learning algorithms can use CGM data to make ahead-of-time risk predictions and provide insight into an individual's longer-term control. METHODS: We introduce explainable machine learning to predict hypoglycemia (<70 mg/dL) and hyperglycemia (>270 mg/dL) up to 60 minutes ahead of time. We train our models using CGM data from 153 people living with T1D in the CITY (CGM Intervention in Teens and Young Adults With Type 1 Diabetes) survey, totaling more than 28 000 days of usage, which we summarize into short-term, medium-term, and long-term glucose control features along with demographic information. We use machine learning explanations (SHAP [SHapley Additive exPlanations]) to identify which features were most important in predicting risk for each user. RESULTS: Machine learning models (XGBoost) show excellent performance at predicting hypoglycemia (area under the receiver operating characteristic curve [AUROC]: 0.998, average precision: 0.953) and hyperglycemia (AUROC: 0.989, average precision: 0.931) compared with a baseline heuristic and a logistic regression model. CONCLUSIONS: Maximizing model performance for glucose risk prediction and management is crucial to reduce the burden of alarm fatigue on CGM users. Machine learning enables more precise and timely predictions than baseline models. SHAP helps identify which aspects of a CGM user's glucose control led to predictions of risk, information that can be used to reduce their long-term risk of complications.
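The abstract states the prediction targets precisely (hypoglycemia <70 mg/dL, hyperglycemia >270 mg/dL, up to 60 minutes ahead) but not how training labels and summary features were built. The Python sketch below shows one plausible construction, assuming readings every 5 minutes and hypothetical look-back windows of 1 hour, 1 day, and 14 days for the short-, medium-, and long-term features. `risk_labels`, `window_features`, and the window lengths are illustrative choices, not the authors' code.

```python
from statistics import mean

HYPO_MG_DL = 70      # hypoglycemia: readings below 70 mg/dL (from the abstract)
HYPER_MG_DL = 270    # hyperglycemia: readings above 270 mg/dL (from the abstract)
HORIZON_MIN = 60     # prediction horizon (from the abstract)
INTERVAL_MIN = 5     # ASSUMED CGM sampling interval in minutes
# ASSUMED look-back windows (minutes) for short-, medium-, and long-term features.
WINDOWS = {"short": 60, "medium": 24 * 60, "long": 14 * 24 * 60}


def risk_labels(glucose: list[float], t: int) -> tuple[bool, bool]:
    """(hypo, hyper) label for time index t: does any reading in the next
    HORIZON_MIN minutes cross a threshold? Near the end of the series the
    look-ahead window is shorter; a real pipeline would likely drop those points."""
    steps = HORIZON_MIN // INTERVAL_MIN
    future = glucose[t + 1 : t + 1 + steps]
    return (any(g < HYPO_MG_DL for g in future),
            any(g > HYPER_MG_DL for g in future))


def window_features(glucose: list[float], t: int, windows: dict[str, int]) -> dict[str, float]:
    """Summarize readings up to and including index t over each look-back window."""
    feats: dict[str, float] = {}
    for name, minutes in windows.items():
        n = max(1, minutes // INTERVAL_MIN)
        past = glucose[max(0, t - n + 1) : t + 1]
        feats[f"{name}_mean"] = mean(past)
        feats[f"{name}_frac_below"] = sum(g < HYPO_MG_DL for g in past) / len(past)
        feats[f"{name}_frac_above"] = sum(g > HYPER_MG_DL for g in past) / len(past)
    return feats


readings = [120, 110, 100, 90, 85, 80, 75, 72, 68, 65, 70, 80, 95, 110]
print(risk_labels(readings, 3))   # (True, False): 68 mg/dL occurs within the next hour
print(window_features(readings, 3, WINDOWS))
```

Label and feature pairs of this kind would then serve as training data for a classifier such as the XGBoost models the abstract reports.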
ABSTRACT
BACKGROUND: A more comprehensive understanding and measurement of adult social care need could contribute to efforts to develop more effective, holistic, personalised care, particularly for people with multiple long-term conditions (MLTC). Progress in this area is hampered by a lack of clarity in the literature about how social care need is assessed and coded within variables included in primary care databases. AIM: To explore how social care need is assessed and coded within variables included in primary care databases. DESIGN & SETTING: An exploratory rapid scoping review of peer-reviewed articles and grey literature. METHOD: Articles were screened, data were extracted onto a charting sheet, and findings were summarised descriptively. Articles were included if they were published in English and related to primary and social care using data from national primary care databases. RESULTS: The search yielded 4010 articles, of which 27 were included. Six articles used the term 'social care need', although related terminology was identified, including 'need factors', 'social support', and 'social care support'. Articles mainly focused on specific components of social care need, including levels of social care usage or service utilisation, and costs incurred by social care, primary care, and other providers in addressing needs. Only a limited range of database variables measuring social care need was found. CONCLUSION: Further research is needed on how social care need has been defined in a UK context and captured in large primary care databases. There is scope to broaden the definition of social care need so that it captures both social service needs and wider social needs.
ABSTRACT
BACKGROUND: Multiple long-term health conditions (multimorbidity) (MLTC-M) are increasingly prevalent and associated with high rates of morbidity, mortality, and health care expenditure. Strategies to address MLTC-M have primarily focused on the biological aspects of disease, but MLTC-M also result from and are associated with additional psychosocial, economic, and environmental barriers. A shift toward more personalized, holistic, and integrated care could be effective. This shift could be made more efficient by identifying population groups (clusters) based on their health and social needs. In turn, these clusters will contribute to evidence-based solutions supporting the delivery of interventions tailored to the needs of each cluster. Evidence is needed on how to generate clusters based on health and social needs and how to quantify the impact of clusters on long-term health and costs. OBJECTIVE: We intend to develop and validate population clusters that consider determinants of health and social care needs for people with MLTC-M, using data-driven machine learning (ML) methods compared with expert-driven approaches within national primary care databases, followed by evaluation of cluster trajectories and their association with health outcomes and costs. METHODS: This mixed methods program of work comprises the following parallel work streams: (1) qualitative semistructured interview studies exploring patient, caregiver, and professional views on clinical and socioeconomic factors influencing experiences of living with or seeking care for MLTC-M; (2) a modified Delphi study with relevant stakeholders to generate variables on health and social (wider) determinants and to examine the feasibility of including these variables within existing primary care databases; and (3) a cohort study with expert-driven segmentation alongside data-driven algorithms. Outputs will be compared, clusters characterized, and trajectories over time examined to quantify associations with mortality, additional long-term conditions, worsening frailty, disease severity, and 10-year health and social care costs. RESULTS: The study will commence in October 2021 and is expected to be completed by October 2023. CONCLUSIONS: By studying MLTC-M clusters, we will assess how more personalized care can be developed, how accurate costs can be provided, and how to better understand the personal and medical profiles and environment of individuals within each cluster. Integrated care that considers "whole persons" and their environment is essential in addressing the complex, diverse, and individual needs of people living with MLTC-M. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/34405.
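The abstract does not name the data-driven algorithms. As one illustrative possibility, and not the study's method, the sketch below clusters people with k-means (Lloyd's algorithm) on hypothetical per-person features, using deterministic farthest-first seeding so the example is reproducible. In practice, features would need scaling to comparable ranges and the number of clusters would need careful selection; the expert-driven segmentation that the protocol compares against is not shown.

```python
import math

Point = tuple[float, ...]


def farthest_first(points: list[Point], k: int) -> list[Point]:
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: list[Point], k: int, iters: int = 100) -> tuple[list[int], list[Point]]:
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid re-averaging."""
    centroids = farthest_first(points, k)
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each person joins the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical per-person features: (number of long-term conditions, social-need score)
people = [(2, 1), (3, 2), (2, 2), (7, 9), (8, 10), (6, 9)]
labels, centroids = kmeans(people, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```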
ABSTRACT
Molecular interaction data exist in a number of repositories, each with its own data format, molecule identifiers, and information coverage. Michigan Molecular Interactions (MiMI) assists scientists in searching through this profusion of molecular interaction data. The original release of MiMI gathered data from well-known protein interaction databases and deep-merged this information while keeping track of provenance. Based on user feedback, MiMI has been completely redesigned. This article describes the resulting MiMI Release 2 (MiMIr2). New functionality includes extension from proteins to genes and pathways; identification of highlighted sentences in source publications; seamless two-way linkage with Cytoscape; query facilities based on MeSH/GO terms and other concepts; approximate graph matching to find relevant pathways; support for bulk querying; and an interface design driven by user focus groups. MiMI is part of the NIH's National Center for Integrative Biomedical Informatics (NCIBI) and is publicly available at: http://mimi.ncibi.org.
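MiMIr2's approximate graph matching is not described in the abstract. As a deliberately crude stand-in, not MiMI's algorithm, the sketch below ranks candidate pathways by the Jaccard similarity between their undirected interaction edges and a query network; all pathway names and molecule labels are hypothetical. Genuine approximate graph matching would also tolerate node substitutions and structural differences, which simple edge overlap ignores.

```python
Edge = frozenset[str]


def edges(pairs: list[tuple[str, str]]) -> set[Edge]:
    """Treat interactions as undirected: (a, b) and (b, a) become the same edge."""
    return {frozenset(p) for p in pairs}


def jaccard(a: set[Edge], b: set[Edge]) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pathways(query: list[tuple[str, str]],
                  pathways: dict[str, list[tuple[str, str]]]) -> list[tuple[str, float]]:
    """Score every pathway against the query; return those sharing at least one
    edge, best match first."""
    q = edges(query)
    scored = [(name, jaccard(q, edges(pairs))) for name, pairs in pathways.items()]
    return sorted((s for s in scored if s[1] > 0.0), key=lambda s: s[1], reverse=True)


# Hypothetical query network and pathway collection
query = [("A", "B"), ("B", "C")]
pathways = {
    "pathway-1": [("B", "A"), ("C", "B"), ("C", "D")],
    "pathway-2": [("X", "Y")],
    "pathway-3": [("A", "B")],
}
print(rank_pathways(query, pathways))
# pathway-1 (2/3) ranks above pathway-3 (1/2); pathway-2 shares no edge and is dropped
```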
Subjects
Protein Databases, Protein Interaction Mapping, Proteins/metabolism, Computer Graphics, Proteins/genetics, User-Computer Interface
ABSTRACT
Protein interaction data exist in a number of repositories. Each repository has its own data format, molecule identifiers, and supplementary information. Michigan Molecular Interactions (MiMI) assists scientists in searching through this overwhelming amount of protein interaction data. MiMI gathers data from well-known protein interaction databases and deep-merges the information. Using an identity function, MiMI merges molecules that may have different identifiers but represent the same real-world object. Thus, MiMI allows users to retrieve information from many different databases at once, highlighting complementary and contradictory information. To help scientists judge the usefulness of a piece of data, MiMI tracks the provenance of all data. Finally, a simple yet powerful user interface aids users in their queries and frees them from the onerous task of knowing the data format or learning a query language. MiMI allows scientists to query all data, whether corroborative or contradictory, and to specify which sources to use. MiMI is part of the National Center for Integrative Biomedical Informatics (NCIBI) and is publicly available at: http://mimi.ncibi.org.
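The deep merge with provenance tracking described in this abstract can be sketched in a few lines of Python, assuming a simple synonym table as a stand-in for MiMI's identity function (the actual function is not specified here). Each attribute value keeps the set of sources that reported it, so complementary values combine, contradictory values surface as attributes with more than one distinct value, and an optional `sources` filter restricts which databases contribute. All identifiers, source names, and attribute values are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional

Record = tuple[str, str, dict[str, str]]  # (source database, source identifier, attributes)


def identity(identifier: str, synonyms: dict[str, str]) -> str:
    """Hypothetical identity function: map a source identifier to a canonical
    molecule key. Unknown identifiers are kept as their own key."""
    return synonyms.get(identifier, identifier)


def deep_merge(records: Iterable[Record], synonyms: dict[str, str],
               sources: Optional[set[str]] = None):
    """Merge records that denote the same molecule, keeping provenance per value.
    Result shape: {canonical_id: {attribute: {value: set of sources}}}."""
    merged = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for source, identifier, attrs in records:
        if sources is not None and source not in sources:
            continue  # the user chose not to use this source
        key = identity(identifier, synonyms)
        for attr, value in attrs.items():
            merged[key][attr][value].add(source)
    return merged


def contradictions(merged) -> dict:
    """Attributes for which the sources report more than one distinct value."""
    return {(key, attr): dict(values)
            for key, attrs in merged.items()
            for attr, values in attrs.items() if len(values) > 1}


# Hypothetical identifiers: two databases name the same molecule differently.
synonyms = {"db1:0042": "MOL42", "db2:abc": "MOL42"}
records = [
    ("DB1", "db1:0042", {"organism": "human", "function": "kinase"}),
    ("DB2", "db2:abc", {"organism": "human", "function": "phosphatase"}),
    ("DB3", "db3:zz", {"organism": "mouse"}),
]
merged = deep_merge(records, synonyms)
print(sorted(merged["MOL42"]["organism"]["human"]))  # ['DB1', 'DB2']
print(contradictions(merged))
# {('MOL42', 'function'): {'kinase': {'DB1'}, 'phosphatase': {'DB2'}}}
```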
Subjects
Protein Databases, Protein Interaction Mapping, Internet, User-Computer Interface
ABSTRACT
Protein data, from sequence and structure to interactions, are being generated through many diverse methodologies and are stored and reported in numerous forms and in multiple places. The sheer volume of the data limits researchers' ability to use all the information generated. Effective integration of protein data can be accomplished through better data modeling, which we demonstrate through the MIPD project.