Results 1 - 20 of 56
1.
Mass Spectrom Rev ; 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37534389

ABSTRACT

We are approaching the third decade since the establishment of the very first proteomics repositories back in the mid-'00s. New experimental approaches and technologies continuously enrich the field while producing vast amounts of mass spectrometry data. Together with initiatives to establish standard terminology and file formats, proteomics is rapidly transforming into a mature component of systems biology. Here we describe the ProteomeXchange consortium repositories. We specifically search, collect and evaluate public human tissue datasets (categorized as "complete" by the repository) submitted in 2015-2022, to both map the existing information and assess the data set reusability. Human tissue data are variably represented in the repositories reviewed, ranging between 10% and 25% of the total data submitted, with cancers being the most represented, followed by neuronal and cardiovascular diseases. About half of the retrieved data sets were found to lack annotations or metadata necessary to directly replicate the analysis. This poses a rough challenge to data reusability and highlights the need to increase awareness of the MAGE-TAB file format for metadata in the community. Overall, proteomics repositories have evolved greatly over the past 7 years, as they have grown in size and become equipped with various powerful applications and tools that enable data searching and analytical tasks. However, to make the most of this potential, priority must be given to finding ways to secure detailed metadata for each submission, which is likely the next major milestone for proteomics repositories.

2.
Proteomics ; 23(7-8): e2200014, 2023 04.
Article in English | MEDLINE | ID: mdl-36074795

ABSTRACT

Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.


Subjects
Proteome , Proteomics , Proteomics/methods , Mass Spectrometry/methods , Computational Biology/methods
3.
Stat Sci ; 38(4): 557-575, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-38223302

ABSTRACT

Modern data analysis frequently involves large-scale hypothesis testing, which naturally gives rise to the problem of maintaining control of a suitable type I error rate, such as the false discovery rate (FDR). In many biomedical and technological applications, an additional complexity is that hypotheses are tested in an online manner, one-by-one over time. However, traditional procedures that control the FDR, such as the Benjamini-Hochberg procedure, assume that all p-values are available to be tested at a single time point. To address these challenges, a new field of methodology has developed over the past 15 years showing how to control error rates for online multiple hypothesis testing. In this framework, hypotheses arrive in a stream, and at each time point the analyst decides whether to reject the current hypothesis based both on the evidence against it, and on the previous rejection decisions. In this paper, we present a comprehensive exposition of the literature on online error rate control, with a review of key theory as well as a focus on applied examples. We also provide simulation results comparing different online testing algorithms and an up-to-date overview of the many methodological extensions that have been proposed.
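
The streaming setting described in this abstract can be illustrated with a minimal sketch of one procedure from the online FDR literature, LOND (levels based on number of discoveries). This is an illustration only and is not taken from the paper: the function name `lond`, the choice of the discount sequence, and the example p-values are all assumptions. Each hypothesis is tested as it arrives, at a level that grows with the number of earlier rejections.

```python
import math


def lond(p_values, alpha=0.05):
    """Sketch of LOND: test p-values one at a time, in arrival order.

    The discount sequence beta_t = alpha * 6 / (pi^2 * t^2) is an assumed
    choice; it is non-negative and sums to alpha over t = 1, 2, ...
    """
    decisions = []
    discoveries = 0  # number of rejections made so far
    for t, p in enumerate(p_values, start=1):
        beta_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        # The test level depends only on past rejection decisions.
        level = beta_t * (discoveries + 1)
        reject = p <= level
        decisions.append(reject)
        if reject:
            discoveries += 1
    return decisions


# Hypothetical stream of p-values, processed in arrival order.
print(lond([0.001, 0.5, 0.01, 0.0001]))
```

Unlike the Benjamini-Hochberg procedure, no decision here looks at a p-value that has not yet arrived, which is the constraint the abstract highlights.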

4.
BMC Med Ethics ; 24(1): 49, 2023 07 08.
Article in English | MEDLINE | ID: mdl-37422629

ABSTRACT

BACKGROUND: It has been argued that ethics review committees (e.g., Research Ethics Committees and Institutional Review Boards) have weaknesses in reviewing big data and artificial intelligence research. For instance, they may, due to the novelty of the area, lack the relevant expertise for judging collective risks and benefits of such research, or they may exempt it from review in instances involving de-identified data. MAIN BODY: Focusing on the example of medical research databases we highlight here ethical issues around de-identified data sharing which motivate the need for review where oversight by ethics committees is weak. Though some argue for ethics committee reform to overcome these weaknesses, it is unclear whether or when that will happen. Hence, we argue that ethical review can be done by data access committees, since they have de facto purview of big data and artificial intelligence projects, relevant technical expertise and governance knowledge, and already take on some functions of ethical review. That said, like ethics committees, they may have functional weaknesses in their review capabilities. To strengthen that function, data access committees must think clearly about the kinds of ethical expertise, both professional and lay, that they draw upon to support their work. CONCLUSION: Data access committees can undertake ethical review of medical research databases provided they enhance that review function through professional and lay ethical expertise.


Subjects
Artificial Intelligence , Biomedical Research , Humans , Ethical Review , Ethics Committees , Research Ethics Committees , Information Dissemination
5.
BMC Med Ethics ; 23(1): 95, 2022 09 21.
Article in English | MEDLINE | ID: mdl-36131283

ABSTRACT

BACKGROUND: Biobanks and biomedical research data repositories collect their samples and associated data from volunteer participants. Their aims are to facilitate biomedical research and improve health, and they are framed in terms of contributing to the public good. Biobank resources may be accessible to researchers with commercial motivations, for example, researchers in pharmaceutical companies who may utilise the data to develop new clinical therapeutics and pharmaceutical drugs. Studies exploring citizen perceptions of public/private interactions associated with large health data repositories/biobanks indicate that there are sensitivities around public/private and/or non-profit/profit relationships and international sample and data sharing. Less work has explored how biobanks communicate their public/private partnerships to the public or to their potential research participants. METHODS: We explored how a biobank's aims, benefits and risks, and private/public relationships have been framed in public facing recruitment documents (consent forms and participant information sheets). RESULTS: Biobank documents often communicate their commercial access arrangements but not the detail about what these interactions would entail, and how risks and benefits would be distributed to the public. CONCLUSION: We argue that this leads to a polarised discourse between public and private entities and/or activities, and fails to attend to the blurred lines between them. This results in a lack of attention to more important issues such as how risks and benefits in general are distributed to the public. We call for a nuanced approach that can contribute to the much-needed dialogue in this space.


Subjects
Biological Specimen Banks , Biomedical Research , Humans , Pharmaceutical Preparations , Research Personnel , Risk Assessment
6.
Brief Bioinform ; 20(3): 1032-1056, 2019 05 21.
Article in English | MEDLINE | ID: mdl-29186315

ABSTRACT

The human gut microbiome impacts several aspects of human health and disease, including digestion, drug metabolism and the propensity to develop various inflammatory, autoimmune and metabolic diseases. Many of the molecular processes that play a role in the activity and dynamics of the microbiota go beyond species and genic composition and thus, their understanding requires advanced bioinformatics support. This article aims to provide an up-to-date view of the resources and software tools that are being developed and used in human gut microbiome research, in particular data integration and systems-level analysis efforts. These efforts demonstrate the power of standardized and reproducible computational workflows for integrating and analysing varied omics data and gaining deeper insights into microbe community structure and function as well as host-microbe interactions.


Subjects
Gastrointestinal Microbiome , Computational Biology , High-Throughput Screening Assays , Humans , Reproducibility of Results , Software
7.
Curr Genomics ; 22(4): 244-266, 2021 Dec 16.
Article in English | MEDLINE | ID: mdl-35273457

ABSTRACT

Background: In recent years, the availability of high throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, requires a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.

8.
J Med Internet Res ; 23(10): e31294, 2021 10 29.
Article in English | MEDLINE | ID: mdl-34714253

ABSTRACT

BACKGROUND: Digital health research repositories propose sharing longitudinal streams of health records and personal sensing data between multiple projects and researchers. Motivated by the prospect of personalizing patient care (precision medicine), these initiatives demand broad public acceptance and large numbers of data contributors, both of which are challenging. OBJECTIVE: This study investigates public attitudes toward possibly contributing to digital health research repositories to identify factors for their acceptance and to inform future developments. METHODS: A cross-sectional online survey was conducted from March 2020 to December 2020. Because of the funded project scope and a multicenter collaboration, study recruitment targeted young adults in Denmark and Brazil, allowing an analysis of the differences between 2 very contrasting national contexts. Through closed-ended questions, the survey examined participants' willingness to share different data types, data access preferences, reasons for concern, and motivations to contribute. The survey also collected information about participants' demographics, level of interest in health topics, previous participation in health research, awareness of examples of existing research data repositories, and current attitudes about digital health research repositories. Data analysis consisted of descriptive frequency measures and statistical inferences (bivariate associations and logistic regressions). RESULTS: The sample comprises 1017 respondents living in Brazil (1017/1600, 63.56%) and 583 in Denmark (583/1600, 36.44%). The demographics do not differ substantially between participants of these countries. The majority is aged between 18 and 27 years (933/1600, 58.31%), is highly educated (992/1600, 62.00%), uses smartphones (1562/1600, 97.63%), and is in good health (1407/1600, 87.94%). 
The analysis shows a vast majority were very motivated by helping future patients (1366/1600, 85.38%) and researchers (1253/1600, 78.31%), yet very concerned about unethical projects (1219/1600, 76.19%), profit making without consent (1096/1600, 68.50%), and cyberattacks (1055/1600, 65.94%). Participants' willingness to share data is lower when sharing personal sensing data, such as the content of calls and texts (1206/1600, 75.38%), in contrast to more traditional health research information. Only 13.44% (215/1600) find it desirable to grant data access to private companies, and most would like to stay informed about which projects use their data (1334/1600, 83.38%) and control future data access (1181/1600, 73.81%). Findings indicate that favorable attitudes toward digital health research repositories are related to a personal interest in health topics (odds ratio [OR] 1.49, 95% CI 1.10-2.02; P=.01), previous participation in health research studies (OR 1.70, 95% CI 1.24-2.35; P=.001), and awareness of examples of research repositories (OR 2.78, 95% CI 1.83-4.38; P<.001). CONCLUSIONS: This study reveals essential factors for acceptance and willingness to share personal data with digital health research repositories. Implications include the importance of being more transparent about the goals and beneficiaries of research projects using and re-using data from repositories, providing participants with greater autonomy for choosing who gets access to which parts of their data, and raising public awareness of the benefits of data sharing for research. In addition, future developments should engage with and reduce risks for those unwilling to participate.
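
The reported odds ratios, confidence intervals, and P values can be cross-checked with a short calculation. The sketch below assumes Wald-type intervals that are symmetric on the log-odds scale, which the abstract does not state; the function name `wald_p_from_or` is hypothetical. It recovers the standard error from the interval width and then a two-sided P value from the resulting z statistic.

```python
import math


def wald_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate z statistic and two-sided P value from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values as reported in the abstract.
for label, values in {
    "interest in health topics": (1.49, 1.10, 2.02),
    "previous research participation": (1.70, 1.24, 2.35),
    "awareness of repositories": (2.78, 1.83, 4.38),
}.items():
    z, p = wald_p_from_or(*values)
    print(f"{label}: z = {z:.2f}, p = {p:.4f}")
```

Under this assumption the recomputed P values fall close to the reported P=.01, P=.001, and P<.001, which suggests the three estimates are internally consistent.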


Subjects
Motivation , Public Opinion , Adolescent , Adult , Attitude , Cross-Sectional Studies , Humans , Surveys and Questionnaires , Young Adult
9.
Int J Mol Sci ; 22(15), 2021 Jul 21.
Article in English | MEDLINE | ID: mdl-34360575

ABSTRACT

Many proteins have been found to operate in a complex with various biomolecules such as proteins, nucleic acids, carbohydrates, or lipids. Protein complexes can be transient, stable or dynamic and their association is controlled under variable cellular conditions. Complexome profiling is a recently developed mass spectrometry-based method that combines mild separation techniques, native gel electrophoresis, and density gradient centrifugation with quantitative mass spectrometry to generate inventories of protein assemblies within a cell or subcellular fraction. This review summarizes applications of complexome profiling with respect to assembly ranging from single subunits to large macromolecular complexes, as well as their stability, and remodeling in health and disease.


Subjects
Multiprotein Complexes/chemistry , Multiprotein Complexes/physiology , Proteins/chemistry , Proteins/physiology , Animals , Humans
10.
J Proteome Res ; 19(10): 3906-3909, 2020 10 02.
Article in English | MEDLINE | ID: mdl-32786688

ABSTRACT

Metadata is essential in proteomics data repositories and is crucial to interpret and reanalyze the deposited data sets. For every proteomics data set, we should capture at least three levels of metadata: (i) data set description, (ii) the sample to data files related information, and (iii) standard data file formats (e.g., mzIdentML, mzML, or mzTab). While the data set description and standard data file formats are supported by all ProteomeXchange partners, the information regarding the sample to data files is mostly missing. Recently, members of the European Bioinformatics Community for Mass Spectrometry (EuBIC) have created an open-source project called Sample to Data file format for Proteomics (https://github.com/bigbio/proteomics-metadata-standard/) to enable the standardization of sample metadata of public proteomics data sets. Here, the project is presented to the proteomics community, and we call for contributors, including researchers, journals, and consortiums to provide feedback about the format. We believe this work will improve reproducibility and facilitate the development of new tools dedicated to proteomics data analysis.
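
The second metadata level, the link from samples to data files, is the one the abstract reports as mostly missing. A minimal sketch of how such a link could be checked is shown below. The tab-delimited layout and the column names (`source name`, `characteristics[organism]`, `comment[data file]`) are illustrative assumptions and are not quoted from the format specification; the helper `files_without_sample` and the file names are hypothetical.

```python
import csv
import io

# Illustrative sample-to-data-file table: one row per (sample, data file) pair.
EXAMPLE_TABLE = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun1.raw\n"
    "sample 2\tHomo sapiens\trun2.raw\n"
)


def files_without_sample(table_text, data_files):
    """Return the deposited data files that no sample row refers to."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    annotated = {
        row["comment[data file]"]
        for row in reader
        if row["source name"].strip()
    }
    return sorted(set(data_files) - annotated)


# A data set with three deposited files, one of which lacks sample metadata.
print(files_without_sample(EXAMPLE_TABLE, ["run1.raw", "run2.raw", "run3.raw"]))
```

A check of this kind flags files that cannot be reanalyzed because their sample of origin is unknown.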


Subjects
Metadata , Proteomics , Mass Spectrometry , Reproducibility of Results , Software
11.
Bioethics ; 2019 Jan 25.
Article in English | MEDLINE | ID: mdl-30681178

ABSTRACT

Health-related data uses and data sharing have been in the spotlight for a while. Since the beginning of the big data era, massive data mining and its inherent possibilities have only increased the debate about what the limits are. Data governance is a relevant aspect addressed in ethics guidelines. In this context, the European project BRIDGE Health (BRidging Information and Data Generation for Evidence-based Health policy and research) strove to achieve a comprehensive, integrated and sustainable EU health-information system. One of the aims of the project was to evaluate the requirements to construct a data-linkage infrastructure for the secure management of health information. In a blueprint provided for this infrastructure, the topics of ethics and the closely related governance occupied a whole section, where the recent ethics guidelines by the Council for International Organizations of Medical Sciences (CIOMS) and the World Medical Association (WMA) were referenced. We explore what has changed in the latest versions of the ethics documents adopted by CIOMS and WMA regarding the management of health data and human tissues, the appropriateness of their application in new forms of research and infrastructures such as the one proposed in the BRIDGE Health project, and whether society should be so concerned about this topic in the digital era of social exchange.

12.
Adv Exp Med Biol ; 1137: 1-8, 2019.
Article in English | MEDLINE | ID: mdl-31183816

ABSTRACT

Health and Life studies are well known for the huge amount of data they produce, such as high-throughput sequencing projects (Stephens et al., PLoS Biol 13(7):e1002195, 2015; Hey et al., The fourth paradigm: data-intensive scientific discovery, vol 1. Microsoft research Redmond, Redmond, 2009). However, the value of the data should not be measured by its amount, but instead by the possibility and ability of researchers to retrieve and process it (Leonelli, Data-centric biology: a philosophical study. University of Chicago Press, Chicago, 2016). Transparency, openness, and reproducibility are key aspects to boost the discovery of novel insights into how living systems work (Nosek et al., Science 348(6242):1422-1425, 2015).


Subjects
Computational Biology , Data Analysis , High-Throughput Nucleotide Sequencing , Reproducibility of Results
13.
J Med Syst ; 43(7): 188, 2019 May 18.
Article in English | MEDLINE | ID: mdl-31104150

ABSTRACT

In this paper, we describe a new approach to generating standardized e-Learning content from existing medical collections. The core of this approach is a tool called Clavy, which makes it possible to retrieve information items from medical collections, to transform these items into meaningful learning units, and to export them in the form of standardized e-Learning packages. In addition to describing the approach, we assess its feasibility by applying it to the generation of IMS Content Packages from MedPix, an online database of medical cases in the domain of radiology.


Subjects
Computer-Assisted Instruction , Medical Education , Learning , Search Engine , Factual Databases , Feasibility Studies
14.
Int J Mol Sci ; 19(5), 2018 May 04.
Article in English | MEDLINE | ID: mdl-29734672

ABSTRACT

Glioblastoma (GB) is the most aggressive brain malignancy. Although some potential glioblastoma biomarkers have already been identified, there is a lack of cell membrane-bound biomarkers capable of distinguishing brain tissue from glioblastoma and/or glioblastoma stem cells (GSC), which are responsible for the rapid post-operative tumor reoccurrence. In order to find new GB/GSC marker candidates that would be cell surface proteins (CSP), we have performed meta-analysis of genome-scale mRNA expression data from three data repositories (GEO, ArrayExpress and GLIOMASdb). The search yielded ten appropriate datasets, and three (GSE4290/GDS1962, GSE23806/GDS3885, and GLIOMASdb) were used for selection of new GB/GSC marker candidates, while the other seven (GSE4412/GDS1975, GSE4412/GDS1976, E-GEOD-52009, E-GEOD-68848, E-GEOD-16011, E-GEOD-4536, and E-GEOD-74571) were used for bioinformatic validation. The selection identified four new CSP-encoding candidate genes—CD276, FREM2, SPRY1, and SLC47A1—and the bioinformatic validation confirmed these findings. A review of the literature revealed that CD276 is not a novel candidate, while SLC47A1 had lower validation test scores than the other new candidates and was therefore not considered for experimental validation. This validation revealed that the expression of FREM2—but not SPRY1—is higher in glioblastoma cell lines when compared to non-malignant astrocytes. In addition, FREM2 gene and protein expression levels are higher in GB stem-like cell lines than in conventional glioblastoma cell lines. FREM2 is thus proposed as a novel GB biomarker and a putative biomarker of glioblastoma stem cells. Both FREM2 and SPRY1 are expressed on the surface of the GB cells, while SPRY1 alone was found overexpressed in the cytosol of non-malignant astrocytes.


Subjects
Tumor Biomarkers/genetics , Extracellular Matrix Proteins/genetics , Glioblastoma/genetics , Membrane Proteins/genetics , Phosphoproteins/genetics , Astrocytes/metabolism , Tumor Cell Line , Neoplastic Gene Expression Regulation , Glioblastoma/metabolism , Glioblastoma/pathology , Humans , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/pathology , Proteomics
15.
Alzheimers Dement ; 12(1): 49-54, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26318022

ABSTRACT

INTRODUCTION: The Global Alzheimer's Association Interactive Network (GAAIN) is consolidating the efforts of independent Alzheimer's disease data repositories around the world with the goals of revealing more insights into the causes of Alzheimer's disease, improving treatments, and designing preventative measures that delay the onset of physical symptoms. METHODS: We developed a system for federating these repositories that is reliant on the tenets that (1) its participants require incentives to join, (2) joining the network is not disruptive to existing repository systems, and (3) the data ownership rights of its members are protected. RESULTS: We are currently in various phases of recruitment with over 55 data repositories in North America, Europe, Asia, and Australia and can presently query >250,000 subjects using GAAIN's search interfaces. DISCUSSION: GAAIN's data sharing philosophy, which guided our architectural choices, is conducive to motivating membership in a voluntary data sharing network.


Subjects
Alzheimer Disease , Global Health , Information Dissemination , Alzheimer Disease/etiology , Alzheimer Disease/prevention & control , Alzheimer Disease/therapy , Biomedical Research , Cooperative Behavior , Databases as Topic , Humans
16.
J Exp Bot ; 71(2): 461-464, 2020 01 07.
Article in English | MEDLINE | ID: mdl-31425582
17.
aBIOTECH ; 5(1): 94-106, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38576435

ABSTRACT

Genomic data serve as an invaluable resource for unraveling the intricacies of the higher plant systems, including the constituent elements within and among species. Through various efforts in genomic data archiving, integrative analysis and value-added curation, the National Genomics Data Center (NGDC), which is a part of the China National Center for Bioinformation (CNCB), has successfully established and currently maintains a vast amount of database resources. This dedicated initiative of the NGDC facilitates a data-rich ecosystem that greatly strengthens and supports genomic research efforts. Here, we present a comprehensive overview of central repositories dedicated to archiving, presenting, and sharing plant omics data, introduce knowledgebases focused on variants or gene-based functional insights, highlight species-specific multiple omics database resources, and briefly review the online application tools. We intend that this review can be used as a guide map for plant researchers wishing to select effective data resources from the NGDC for their specific areas of study. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-023-00134-4.

18.
Curr Pharm Biotechnol ; 24(7): 825-831, 2023.
Article in English | MEDLINE | ID: mdl-35619299

ABSTRACT

Diseases such as cancer are often defined by dysregulation of gene expression. Noncoding RNAs (ncRNA) such as microRNAs are involved in gene expression and cell-cell communication. Many other ncRNAs exist, such as circular RNAs and small nucleolar RNAs. A wealth of knowledge is available for many ncRNAs, but the information is distributed across many databases. A small number of highly complementary ncRNA databases are discussed in this work. Their relevance for cancer research is highlighted, and some of the current problems and limitations are revealed. A central or shared database enforcing community reporting and quality standards is needed in the future. Keywords: RNA-seq; noncoding RNAs; databases; data repositories.


Subjects
MicroRNAs , Neoplasms , Long Noncoding RNA , Humans , Long Noncoding RNA/genetics , Untranslated RNA/genetics , Untranslated RNA/metabolism , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasms/genetics
19.
JAMIA Open ; 6(3): ooad054, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37545984

ABSTRACT

Objective: To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. Materials and Methods: The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA compliant secure computing infrastructure supported by in-depth user training. Results: The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point, and includes electronic medical records, clinical images, text, bedside monitoring data and HL7 messages. SDSR tools include tools for electronic phenotyping, cohort building, and a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies. Discussion: Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users. Conclusion: Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure.

20.
Front Artif Intell ; 6: 1286266, 2023.
Article in English | MEDLINE | ID: mdl-38440234

ABSTRACT

Neuroimaging data repositories are data-rich resources comprising brain imaging with clinical and biomarker data. The potential for such repositories to transform healthcare is tremendous, especially in their capacity to support machine learning (ML) and artificial intelligence (AI) tools. Current discussions about the generalizability of such tools in healthcare raise concerns about the risk of bias: ML models underperform in women and in ethnic and racial minorities. The use of ML may exacerbate existing healthcare disparities or cause post-deployment harms. Do neuroimaging data repositories, and their capacity to support ML/AI-driven clinical discoveries, have the potential both to accelerate innovative medicine and to harden the gaps of social inequities in neuroscience-related healthcare? In this paper, we examined the ethical concerns of ML-driven modeling of global community neuroscience needs arising from the use of data amassed within neuroimaging data repositories. We explored this in two parts. Firstly, in a theoretical experiment, we argued for a South East Asian-based repository to redress global imbalances. Within this context, we then considered the ethical framework toward the inclusion vs. exclusion of the migrant worker population, a group subject to healthcare inequities. Secondly, we created a model simulating the impact of global variations in the presentation of anosmia risks in COVID-19 toward altering brain structural findings; we then performed a mini AI ethics experiment. In this experiment, we interrogated an actual pilot dataset (n = 17; 8 non-anosmic (47%) vs. 9 anosmic (53%)) using an ML clustering model. To create the COVID-19 simulation model, we bootstrapped to resample and amplify the dataset. This resulted in three hypothetical datasets: (i) matched (n = 68; 47% anosmic), (ii) predominant non-anosmic (n = 66; 73% disproportionate), and (iii) predominant anosmic (n = 66; 73% disproportionate). 
We found that the differing proportions of the same cohorts represented in each hypothetical dataset altered not only the relative importance of key features distinguishing between them but even the presence or absence of such features. The main objective of our mini experiment was to understand if ML/AI methodologies could be utilized toward modelling disproportionate datasets, in a manner we term "AI ethics." Further work is required to expand the approach proposed here into a reproducible strategy.
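
The abstract does not describe the resampling procedure in detail, so the following is only a sketch of how a small pilot cohort could be bootstrapped into a larger dataset with a chosen group proportion. The function name `resample_to_proportion`, the placeholder subject identifiers, and the fixed seed are assumptions. Each group is sampled with replacement until the target size and share are reached.

```python
import random


def resample_to_proportion(group_a, group_b, total, share_a, seed=0):
    """Bootstrap two groups to `total` items with `share_a` drawn from group_a."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = round(total * share_a)
    n_b = total - n_a
    # random.choices samples with replacement, i.e. a bootstrap draw.
    sample_a = rng.choices(group_a, k=n_a)
    sample_b = rng.choices(group_b, k=n_b)
    return sample_a, sample_b


# Placeholder identifiers matching the pilot group sizes (8 vs. 9 subjects).
non_anosmic = [f"non_anosmic_{i}" for i in range(8)]
anosmic = [f"anosmic_{i}" for i in range(9)]

# Hypothetical "predominant non-anosmic" dataset: n = 66, 73% non-anosmic.
sample_a, sample_b = resample_to_proportion(non_anosmic, anosmic, total=66, share_a=0.73)
print(len(sample_a), len(sample_b))
```

Because every resampled record is a copy of one of the 17 pilot subjects, amplification changes group proportions without adding new information, which is worth keeping in mind when reading feature-importance results from such datasets.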
