ABSTRACT
Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.
Subjects
Artificial Intelligence; Research Design; Artificial Intelligence/standards; Artificial Intelligence/trends; Datasets as Topic; Deep Learning; Research Design/standards; Research Design/trends; Unsupervised Machine Learning
ABSTRACT
BACKGROUND: Adjustment for race is discouraged in lung-function testing, but the implications of adopting race-neutral equations have not been comprehensively quantified. METHODS: We obtained longitudinal data from 369,077 participants in the National Health and Nutrition Examination Survey, U.K. Biobank, the Multi-Ethnic Study of Atherosclerosis, and the Organ Procurement and Transplantation Network. Using these data, we compared the race-based 2012 Global Lung Function Initiative (GLI-2012) equations with race-neutral equations introduced in 2022 (GLI-Global). Evaluated outcomes included national projections of clinical, occupational, and financial reclassifications; individual lung-allocation scores for transplantation priority; and concordance statistics (C statistics) for clinical prediction tasks. RESULTS: Among the 249 million persons in the United States between 6 and 79 years of age who are able to produce high-quality spirometric results, the use of GLI-Global equations may reclassify ventilatory impairment for 12.5 million persons, medical impairment ratings for 8.16 million, occupational eligibility for 2.28 million, grading of chronic obstructive pulmonary disease for 2.05 million, and military disability compensation for 413,000. These potential changes differed according to race; for example, classifications of nonobstructive ventilatory impairment may change dramatically, increasing 141% (95% confidence interval [CI], 113 to 169) among Black persons and decreasing 69% (95% CI, 63 to 74) among White persons. Annual disability payments may increase by more than $1 billion among Black veterans and decrease by $0.5 billion among White veterans. GLI-2012 and GLI-Global equations had similar discriminative accuracy with regard to respiratory symptoms, health care utilization, new-onset disease, death from any cause, death related to respiratory disease, and death among persons on a transplant waiting list, with differences in C statistics ranging from -0.008 to 0.011. CONCLUSIONS: The use of race-based and race-neutral equations generated similarly accurate predictions of respiratory outcomes but assigned different disease classifications, occupational eligibility, and disability compensation for millions of persons, with effects diverging according to race. (Funded by the National Heart, Lung, and Blood Institute and the National Institute of Environmental Health Sciences.).
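To make the mechanics of such reclassification concrete, the sketch below shows how a GLI-style reference works: measured values are converted to z-scores with the LMS method, and impairment is flagged below the lower limit of normal (z = -1.645, the 5th percentile). The coefficient triples here are hypothetical placeholders, not the published GLI-2012 or GLI-Global values.

```python
# Illustrative sketch of how swapping reference equations reclassifies
# spirometry. GLI-style references use the LMS method, where a z-score is
# computed from lambda (skew), mu (predicted median), and sigma (variation).
# All coefficient values below are hypothetical, not published GLI numbers.

LLN_Z = -1.645  # lower limit of normal = 5th percentile

def lms_z(measured, L, M, S):
    """LMS z-score: z = ((y / M)**L - 1) / (L * S)."""
    return ((measured / M) ** L - 1.0) / (L * S)

def nonobstructive_impairment(fev1, fvc, ref):
    """Flag a restrictive pattern: FVC below the LLN with a preserved
    FEV1/FVC ratio."""
    z_fvc = lms_z(fvc, *ref["fvc"])
    z_ratio = lms_z(fev1 / fvc, *ref["ratio"])
    return z_fvc < LLN_Z and z_ratio >= LLN_Z

# Hypothetical (L, M, S) triples for one patient's age/sex/height under a
# race-specific vs. a race-neutral reference equation.
race_specific = {"fvc": (1.0, 4.2, 0.11), "ratio": (1.0, 0.79, 0.06)}
race_neutral  = {"fvc": (1.0, 4.6, 0.13), "ratio": (1.0, 0.79, 0.06)}

fev1, fvc = 2.9, 3.5  # observed litres
print(nonobstructive_impairment(fev1, fvc, race_specific))  # False
print(nonobstructive_impairment(fev1, fvc, race_neutral))   # True
```

The same measurement flips from normal to impaired purely because the reference median and scatter change, which is the mechanism behind the population-scale reclassifications estimated above.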
Subjects
Respiratory Function Tests; Respiratory Insufficiency; Adolescent; Adult; Aged; Child; Female; Humans; Male; Middle Aged; Young Adult; Lung Diseases/diagnosis; Lung Diseases/economics; Lung Diseases/ethnology; Lung Diseases/therapy; Lung Transplantation/statistics & numerical data; Nutrition Surveys/statistics & numerical data; Pulmonary Disease, Chronic Obstructive/diagnosis; Pulmonary Disease, Chronic Obstructive/economics; Pulmonary Disease, Chronic Obstructive/ethnology; Pulmonary Disease, Chronic Obstructive/therapy; Racial Groups; Respiratory Function Tests/classification; Respiratory Function Tests/economics; Respiratory Function Tests/standards; Spirometry; United States/epidemiology; Respiratory Insufficiency/diagnosis; Respiratory Insufficiency/economics; Respiratory Insufficiency/ethnology; Respiratory Insufficiency/therapy; Black or African American/statistics & numerical data; White People/statistics & numerical data; Disability Evaluation; Veterans Disability Claims/classification; Veterans Disability Claims/economics; Veterans Disability Claims/statistics & numerical data; Persons with Disabilities/classification; Persons with Disabilities/statistics & numerical data; Occupational Diseases/diagnosis; Occupational Diseases/economics; Occupational Diseases/ethnology; Financing, Government/economics; Financing, Government/statistics & numerical data
ABSTRACT
Hypothesis generation in observational, biomedical data science often starts with computing an association or identifying the statistical relationship between a dependent and an independent variable. However, the outcome of this process depends fundamentally on modeling strategy, with differing strategies generating what can be called "vibration of effects" (VoE). VoE is defined by variation in associations that often lead to contradictory results. Here, we present a computational tool capable of modeling VoE in biomedical data by fitting millions of different models and comparing their output. We execute a VoE analysis on a series of widely reported associations (e.g., carrot intake associated with eyesight) with an extended additional focus on lifestyle exposures (e.g., physical activity) and components of the Framingham Risk Score for cardiovascular health (e.g., blood pressure). We leveraged our tool for potential confounder identification, investigating what adjusting variables are responsible for conflicting models. We propose modeling VoE as a critical step in navigating discovery in observational data, discerning robust associations, and cataloging adjusting variables that impact model output.
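A minimal sketch of the VoE idea (not the authors' released tool): refit one exposure-outcome association under every subset of candidate adjusters and inspect the spread of effect estimates and P values. Variable names and the simulated data are illustrative.

```python
# Minimal vibration-of-effects (VoE) sketch: refit the same association
# under every subset of adjusting covariates and summarize how much the
# estimate "vibrates" across model specifications.
from itertools import combinations
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def vibration_of_effects(df, outcome, exposure, adjusters):
    rows = []
    for k in range(len(adjusters) + 1):
        for subset in combinations(adjusters, k):
            formula = f"{outcome} ~ {exposure}"
            if subset:
                formula += " + " + " + ".join(subset)
            fit = smf.logit(formula, data=df).fit(disp=0)
            rows.append({"adjusters": subset,
                         "OR": np.exp(fit.params[exposure]),
                         "p": fit.pvalues[exposure]})
    return pd.DataFrame(rows)

# Simulated example in which a confounder (age) drives the association.
rng = np.random.default_rng(0)
n = 5000
age = rng.normal(50, 10, n)
exposure = 0.05 * age + rng.normal(size=n)
y = (rng.random(n) < 1 / (1 + np.exp(-(0.08 * age - 5)))).astype(int)
df = pd.DataFrame({"y": y, "x": exposure, "age": age,
                   "bmi": rng.normal(27, 4, n)})

voe = vibration_of_effects(df, "y", "x", ["age", "bmi"])
print(voe)  # the odds ratio shrinks toward 1 once age enters the model
```

Cataloging which adjusters flip the estimate, as in the printed table, is exactly the confounder-identification step the abstract describes.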
Subjects
Data Science/methods; Models, Statistical; Observational Studies as Topic/statistics & numerical data; Epidemiologic Methods; Humans
ABSTRACT
The evolution of prostate cancer from an androgen-dependent state to one that is androgen-independent marks its lethal progression. The androgen receptor (AR) is essential in both, though its function in androgen-independent cancers is poorly understood. We have defined the direct AR-dependent target genes in both androgen-dependent and -independent cancer cells by generating AR-dependent gene expression profiles and AR cistromes. In contrast to what is found in androgen-dependent cells, AR selectively upregulates M-phase cell-cycle genes in androgen-independent cells, including UBE2C, a gene that inactivates the M-phase checkpoint. We find that epigenetic marks at the UBE2C enhancer, notably histone H3K4 methylation and FoxA1 transcription factor binding, are present in androgen-independent cells and direct AR-enhancer binding and UBE2C activation. Thus, the role of AR in androgen-independent cancer cells is not to direct the androgen-dependent gene expression program without androgen, but rather to execute a distinct program resulting in androgen-independent growth.
Subjects
Gene Expression Regulation, Neoplastic; Prostatic Neoplasms/metabolism; Receptors, Androgen/metabolism; Androgens/metabolism; Cell Division; Cell Line, Tumor; Hepatocyte Nuclear Factor 3-alpha/metabolism; Histones/metabolism; Humans; Male; Prostatic Neoplasms/genetics; Transcriptional Activation; Ubiquitin-Conjugating Enzymes/metabolism
ABSTRACT
BACKGROUND: The National Kidney Foundation and American Society of Nephrology Task Force on Reassessing the Inclusion of Race in Diagnosing Kidney Disease recently recommended a new race-free creatinine-based equation for eGFR. The effect on recommended clinical care across race and ethnicity groups is unknown. METHODS: We analyzed nationally representative cross-sectional questionnaires and medical examinations from 44,360 participants collected between 2001 and 2018 by the National Health and Nutrition Examination Survey. We quantified the number and proportion of Black, White, Hispanic, and Asian/Other adults with guideline-recommended changes in care. RESULTS: The new equation, if applied nationally, could assign new CKD diagnoses to 434,000 (95% confidence interval [CI], 350,000 to 517,000) Black adults, reclassify 584,000 (95% CI, 508,000 to 667,000) to more advanced stages of CKD, restrict kidney donation eligibility for 246,000 (95% CI, 189,000 to 303,000), expand nephrologist referrals for 41,800 (95% CI, 19,800 to 63,800), and reduce medication dosing for 222,000 (95% CI, 169,000 to 275,000). Among non-Black adults, these changes may undo CKD diagnoses for 5.51 million (95% CI, 4.86 million to 6.16 million), reclassify 4.59 million (95% CI, 4.28 million to 4.92 million) to less advanced stages of CKD, expand kidney donation eligibility for 3.96 million (95% CI, 3.46 million to 4.46 million), reverse nephrologist referral for 75,800 (95% CI, 35,400 to 116,000), and reverse medication dose reductions for 1.47 million (95% CI, 1.22 million to 1.73 million). The racial and ethnic mix of the populations used to develop eGFR equations has a substantial effect on potential care changes. CONCLUSION: The newly recommended 2021 CKD-EPI creatinine-based eGFR equation may result in substantial changes to recommended care for US patients of all racial and ethnic groups.
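For reference, the sketch below implements the 2021 race-free CKD-EPI creatinine equation and KDIGO GFR staging as published by Inker et al.; the coefficients should be verified against the primary source before any clinical use.

```python
# Sketch of the 2021 race-free CKD-EPI creatinine equation (eGFRcr) and
# KDIGO GFR staging; coefficients as published by Inker et al. (2021).
# Verify against the primary source before any clinical use.
def egfr_ckd_epi_2021(scr_mg_dl, age_years, female):
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    scr_k = scr_mg_dl / kappa
    egfr = (142
            * min(scr_k, 1.0) ** alpha
            * max(scr_k, 1.0) ** -1.200
            * 0.9938 ** age_years)
    return egfr * 1.012 if female else egfr  # mL/min/1.73 m^2

def kdigo_stage(egfr):
    for stage, floor in [("G1", 90), ("G2", 60), ("G3a", 45),
                         ("G3b", 30), ("G4", 15)]:
        if egfr >= floor:
            return stage
    return "G5"

# A borderline case: small changes in estimated GFR move a patient across
# the G2/G3a boundary.
for scr in (1.0, 1.1, 1.2):
    e = egfr_ckd_epi_2021(scr, 65, female=True)
    print(f"Scr {scr}: eGFR {e:.0f} -> {kdigo_stage(e)}")
```

Small shifts in estimated GFR near the 60 mL/min/1.73 m² boundary are what drive the diagnosis and staging changes tallied above.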
Subjects
Renal Insufficiency, Chronic; Adult; Humans; Creatinine; Glomerular Filtration Rate; Nutrition Surveys; Cross-Sectional Studies; Renal Insufficiency, Chronic/diagnosis
ABSTRACT
Importance: Since 2013, the American College of Cardiology (ACC) and American Heart Association (AHA) have recommended the pooled cohort equations (PCEs) for estimating the 10-year risk of atherosclerotic cardiovascular disease (ASCVD). An AHA scientific advisory group recently developed the Predicting Risk of cardiovascular disease EVENTs (PREVENT) equations, which incorporated kidney measures, removed race as an input, and improved calibration in contemporary populations. PREVENT is known to produce ASCVD risk predictions that are lower than those produced by the PCEs, but the potential clinical implications have not been quantified. Objective: To estimate the number of US adults who would experience changes in risk categorization, treatment eligibility, or clinical outcomes when applying PREVENT equations to existing ACC and AHA guidelines. Design, Setting, and Participants: Nationally representative cross-sectional sample of 7765 US adults aged 30 to 79 years who participated in the National Health and Nutrition Examination Surveys of 2011 to March 2020, which had response rates ranging from 47% to 70%. Main Outcomes and Measures: Differences in predicted 10-year ASCVD risk, ACC and AHA risk categorization, eligibility for statin or antihypertensive therapy, and projected occurrences of myocardial infarction or stroke. Results: In a nationally representative sample of 7765 US adults aged 30 to 79 years (median age, 53 years; 51.3% women), it was estimated that using PREVENT equations would reclassify approximately half of US adults to lower ACC and AHA risk categories (53.0% [95% CI, 51.2%-54.8%]) and very few US adults to higher risk categories (0.41% [95% CI, 0.25%-0.62%]). The number of US adults receiving or recommended for preventive treatment would decrease by an estimated 14.3 million (95% CI, 12.6 million-15.9 million) for statin therapy and 2.62 million (95% CI, 2.02 million-3.21 million) for antihypertensive therapy. The study estimated that, over 10 years, these decreases in treatment eligibility could result in 107,000 additional occurrences of myocardial infarction or stroke. Eligibility changes would affect twice as many men as women and a greater proportion of Black adults than White adults. Conclusions and Relevance: By assigning lower ASCVD risk predictions, application of the PREVENT equations to existing treatment thresholds could reduce eligibility for statin and antihypertensive therapy among 15.8 million US adults.
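A sketch of the categorization step, assuming per-person 10-year risks from the PCEs and PREVENT have already been computed (their coefficient sets are not reproduced here); the category cut points follow the 2019 ACC/AHA primary-prevention guideline, and the risk pairs are hypothetical.

```python
# Sketch of the reclassification tabulation: map each person's predicted
# 10-year ASCVD risk to an ACC/AHA category under both equation sets and
# count category transitions.
from collections import Counter

def acc_aha_category(risk):
    if risk < 0.05:
        return "low"
    if risk < 0.075:
        return "borderline"
    if risk < 0.20:
        return "intermediate"
    return "high"

def reclassification(pce_risks, prevent_risks):
    return Counter((acc_aha_category(p), acc_aha_category(q))
                   for p, q in zip(pce_risks, prevent_risks))

# Hypothetical risk pairs: PREVENT predictions run lower than the PCEs.
pce = [0.04, 0.09, 0.12, 0.22, 0.08]
prevent = [0.03, 0.06, 0.07, 0.16, 0.05]
for (old, new), n in reclassification(pce, prevent).items():
    print(f"{old} -> {new}: {n}")
```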
Subjects
Antihypertensive Agents; Eligibility Determination; Hydroxymethylglutaryl-CoA Reductase Inhibitors; Myocardial Infarction; Primary Prevention; Stroke; Adult; Aged; Female; Humans; Male; Middle Aged; American Heart Association; Antihypertensive Agents/administration & dosage; Antihypertensive Agents/economics; Cross-Sectional Studies; Eligibility Determination/economics; Eligibility Determination/standards; Eligibility Determination/trends; Hydroxymethylglutaryl-CoA Reductase Inhibitors/administration & dosage; Hydroxymethylglutaryl-CoA Reductase Inhibitors/economics; Myocardial Infarction/prevention & control; Myocardial Infarction/epidemiology; Nutrition Surveys/statistics & numerical data; Practice Guidelines as Topic; Risk Assessment/standards; Stroke/prevention & control; Stroke/epidemiology; United States/epidemiology; Primary Prevention/economics; Primary Prevention/methods; Primary Prevention/standards
ABSTRACT
IMPORTANCE: The development of artificial intelligence (AI) and other machine diagnostic systems, also known as software as a medical device, and their recent introduction into clinical practice require a deeply rooted foundation in bioethics for consideration by regulatory agencies and other stakeholders around the globe. OBJECTIVES: To initiate a dialogue on the issues to consider when developing a bioethically sound foundation for AI in medicine based on images of eye structures, for discussion with all stakeholders. EVIDENCE REVIEW: The scope of the issues and summaries of the discussions under consideration by the Foundational Principles of Ophthalmic Imaging and Algorithmic Interpretation Working Group, as first presented during the Collaborative Community on Ophthalmic Imaging inaugural meeting on September 7, 2020, and afterward within the working group. FINDINGS: Artificial intelligence has the potential to fundamentally improve health care access and patient outcomes while decreasing disparities, lowering costs, and enhancing the care team. Nevertheless, substantial concerns exist. To attain this potential, bioethicists, AI algorithm experts, the Food and Drug Administration and other regulatory agencies, industry, patient advocacy groups, clinicians and their professional societies, other provider groups, and payors (i.e., stakeholders) must work together in collaborative communities to resolve the fundamental ethical issues of nonmaleficence, autonomy, and equity. Resolution affects all levels of the design, validation, and implementation of AI in medicine, each of which warrants meticulous attention. CONCLUSIONS AND RELEVANCE: A bioethically sound foundation may be possible if it is based on the fundamental ethical principles of nonmaleficence, autonomy, and equity in the design, validation, and implementation of AI systems. Achieving such a foundation will support the continued successful introduction of AI into medicine ahead of consideration by regulatory agencies. Important improvements in the accessibility and quality of health care, reductions in health disparities, and lower costs can thereby be achieved. These considerations should be discussed with all stakeholders and expanded on as a useful initiation of this dialogue.
Subjects
Artificial Intelligence; Diagnostic Imaging; Eye Diseases/diagnostic imaging; Optical Imaging; Bioethics; Humans; Software; Translational Research, Biomedical
ABSTRACT
BACKGROUND: Physicians sometimes consider whether or not to perform diagnostic testing in healthy people, but it is unknown whether nonextreme values of diagnostic tests typically encountered in such populations have any predictive ability, in particular for risk of death. The goal of this study was to quantify the associations among population reference intervals of 152 common biomarkers with all-cause mortality in a representative, nondiseased sample of adults in the United States. METHODS: The study used an observational cohort derived from the National Health and Nutrition Examination Survey (NHANES), a representative sample of the United States population consisting of 6 survey waves from 1999 to 2010 with linked mortality data (unweighted N = 30 651) and a median follow-up of 6.1 years. We deployed an X-wide association study (XWAS) approach to systematically perform association testing of 152 diagnostic tests with all-cause mortality. RESULTS: After controlling for multiple hypotheses, we found that values within the reference intervals (10th-90th percentiles) of 20 common biomarkers used as diagnostic tests or clinical measures were associated with all-cause mortality, including serum albumin, red cell distribution width, serum alkaline phosphatase, and others, after adjusting for age (linear and quadratic terms), sex, race, income, chronic illness, and prior-year healthcare utilization. All biomarkers combined, however, explained only an additional 0.8% of the variance in mortality risk. We found modest year-to-year changes, that is, changes in the association sizes of biomarkers from survey wave to survey wave between 1999 and 2010. CONCLUSIONS: Reference and nonoutlying variation in common biomarkers are consistently associated with mortality risk in the US population, but their additive contribution to explaining mortality risk is minor.
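A simplified sketch of the XWAS loop described here, assuming a DataFrame with a mortality indicator, biomarker columns, and the listed adjusters; survey weights and survival modeling, which the actual analysis requires, are omitted for brevity.

```python
# Minimal XWAS sketch: each biomarker is tested in its own adjusted model
# and the p-values are corrected for multiple hypotheses. Column names are
# assumptions matching the covariates listed in the abstract.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

ADJUSTERS = ("age + I(age**2) + sex + race + income + "
             "chronic_illness + utilization")

def xwas(df, biomarkers, outcome="died"):
    results = []
    for b in biomarkers:
        z = (df[b] - df[b].mean()) / df[b].std()  # per-SD effect sizes
        fit = smf.logit(f"{outcome} ~ biomarker + {ADJUSTERS}",
                        data=df.assign(biomarker=z)).fit(disp=0)
        results.append({"biomarker": b,
                        "OR_per_SD": np.exp(fit.params["biomarker"]),
                        "p": fit.pvalues["biomarker"]})
    out = pd.DataFrame(results)
    out["significant"] = multipletests(out["p"], method="fdr_bh")[0]
    return out.sort_values("p")
```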
Subjects
Biomarkers/analysis; Cause of Death; Nutrition Surveys; Adult; Aged; Female; Humans; Male; Middle Aged; Reference Values; United States/epidemiology
ABSTRACT
While tens of thousands of pathogenic variants are used to inform the many clinical applications of genomics, there remains limited information on quantitative disease risk for the majority of variants used in clinical practice. At the same time, rising demand for genetic counselling has prompted a growing need for computational approaches that can help interpret genetic variation. Such tasks include predicting variant pathogenicity and identifying variants that are too common to be penetrant. To address these challenges, researchers are increasingly turning to integrative informatics approaches. These approaches often leverage vast sources of data, including electronic health records and population-level allele frequency databases (e.g. gnomAD), as well as machine learning techniques such as support vector machines and deep learning. In this review, we highlight recent informatics and machine learning approaches that are improving our understanding of pathogenic variation and discuss obstacles that may limit their emerging role in clinical genomics.
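One such informatics approach, the maximum credible allele frequency framework of Whiffin et al. (2017), can be sketched in a few lines: a variant observed more often in gnomAD than disease prevalence, allelic heterogeneity, and penetrance jointly allow is unlikely to be highly penetrant. The parameter values below are illustrative.

```python
# Sketch of the "too common to be penetrant" check in the style of
# Whiffin et al. (2017): a variant's population allele frequency should
# not exceed the maximum credible frequency implied by disease prevalence,
# the largest share of cases any single allele explains, and penetrance.
def max_credible_af(prevalence, max_allelic_contribution, penetrance):
    """For a dominant disorder: prevalence counts people, allele frequency
    counts chromosomes, hence the factor of 1/2."""
    return prevalence * max_allelic_contribution / (2 * penetrance)

def too_common(observed_af, prevalence, max_allelic_contribution, penetrance):
    return observed_af > max_credible_af(
        prevalence, max_allelic_contribution, penetrance)

# Illustrative parameters: prevalence 1/500, no single variant explains
# more than 2% of cases, penetrance 50%.
print(max_credible_af(1 / 500, 0.02, 0.5))   # 4e-05
print(too_common(2e-4, 1 / 500, 0.02, 0.5))  # True: frequency is suspect
```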
Subjects
Computational Biology/trends; Genome, Human/genetics; Genomics/trends; Machine Learning/trends; Databases, Genetic; Humans
ABSTRACT
Massive data sets are often regarded as a panacea to the underpowered studies of the past. At the same time, it is becoming clear that in many of these data sets in which thousands of variables are measured across hundreds of thousands or millions of individuals, almost any desired relationship can be inferred with a suitable combination of covariates or analytic choices. Inspired by the genome-wide association study analysis paradigm that has transformed human genetics, X-wide association studies or "XWAS" have emerged as a popular approach to systematically analyzing nongenetic data sets and guarding against false positives. However, these studies often yield hundreds or thousands of associations characterized by modest effect sizes and minuscule P values. Many of these associations will be spurious and emerge due to confounding and other biases. One way of characterizing confounding in the genomics paradigm is the genomic inflation factor. An analogous "X-wide inflation factor," denoted λX, can be defined and applied to published XWAS. Effects that arise in XWAS may be prioritized using replication, triangulation, quantification of measurement error, contextualization of each effect in the distribution of all effect sizes within a field, and pre-registration. Criteria like those of Bradford Hill need to be reconsidered in light of exposure-wide epidemiology to prioritize signals among signals.
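A sketch of how λX can be computed, mirroring the genomic inflation factor: convert each association's P value to a 1-degree-of-freedom chi-square statistic and compare the observed median with the null median (about 0.455).

```python
# X-wide inflation factor, computed exactly like the genomic inflation
# factor: lambda = median(observed chi-square) / median(null chi-square).
# Lambda well above 1 suggests confounding or other bias inflating the
# whole distribution of test statistics.
import numpy as np
from scipy.stats import chi2

def inflation_factor(pvalues):
    stats = chi2.ppf(1.0 - np.asarray(pvalues), df=1)
    return np.median(stats) / chi2.ppf(0.5, df=1)  # null median ~ 0.4549

# Under the null, lambda_X should sit near 1.
null_p = np.random.default_rng(1).uniform(size=10_000)
print(inflation_factor(null_p))       # ~ 1.0
print(inflation_factor(null_p ** 3))  # systematically small p: lambda > 1
```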
Subjects
Big Data; Biostatistics/methods; Data Interpretation, Statistical; Epidemiologic Research Design; Machine Learning; Confounding Factors, Epidemiologic; Humans; Models, Statistical
ABSTRACT
BACKGROUND: For more than a decade, risk stratification for hypertrophic cardiomyopathy has been enhanced by targeted genetic testing. Using sequencing results, clinicians routinely assess the risk of hypertrophic cardiomyopathy in a patient's relatives and diagnose the condition in patients who have ambiguous clinical presentations. However, the benefits of genetic testing come with the risk that variants may be misclassified. METHODS: Using publicly accessible exome data, we identified variants that have previously been considered causal in hypertrophic cardiomyopathy and that are overrepresented in the general population. We studied these variants in diverse populations and reevaluated their initial ascertainments in the medical literature. We reviewed patient records at a leading genetic-testing laboratory for occurrences of these variants during the near-decade-long history of the laboratory. RESULTS: Multiple patients, all of whom were of African or unspecified ancestry, received positive reports, with variants misclassified as pathogenic on the basis of the understanding at the time of testing. Subsequently, all reported variants were recategorized as benign. The mutations that were most common in the general population were significantly more common among black Americans than among white Americans (P<0.001). Simulations showed that the inclusion of even small numbers of black Americans in control cohorts probably would have prevented these misclassifications. We identified methodologic shortcomings that contributed to these errors in the medical literature. CONCLUSIONS: The misclassification of benign variants as pathogenic that we found in our study shows the need for sequencing the genomes of diverse populations, both in asymptomatic controls and the tested patient population. These results expand on current guidelines, which recommend the use of ancestry-matched controls to interpret variants. As additional populations of different ancestry backgrounds are sequenced, we expect variant reclassifications to increase, particularly for ancestry groups that have historically been less well studied. (Funded by the National Institutes of Health.).
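The core of such a simulation is a one-line binomial argument: a control cohort containing n individuals from a subpopulation in which the variant has allele frequency f observes it at least once with probability 1 - (1 - f)^(2n). The allele frequency below is an illustrative value, not a figure from the study.

```python
# Probability that a control cohort detects a population-common variant at
# least once, given n individuals from that population (2n chromosomes).
# Even a handful of controls makes detection very likely.
def p_seen_in_controls(allele_freq, n_individuals):
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)

# Illustrative allele frequency for a variant common in one population.
for n in (10, 50, 200):
    print(n, round(p_seen_in_controls(0.027, n), 3))
# 10 -> ~0.42, 50 -> ~0.94, 200 -> ~1.0
```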
Subjects
Black or African American/genetics; Cardiomyopathy, Hypertrophic/genetics; False Positive Reactions; Genetic Predisposition to Disease; Genetic Variation; Adolescent; Adult; Aged; Asian/genetics; Child; Exome; Genetic Testing; Genotype; Health Status Disparities; Hispanic or Latino/genetics; Humans; Middle Aged; Mutation; Sequence Analysis, DNA; United States; White People/genetics; Young Adult
ABSTRACT
PURPOSE: Integrating genomic sequencing in clinical care requires standardization of variant interpretation practices. The Clinical Genome Resource has established expert panels to adapt the American College of Medical Genetics and Genomics/Association for Molecular Pathology classification framework for specific genes and diseases. The Cardiomyopathy Expert Panel selected MYH7, a key contributor to inherited cardiomyopathies, as a pilot gene to develop a broadly applicable approach. METHODS: Expert revisions were tested with 60 variants using a structured double review by pairs of clinical and diagnostic laboratory experts. Final consensus rules were established via iterative discussions. RESULTS: Adjustments represented disease-/gene-informed specifications (12) or strength adjustments of existing rules (5). Nine rules were deemed not applicable. Key specifications included quantitative frameworks for minor allele frequency thresholds, the use of segregation data, and a semiquantitative approach to counting multiple independent variant occurrences where fully controlled case-control studies are lacking. Initial inter-expert classification concordance was 93%. Internal data from participating diagnostic laboratories changed the classification of 20% of the variants (n = 12), highlighting the critical importance of data sharing. CONCLUSION: These adapted rules provide increased specificity for use in MYH7-associated disorders in combination with expert review and clinical judgment and serve as a stepping stone for genes and disorders with similar genetic and clinical characteristics.
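As an illustration of how a quantitative minor-allele-frequency specification plugs into rule application, consider the sketch below; the BA1 and BS1 cut points shown match commonly cited MYH7 specifications but should be verified against the panel's published rules.

```python
# Sketch of frequency-based benign evidence in ACMG/AMP rule application.
# The cutoffs (0.1% for BA1, 0.02% for BS1) are stated here as assumed
# values; check the published ClinGen MYH7 specifications before use.
def frequency_evidence(filtering_af, ba1=1e-3, bs1=2e-4):
    if filtering_af >= ba1:
        return "BA1: stand-alone benign"
    if filtering_af >= bs1:
        return "BS1: strong benign"
    return "no frequency-based benign evidence"

for af in (5e-3, 5e-4, 5e-6):
    print(af, "->", frequency_evidence(af))
```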
Subjects
Cardiac Myosins/genetics; Cardiomyopathies/diagnosis; Cardiomyopathies/genetics; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/genetics; Genetic Variation; Myosin Heavy Chains/genetics; Alleles; Clinical Decision-Making; Expert Testimony; Gene Frequency; Genetic Testing/methods; Genetic Testing/standards; Humans; Phenotype; Reproducibility of Results
ABSTRACT
The complexity of the human exposome-the totality of environmental exposures encountered from birth to death-motivates systematic, high-throughput approaches to discover new environmental determinants of disease. In this review, we describe the state of science in analyzing the human exposome and provide recommendations for the public health community to consider in dealing with analytic challenges of exposome-based biomedical research. We describe extant and novel analytic methods needed to associate the exposome with critical health outcomes and contextualize the data-centered challenges by drawing parallels to other research endeavors such as human genomics research. We discuss efforts for training scientists who can bridge public health, genomics, and biomedicine in informatics and statistics. If an exposome data ecosystem is brought to fruition, it will likely play a role as central as genomic science has had in molding the current and new generations of biomedical researchers, computational scientists, and public health research programs.