Results 1 - 13 of 13
1.
J Chem Phys ; 160(20)2024 May 28.
Article in English | MEDLINE | ID: mdl-38804486

ABSTRACT

The melting temperature is important for materials design because of its relationship with thermal stability, synthesis, and processing conditions. Current empirical and computational melting point estimation techniques are limited in scope, computational feasibility, or interpretability. We report the development of a machine learning methodology for predicting melting temperatures of binary ionic solid materials. We evaluated different machine learning models trained on a dataset of the melting points of 476 non-metallic crystalline binary compounds, using materials embeddings constructed from elemental properties and density functional theory calculations as model inputs. A direct supervised learning approach yields a mean absolute error of around 180 K but suffers from low interpretability. We find that the fidelity of predictions can be further improved by introducing an additional unsupervised learning step that first classifies the materials before the melting point regression. Not only does this two-step model exhibit improved accuracy, but it also provides a degree of interpretability, with insights into feature importance and into different types of melting that depend on the specific atomic bonding within a material. Motivated by this finding, we used a symbolic learning approach to find interpretable physical models for the melting temperature, which recovered the best-performing features from both prior models and provided additional interpretability.
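The two-step idea (group the materials without labels first, then fit a separate regression inside each group) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: k-means with a deterministic farthest-point start stands in for the unsupervised step, ordinary least squares stands in for the regression models, and the function names and synthetic features are hypothetical.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding: take the first row, then repeatedly add the
    row farthest from every centre chosen so far."""
    centres = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    return np.array(centres, dtype=float)


def assign(X, centres):
    """Index of the nearest centre for every row of X."""
    dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return np.argmin(dist, axis=1)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations: assign rows, then move centres to member means."""
    centres = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centres)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres


def fit_two_step(X, y, k):
    """Step 1: cluster the feature vectors (hypothetical material embeddings).
    Step 2: fit one linear model with intercept per cluster by least squares."""
    centres = kmeans(X, k)
    labels = assign(X, centres)
    coefs = []
    for j in range(k):
        mask = labels == j
        if not mask.any():  # empty cluster: fall back to a zero model
            coefs.append(np.zeros(X.shape[1] + 1))
            continue
        A = np.column_stack([X[mask], np.ones(mask.sum())])
        w, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        coefs.append(w)
    return centres, coefs


def predict_two_step(model, X):
    """Route each sample to its cluster, then apply that cluster's model."""
    centres, coefs = model
    labels = assign(X, centres)
    A = np.column_stack([X, np.ones(len(X))])
    return np.array([A[i] @ coefs[labels[i]] for i in range(len(X))])
```

On synthetic data with two well-separated groups that follow different linear trends, a single global linear model cannot fit both groups, whereas the per-cluster models can; which clustering and regression models work best for real melting point data is the question the paper itself addresses.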

2.
JAMA Netw Open ; 6(7): e2324969, 2023 07 03.
Article in English | MEDLINE | ID: mdl-37523187

ABSTRACT

Importance: Limited data describe the health status of sexual or gender minority (SGM) people due to inaccurate and inconsistent ascertainment of gender identity, sex assigned at birth, and sexual orientation. Objective: To evaluate whether the prevalence of 12 health conditions is higher among SGM adults in the All of Us Research Program data compared with cisgender heterosexual (non-SGM) people. Design, Setting, and Participants: This cross-sectional study used data from a multidisciplinary research consortium, the All of Us Research Program, that links participant-reported survey information to electronic health records (EHR) and physical measurements. In total, 372 082 US adults who were recruited and enrolled at an All of Us health care provider organization or by directly visiting the enrollment website from May 31, 2017, to January 1, 2022, were assessed for study eligibility. Exposures: Self-identified gender identity and sexual orientation group. Main Outcomes and Measures: Twelve health conditions were evaluated: 11 using EHR data and 1, body mass index (BMI; calculated as weight in kilograms divided by height in meters squared), using participants' physical measurements. Logistic regression (adjusting for age, income, employment, enrollment year, and US Census division) was used to obtain adjusted odds ratios (AORs) for the associations between each SGM group and health condition compared with a non-SGM reference group. Results: The analytic sample included 346 868 participants (median [IQR] age, 55 [39-68] years; 30 763 [8.9%] self-identified as SGM). Among participants with available BMI (80.2%) and EHR data (69.4%), SGM groups had higher odds of anxiety, depression, HIV diagnosis, and tobacco use disorder but lower odds of cardiovascular disease, kidney disease, diabetes, and hypertension.
Estimated associations for asthma (AOR, 0.39 [95% CI, 0.24-0.63] for gender diverse people assigned male at birth; AOR, 0.51 [95% CI, 0.38-0.69] for transgender women), a BMI of 25 or higher (AOR, 1.65 [95% CI, 1.38-1.96] for transgender men), cancer (AOR, 1.15 [95% CI, 1.07-1.23] for cisgender sexual minority men; AOR, 0.88 [95% CI, 0.81-0.95] for cisgender sexual minority women), and substance use disorder (AOR, 0.35 [95% CI, 0.24-0.52] for gender diverse people assigned female at birth; AOR, 0.65 [95% CI, 0.49-0.87] for transgender men) varied substantially across SGM groups compared with non-SGM groups. Conclusions and Relevance: In this cross-sectional analysis of data from the All of Us Research Program, SGM participants experienced health inequities that varied by group and condition. The All of Us Research Program can be a valuable resource for conducting health research focused on SGM people.
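The adjusted odds ratios above come from logistic regression coefficients. Assuming the usual Wald construction (a 95% confidence interval that is symmetric on the log-odds scale, which the abstract does not state explicitly), an odds ratio and its interval relate to a coefficient β and its standard error SE as exp(β) and exp(β ± 1.96·SE). The hypothetical helpers below show that arithmetic and recover β and SE from one of the reported intervals.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower, upper, z=Z_95):
    """Invert a Wald interval: its midpoint on the log scale is beta and its
    half-width divided by z is the standard error."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Asthma, gender diverse people assigned male at birth: AOR 0.39 (0.24-0.63).
beta, se = coef_from_ci(0.24, 0.63)
aor, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}, AOR={aor:.2f} ({lo:.2f}-{hi:.2f})")
```

Under this assumption the point estimate is the geometric mean of the interval bounds, and sqrt(0.24 × 0.63) ≈ 0.389 rounds to the reported 0.39, which is a quick consistency check on reported AORs.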


Subjects
Population Health , Sexual and Gender Minorities , Adult , Infant, Newborn , Female , Humans , Male , Middle Aged , Gender Identity , Cross-Sectional Studies , Prevalence , Sexual Behavior
3.
Sci Rep ; 13(1): 8476, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37231056

ABSTRACT

We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNPs) in a multi-marker mixed effect model. Because of computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially those with weak effects. In this paper, we propose a computationally efficient model selection approach, based on the e-values framework, for single SNP detection in families while utilizing information on multiple SNPs simultaneously. To overcome the computational bottleneck of traditional model selection methods, our method trains a single model and uses a fast and scalable bootstrap procedure. We illustrate through numerical studies that our proposed method is more effective in detecting SNPs associated with a trait than either single-marker analysis using family data or model selection methods that ignore the familial dependency structure. Further, we use our method to perform gene-level analysis of the Minnesota Center for Twin and Family Research (MCTFR) dataset and detect several SNPs that have been implicated in alcohol consumption.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Phenotype , Alcohol Drinking , Minnesota , Models, Genetic
4.
Entropy (Basel) ; 24(8)2022 Aug 16.
Article in English | MEDLINE | ID: mdl-36010800

ABSTRACT

We present an overview of four challenging research areas in multiscale physics and engineering as well as four data science topics that may be developed for addressing these challenges. We focus on multiscale spatiotemporal problems in light of the importance of understanding the accompanying scientific processes and engineering ideas, where "multiscale" refers to concurrent, non-trivial and coupled models over scales separated by orders of magnitude in space, time, energy, momenta, or any other relevant parameter. Specifically, we consider problems where the data may be obtained at various resolutions; analyzing such data and constructing coupled models lead to open research questions in various applications of data science. For illustration, we report numerical studies for one of the data science techniques discussed here, namely approximate Bayesian computation.
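Approximate Bayesian computation replaces an intractable likelihood with repeated simulation. Its simplest variant, rejection ABC, is sketched below under illustrative assumptions that do not come from the paper: a normal model with unknown mean and known unit variance, a uniform prior, the sample mean as the summary statistic, and a hand-picked tolerance `eps`. All names are hypothetical.

```python
import random
import statistics


def abc_rejection(observed_summary, prior_sample, simulate_summary, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed
    one; the accepted draws approximate the posterior given that summary."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(simulate_summary(theta, rng) - observed_summary) <= eps:
            accepted.append(theta)
    return accepted


N = 50  # size of each observed or simulated data set
rng = random.Random(42)
observed = [rng.gauss(2.0, 1.0) for _ in range(N)]  # synthetic data, true mean 2
obs_mean = statistics.fmean(observed)

posterior = abc_rejection(
    observed_summary=obs_mean,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate_summary=lambda theta, r: statistics.fmean(r.gauss(theta, 1.0) for _ in range(N)),
    eps=0.1,
    n_draws=10_000,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

Shrinking `eps` makes the approximation more faithful but accepts fewer draws, which is the central cost trade-off of ABC; multiscale applications typically need more efficient variants than plain rejection.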

5.
Entropy (Basel) ; 23(11)2021 Nov 20.
Article in English | MEDLINE | ID: mdl-34828244

ABSTRACT

With the advent of big data and the popularity of black-box deep learning methods, it is imperative to address the robustness of neural networks to noise and outliers. We propose the use of Winsorization to recover model performance when the data may contain outliers and other aberrant observations. We provide a comparative analysis of several probabilistic artificial intelligence and machine learning techniques on supervised learning case studies. Broadly, Winsorization is a versatile technique for accounting for outliers in data. However, different probabilistic machine learning techniques have different levels of efficiency when used on outlier-prone data, with or without Winsorization. We observe that Gaussian processes are extremely vulnerable to outliers, while deep learning techniques are in general more robust.
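Winsorization in its usual sense replaces values beyond chosen percentiles with the percentile values themselves, so extreme observations are kept but their influence is capped. A minimal sketch follows; the 5th/95th percentile defaults and the function name are assumptions, since the abstract does not give the limits used.

```python
import numpy as np


def winsorize(x, lower_pct=5.0, upper_pct=95.0):
    """Clip values below/above the given percentiles to those percentiles.
    For a 2-D array (samples x features) the limits are computed per column."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [lower_pct, upper_pct], axis=0)
    return np.clip(x, lo, hi)


y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one aberrant observation
print(winsorize(y, 0, 75))  # the outlier is pulled down to the 75th percentile
```

In a supervised learning setting the limits would normally be estimated on the training data only and then applied unchanged to validation and test data.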

6.
Nat Commun ; 12(1): 3411, 2021 06 07.
Article in English | MEDLINE | ID: mdl-34099683

ABSTRACT

Tree-ring chronologies underpin the majority of annually resolved reconstructions of Common Era climate. However, they are derived using different datasets and techniques, the ramifications of which have hitherto been little explored. Here, we report the results of a double-blind experiment that yielded 15 Northern Hemisphere summer temperature reconstructions from a common network of regional tree-ring width datasets. Taken together as an ensemble, the Common Era reconstruction mean correlates with instrumental temperatures from 1794-2016 CE at 0.79 (p < 0.001), reveals summer cooling in the years following large volcanic eruptions, and exhibits strong warming since the 1980s. Differing in their mean, variance, amplitude, sensitivity, and persistence, the ensemble members demonstrate the influence of subjectivity in the reconstruction process. We therefore recommend the routine use of ensemble reconstruction approaches to provide a more consensual picture of past climate variability.
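Once all reconstructions share a common period, the ensemble summary recommended here reduces to simple array operations. The sketch assumes a (members × years) array and reports the ensemble mean plus the across-member standard deviation as a spread measure; the comparison with instrumental data is a plain Pearson correlation like the 0.79 reported above. The function names and the toy numbers are illustrative only.

```python
import numpy as np


def ensemble_summary(members):
    """members: array of shape (n_members, n_years) on a common time axis.
    Returns the year-by-year ensemble mean and across-member standard deviation."""
    m = np.asarray(members, dtype=float)
    return m.mean(axis=0), m.std(axis=0, ddof=1)


def pearson_r(a, b):
    """Pearson correlation between two equally long series."""
    return float(np.corrcoef(a, b)[0, 1])


members = [[0.1, 0.4, 0.2, 0.8],   # toy anomaly series, not real reconstructions
           [0.3, 0.6, 0.1, 1.0],
           [0.2, 0.5, 0.3, 0.9]]
instrumental = [0.2, 0.5, 0.2, 0.9]
mean, spread = ensemble_summary(members)
print(mean, spread, pearson_r(mean, instrumental))
```

Reporting the spread alongside the mean keeps the member-to-member differences (the subjectivity the abstract highlights) visible rather than averaging them away.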

7.
Sci Rep ; 10(1): 10299, 2020 Jun 24.
Article in English | MEDLINE | ID: mdl-32581227

ABSTRACT

Earth System Models (ESMs) are the state of the art for projecting the effects of climate change. However, longstanding uncertainties in their ability to simulate regional and local precipitation extremes and related processes inhibit decision making. Existing state-of-the-art approaches for uncertainty quantification use Bayesian methods to weight ESMs based on a balance of historical skill and future consensus. Here we propose an empirical Bayesian model that extends an existing skill- and consensus-based weighting framework, and we examine the hypothesis that nontrivial, physics-guided measures of ESM skill can help produce a reliable probabilistic characterization of climate extremes. Specifically, the model leverages knowledge of physical relationships between temperature, atmospheric moisture capacity, and extreme precipitation intensity to iteratively weight and combine ESMs and estimate probability distributions of return levels. Out-of-sample validation suggests that the proposed Bayesian method, which incorporates physics guidance, has the potential to derive reliable precipitation projections, although caveats remain and the gain is not uniform across all cases.
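The temperature and moisture-capacity relationship mentioned above is commonly summarized by Clausius-Clapeyron scaling: the saturation vapour pressure of air, and to first order the moisture available to extreme precipitation, rises by roughly 6 to 7% per kelvin at typical near-surface temperatures. The sketch below only illustrates that scaling with a widely used Magnus-type empirical formula; the function names are hypothetical, and nothing here reproduces the paper's Bayesian weighting model.

```python
import math


def saturation_vapour_pressure_hpa(t_celsius):
    """Magnus-type empirical approximation of saturation vapour pressure over water (hPa)."""
    return 6.1094 * math.exp(17.625 * t_celsius / (t_celsius + 243.04))


def moisture_capacity_gain_per_kelvin(t_celsius):
    """Fractional increase in saturation vapour pressure for 1 K of warming."""
    return (saturation_vapour_pressure_hpa(t_celsius + 1.0)
            / saturation_vapour_pressure_hpa(t_celsius) - 1.0)


for t in (0.0, 10.0, 20.0, 30.0):
    print(f"{t:4.0f} C: {100 * moisture_capacity_gain_per_kelvin(t):.1f} % per K")
```

The gain shrinks slightly as temperature rises (about 7.5% per K near 0 °C versus about 6% per K near 30 °C), which is one reason physically guided skill measures may be evaluated regionally.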

8.
PLoS One ; 14(5): e0217148, 2019.
Article in English | MEDLINE | ID: mdl-31150427

ABSTRACT

Crop yields are projected to decrease under future climate conditions, and recent research suggests that yields have already been affected. However, current impacts on a diversity of crops at subnational scales, and their implications for food security, remain unclear. Here, we constructed linear regression relationships using weather and reported crop data to assess the potential impact of observed climate change on the yields of the top ten global crops (barley, cassava, maize, oil palm, rapeseed, rice, sorghum, soybean, sugarcane, and wheat) at ~20,000 political units. We find that the impact of global climate change, via observed climate trends, on the yields of different crops ranged from -13.4% (oil palm) to 3.5% (soybean). Our results show that impacts are mostly negative in Europe, Southern Africa, and Australia but generally positive in Latin America. Impacts in Asia and in Northern and Central America are mixed. This has likely led to an ~1% average reduction (-3.5 × 10^13 kcal/year) in consumable food calories from these ten crops. In nearly half of food-insecure countries, estimated caloric availability decreased. Our results suggest that climate change has already affected global food production.
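One simple way to express the impact of observed climate trends with a linear yield model is to compare predicted yields under the observed weather with predictions under a counterfactual in which each weather variable's linear trend has been removed (held at its starting level). The sketch below assumes exactly that construction, with ordinary least squares and synthetic data; it is an illustration, not necessarily the authors' procedure, and all names are hypothetical.

```python
import numpy as np


def trend_impact_percent(years, weather, yields):
    """years: (n,), weather: (n,) or (n, k) predictors, yields: (n,) yields.
    Returns the percent change in mean predicted yield attributable to the
    linear trends in the weather variables."""
    years = np.asarray(years, dtype=float)
    W = np.asarray(weather, dtype=float).reshape(len(years), -1)
    # Fit yield = a + W @ b by ordinary least squares.
    A = np.column_stack([np.ones(len(years)), W])
    coef, *_ = np.linalg.lstsq(A, yields, rcond=None)
    # Counterfactual weather: subtract each variable's fitted linear trend,
    # measured from the first year, so that year is left unchanged.
    t = years - years[0]
    slopes = np.polyfit(t, W, 1)[0]
    W_cf = W - np.outer(t, slopes)
    pred_obs = A @ coef
    pred_cf = np.column_stack([np.ones(len(years)), W_cf]) @ coef
    return 100.0 * (pred_obs.mean() - pred_cf.mean()) / pred_cf.mean()


years = np.arange(1990, 2020)
temp = 20.0 + 0.05 * (years - 1990)  # synthetic warming trend
yields = 10.0 - 0.2 * temp           # synthetic yield response
print(round(trend_impact_percent(years, temp, yields), 2))  # -2.42
```

With real data the regression would include several weather variables and noise, and the result depends on how the counterfactual trend-free climate is defined.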


Subjects
Agricultural Irrigation/trends , Climate Change , Crop Production/trends , Crops, Agricultural/growth & development , Food Supply , Global Health
9.
BMC Genet ; 18(1): 70, 2017 07 24.
Article in English | MEDLINE | ID: mdl-28738830

ABSTRACT

BACKGROUND: Genome-wide association studies involve detecting associations between millions of genetic variants and a trait, and typically use univariate regression to test the association between each single variant and the phenotype. Alternatively, Lasso penalized regression allows one to jointly model the relationship between all genetic variants and the phenotype. However, it is unclear how best to conduct inference on the individual Lasso coefficients, especially in high-dimensional settings. METHODS: We consider six methods for testing the Lasso coefficients: two permutation approaches (Lasso-Ayers, Lasso-PL) and one analytic approach (Lasso-AL) to select the penalty parameter for type I error control, a residual bootstrap (Lasso-RB), a modified residual bootstrap (Lasso-MRB), and a permutation test (Lasso-PT). Methods are compared via simulations and application to the Minnesota Center for Twins and Family Study. RESULTS: We show that for finite sample sizes with an increasing number of null predictors, Lasso-RB, Lasso-MRB, and Lasso-PT fail to be viable methods of inference. However, Lasso-PL and Lasso-AL remain fast and powerful tools for conducting inference with the Lasso, even in high dimensions. CONCLUSION: Our results suggest that the proposed permutation selection procedure (Lasso-PL) and the analytic selection method (Lasso-AL) are fast and powerful alternatives to the standard univariate analysis in genome-wide association studies.
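A standard permutation idea for choosing a Lasso penalty with type I error control rests on one fact: for the objective (1/2n)·||y − Xb||² + λ·||b||₁ with centred data, no predictor enters the model once λ ≥ max_j |x_jᵀy|/n. Permuting the phenotype breaks every genotype-phenotype association, so the (1 − α) quantile of that threshold across permutations gives a penalty at which a null data set selects nothing with probability of about 1 − α. The sketch below assumes this general approach; it does not reproduce the exact Lasso-Ayers or Lasso-PL procedures, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def lambda_max(X, y):
    """Smallest penalty at which the Lasso solution is entirely zero, for the
    objective (1/(2n))||y - Xb||^2 + lam*||b||_1 with an unpenalised intercept
    (handled here by centring X and y)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return float(np.max(np.abs(Xc.T @ yc)) / X.shape[0])


def permutation_penalty(X, y, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of lambda_max over permuted phenotypes."""
    rng = np.random.default_rng(seed)
    null = [lambda_max(X, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.quantile(null, 1.0 - alpha))


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))             # synthetic standardised "genotypes"
y = 2.0 * X[:, 0] + rng.normal(size=200)   # only column 0 affects the trait
lam = permutation_penalty(X, y)
print(lam, lambda_max(X, y))  # the true signal clears the permutation threshold
```

Because each permutation only needs one matrix-vector product rather than a full Lasso fit, this kind of threshold stays cheap even with many predictors, consistent with the speed the abstract reports for the selection-based methods.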


Subjects
Algorithms , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Bayes Theorem , Computer Simulation , Genetic Markers , Humans , Phenotype , Sampling Studies , Twin Studies as Topic
10.
PLoS One ; 12(5): e0176853, 2017.
Article in English | MEDLINE | ID: mdl-28472093

ABSTRACT

This empirical study sheds light on the spatial correlation of traffic links under different traffic regimes. We mimic the behavior of real traffic by pinpointing the spatial correlation between 140 freeway traffic links in a major sub-network of the Minneapolis-St. Paul freeway system with a grid-like network topology. This topology enables us to juxtapose the positive and negative correlations between links, which have been overlooked in short-term traffic forecasting models. To measure the correlation between traffic links accurately and reliably, we develop an algorithm that eliminates temporal trends in three dimensions for each link: (1) the hourly dimension, (2) the weekly dimension, and (3) the system dimension. The spatial correlation of traffic links exhibits a stronger negative correlation in rush hours, when congestion affects route choice. Although this correlation occurs mostly in parallel links, it is also observed upstream, where travelers receive information and are able to switch to substitute paths. Irrespective of the time of day and day of week, a strong positive correlation is observed between upstream and downstream links. This correlation is stronger in uncongested regimes, as traffic flow passes through consecutive links more quickly and there is no congestion effect to shift or stall traffic. The extracted spatial correlation structure can improve the accuracy of short-term traffic forecasting models.
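Why detrending matters can be illustrated, under simplifying assumptions, by subtracting each link's average value for every hour of the week before correlating the residuals. This folds the hourly and weekly dimensions into a single hour-of-week profile and omits the system dimension entirely; the function names, the 168-slot week, and the synthetic series are hypothetical, not the paper's algorithm.

```python
import numpy as np


def remove_hour_of_week_profile(series, slots_per_week=168):
    """Subtract the mean value observed in the same hour-of-week slot.
    Assumes the series starts at slot 0 and spans at least one full week."""
    x = np.asarray(series, dtype=float)
    slot = np.arange(len(x)) % slots_per_week
    profile = np.array([x[slot == s].mean() for s in range(slots_per_week)])
    return x - profile[slot]


def detrended_correlation(a, b, slots_per_week=168):
    """Pearson correlation between two links after removing recurring profiles."""
    ra = remove_hour_of_week_profile(a, slots_per_week)
    rb = remove_hour_of_week_profile(b, slots_per_week)
    return float(np.corrcoef(ra, rb)[0, 1])


rng = np.random.default_rng(0)
hours = np.arange(4 * 168)                        # four weeks of hourly data
recurring = 10.0 * np.sin(2 * np.pi * hours / 24)  # shared daily pattern
shock = rng.normal(size=hours.size)                # traffic diverted between links
link_a, link_b = recurring + shock, recurring - shock  # e.g. two parallel routes
print(np.corrcoef(link_a, link_b)[0, 1], detrended_correlation(link_a, link_b))
```

In this toy example the raw series are strongly positively correlated only because both follow the same daily cycle; after the recurring profile is removed, the negative correlation produced by traffic switching between the parallel routes becomes visible.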


Subjects
Automobile Driving , Humans , Minnesota
11.
Public Health Nurs ; 32(2): 94-100, 2015.
Article in English | MEDLINE | ID: mdl-25040680

ABSTRACT

OBJECTIVES: To evaluate variability in health literacy outcomes due to home visiting (HV) program components, including public health nurse (PHN), Intervention, and Client. DESIGN AND SAMPLE: A comparative, correlational study evaluated PHN home visiting program data that included PHNs (N = 16), Interventions (N = 21,634), and Clients (N = 141). Client age ranged from 14 to 46 years (median = 21, mean = 22.8, SD = 6.65). Clients were predominantly White (75.9%), not married (84.4%), and female (99.3%). PHNs documented care using electronic health records (EHR) and the Omaha System. MEASURES: The outcome of interest was health literacy benchmark attainment (adequate knowledge), operationalized as Omaha System Problem Rating Scale for Outcomes Knowledge scores averaged across problems. INTERVENTION: A program of individually tailored, evidence-based HV interventions provided by PHNs. RESULTS: There were 233 different interventions for 22 problems. The knowledge benchmark was attained by 16.3% of clients. Four factors explained variance in reaching the knowledge benchmark: Client (51%), Problem (17%), Intervention (16%), and PHN (16%). CONCLUSIONS: The PHN and intervention tailoring are actionable components of HV programs that explain variability in health literacy outcomes. Further research should examine the effects of training on PHN relationship skills and intervention tailoring to optimize outcomes of evidence-based PHN HV programs, and should evaluate whether improving health literacy may subsequently improve client problems.


Subjects
Community Health Nursing/education , Community Health Nursing/trends , Education, Nursing, Graduate/trends , Nurse's Role , Public Health Nursing/education , Public Health Nursing/trends , Humans
12.
J Multivar Anal ; 102(4): 768-780, 2011 Apr.
Article in English | MEDLINE | ID: mdl-26617421

ABSTRACT

High-dimensional data routinely arise in image analysis, genetic experiments, network analysis, and various other research areas. Many such datasets do not correspond to well-studied probability distributions, and in several applications the data cloud prominently displays non-symmetric and non-convex shape features. We propose using spatial quantiles and their generalizations, in particular the projection quantile, for describing, analyzing, and conducting inference with multivariate data. Minimal assumptions are made about the nature and shape characteristics of the underlying probability distribution, and we do not require the sample size to be as large as the data dimension. We present theoretical properties of the generalized spatial quantiles and an algorithm to compute them quickly. Our quantiles may be used to obtain multidimensional confidence or credible regions that are not required to conform to a predetermined shape. We also propose a new notion of multidimensional order statistics, which may be used to identify multidimensional outliers. Many of the features revealed by a generalized spatial quantile-based analysis would be missed if the data were shoehorned into a well-known probabilistic configuration.
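For readers new to spatial quantiles: the basic spatial quantile for an index vector u with ||u|| < 1 minimizes Σᵢ (||xᵢ − q|| + ⟨u, xᵢ − q⟩), and u = 0 gives the spatial (geometric) median. The sketch below computes it with a Weiszfeld-type fixed-point iteration obtained by setting the gradient to zero. It illustrates only the basic spatial quantile, under that stated definition; it does not implement the paper's projection quantile or its fast algorithm, and the function name is hypothetical.

```python
import numpy as np


def spatial_quantile(X, u, n_iter=1000, tol=1e-10):
    """Spatial quantile of the rows of X for index vector u (||u|| < 1).

    Minimises sum_i ||x_i - q|| + <u, x_i - q>. Setting the gradient to zero
    gives the fixed point q = (sum_i x_i / d_i + n u) / sum_i 1 / d_i with
    d_i = ||x_i - q||, which is iterated here (u = 0 reduces to Weiszfeld's
    algorithm for the spatial median)."""
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    n = X.shape[0]
    q = X.mean(axis=0)
    for _ in range(n_iter):
        # Guard against division by zero if q lands exactly on a data point.
        d = np.maximum(np.linalg.norm(X - q, axis=1), 1e-12)
        w = 1.0 / d
        q_new = (w @ X + n * u) / w.sum()
        if np.linalg.norm(q_new - q) < tol:
            return q_new
        q = q_new
    return q


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
print(spatial_quantile(X, [0.0, 0.0]))  # spatial median, near the centre
print(spatial_quantile(X, [0.5, 0.0]))  # shifted in the direction of u
```

At the solution, the average unit vector pointing from the data to q equals u. This is a convenient convergence check, and it shows how u encodes both a direction and a "depth" in the data cloud without assuming any particular distributional shape.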

13.
Bioinformatics ; 23(4): 442-9, 2007 Feb 15.
Article in English | MEDLINE | ID: mdl-17158516

ABSTRACT

MOTIVATION: Interaction among time series can be explored in many ways. All such approaches share the usual problems of low power and high-dimensional models. Here we attempt to build a causality network among a set of time series. Causal relationships are established using Granger causality, and the pathway is then constructed by finding the minimal spanning tree within each connected component of the inferred network. A false discovery rate measure is used to identify the most significant causalities. RESULTS: Simulation shows good convergence and accuracy of the algorithm. The robustness of the procedure is demonstrated by applying the algorithm to a non-stationary time series setup. Application of the algorithm to a real dataset identified many causalities, some of which overlap with previously known ones. The assembled gene network reveals features that are common wisdom about naturally occurring networks.
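The first stage of this pipeline, a pairwise Granger causality test, can be sketched with ordinary least squares. A series x is said to Granger-cause y if adding p lags of x to an autoregression of y on its own p lags reduces the residual sum of squares by more than chance, as measured by the usual F statistic. The lag order p = 2, the function names, and the synthetic series are assumptions; p-values, the false discovery rate step, and the minimal spanning tree are omitted.

```python
import numpy as np


def lagged(z, p):
    """Matrix whose columns are z[t-1], ..., z[t-p] for t = p, ..., len(z) - 1."""
    T = len(z)
    return np.column_stack([z[p - k:T - k] for k in range(1, p + 1)])


def granger_f(x, y, p=2):
    """F statistic for H0: lags of x add nothing to predicting y beyond y's own lags."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    n = len(target)
    restricted = np.hstack([np.ones((n, 1)), lagged(y, p)])
    full = np.hstack([restricted, lagged(x, p)])

    def rss(A):
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        r = target - A @ coef
        return float(r @ r)

    rss_r, rss_f = rss(restricted), rss(full)
    df1, df2 = p, n - full.shape[1]
    return ((rss_r - rss_f) / df1) / (rss_f / df2)


rng = np.random.default_rng(1)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=T - 1)  # x drives y with a one-step lag
f_xy, f_yx = granger_f(x, y), granger_f(y, x)
print(f_xy, f_yx)  # large in the causal direction, small in the reverse one
```

In a network setting, this test would be run for every ordered pair of series, and the resulting p-values would then be screened with a false discovery rate procedure before any tree is built.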


Subjects
Algorithms , Gene Expression Profiling/methods , Models, Biological , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation , Time Factors