Results 1 - 20 of 390
1.
Methods ; 230: 99-107, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39097178

ABSTRACT

Many studies have demonstrated the importance of accurately identifying miRNA-disease associations (MDAs) for understanding disease mechanisms. However, the number of known MDAs is significantly fewer than the unknown pairs. Here, we propose RSANMDA, a subview attention network for predicting MDAs. We first extract miRNA and disease features from multiple similarity matrices. Next, using resampling techniques, we generate different subviews from known MDAs. Each subview undergoes multi-head graph attention to capture its features, followed by semantic attention to integrate features across subviews. Finally, combining raw and training features, we use a multilayer scoring perceptron for prediction. In the experimental section, we conducted comparative experiments with other advanced models on both HMDD v2.0 and HMDD v3.2 datasets. We also performed a series of ablation studies and parameter tuning exercises. Comprehensive experiments conclusively demonstrate the superiority of our model. Case studies on lung, breast, and esophageal cancers further validate our method's predictive capability for identifying disease-related miRNAs.


Subjects
MicroRNAs , Humans , MicroRNAs/genetics , Computational Biology/methods , Genetic Predisposition to Disease , Neural Networks, Computer , Breast Neoplasms/genetics , Lung Neoplasms/genetics , Algorithms , Neoplasms/genetics , Esophageal Neoplasms/genetics
2.
Proc Biol Sci ; 291(2018): 20240079, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38471547

ABSTRACT

The fast rate of replacement of natural areas by expanding cities is a key threat to wildlife worldwide. Many wild species occur in cities, yet little is known about the dynamics of urban wildlife assemblages due to species' extinction and colonization that may occur in response to the rapidly evolving conditions within urban areas. Namely, species' ability to spread within urban areas, besides habitat preferences, is likely to shape the fate of species once they occur in a city. Here we use a long-term dataset on mammals occurring in one of the largest and most ancient cities in Europe to assess whether and how spatial spread and association with specific habitats drive the probability of local extinction within cities. Our analysis included mammalian records dating between 1832 and 2023, and revealed that local extinctions in urban areas are biased towards species that are associated with wetlands and were naturally rare within the city. Besides highlighting the role of wetlands within urban areas for conserving wildlife, our work also highlights the importance of long-term biodiversity monitoring in highly dynamic habitats such as cities, as a key asset to better understand wildlife trends and thus foster more sustainable and biodiversity-friendly cities.


Subjects
Ecosystem , Wetlands , Animals , Cities , Mammals , Biodiversity , Animals, Wild
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34929742

ABSTRACT

MOTIVATION: Accumulating evidence indicates that microRNA (miRNA) plays a crucial role in the pathogenesis and progression of various complex diseases. Inferring disease-associated miRNAs is important for exploring the etiology, diagnosis and treatment of human diseases. As biological experiments are time-consuming and labor-intensive, developing effective computational methods has become indispensable for identifying associations between miRNAs and diseases. RESULTS: We present an Ensemble learning framework with Resampling method for MiRNA-Disease Association (ERMDA) prediction to discover potential disease-related miRNAs. First, a resampling strategy is proposed for building multiple different balanced training subsets to address the challenge of sample imbalance within the database. Then, ERMDA extracts miRNA and disease feature representations by integrating miRNA-miRNA similarities, disease-disease similarities and experimentally verified miRNA-disease association information. Next, a feature selection approach is applied to reduce redundant information and increase the diversity among these subsets. Lastly, ERMDA constructs an individual learner on each subset to yield primitive outcomes, and a soft voting method is introduced for making the final decision based on the prediction results of the individual learners. A series of experimental results demonstrates that ERMDA outperforms other state-of-the-art methods on both balanced and unbalanced testing sets. In addition, case studies conducted on three human diseases further confirm ERMDA's prediction capability for identifying potential disease-related miRNAs. In conclusion, these experimental results demonstrate that our method can serve as an effective and reliable tool for researchers to explore the regulatory role of miRNAs in complex diseases.
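
The resampling and soft-voting steps described in this abstract can be illustrated with a minimal sketch. This is not the ERMDA implementation: the function names, the toy centroid-based learner, and the data are all hypothetical, and the feature-extraction and feature-selection steps are left out. It only shows how balanced subsets can be drawn from an imbalanced set and how per-subset probabilities can be averaged.

```python
import random

def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Pair every positive with an equally sized random draw of negatives."""
    rng = random.Random(seed)
    k = len(positives)
    return [(list(positives), rng.sample(negatives, k)) for _ in range(n_subsets)]

def fit_centroid_learner(pos, neg):
    """Toy stand-in for an individual learner (an assumption, not ERMDA's learner).

    Scores a sample by its relative distance to the two class centroids.
    """
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(pos), centroid(neg)

    def predict_proba(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos)) ** 0.5
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg)) ** 0.5
        total = d_pos + d_neg
        # Closer to the positive centroid -> probability nearer 1.
        return 0.5 if total == 0 else d_neg / total

    return predict_proba

def soft_vote(learners, x):
    """Soft voting: average the predicted probabilities of all learners."""
    return sum(f(x) for f in learners) / len(learners)
```

Each learner sees all known positives but a different random slice of the much larger negative pool, so the ensemble uses more of the negative data than any single balanced subset could.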


Subjects
Disease/genetics , Genetic Association Studies , Machine Learning , MicroRNAs/genetics , Algorithms , Computational Biology , Genetic Predisposition to Disease/genetics , Humans
4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35953081

ABSTRACT

Posttranslational modification of lysine residues, K-PTM, is one of the most popular PTMs. Some lysine residues in proteins can undergo continuous or cascaded covalent modifications, such as acetylation, crotonylation, methylation and succinylation. The covalent modification of lysine residues may have some special functions in basic research and drug development. Although many computational methods have been developed to predict lysine PTMs, up to now each K-PTM prediction method has modeled only a single class of K-PTM. In view of this, this study aims to fill this gap by building a multi-label computational model that can be directly used to predict multiple K-PTMs in proteins. In this study, a multi-label prediction model, MLysPRED, is proposed to identify multiple lysine sites using features generated from human protein sequences. In MLysPRED, three kinds of multi-label sequence encoding algorithms (MLDBPB, MLPSDAAP, MLPSTAAP) are proposed and combined with three encoding strategies (CHHAA, DR and Kmer) to convert preprocessed lysine sequences into effective numerical features. A multidimensional normal distribution oversampling technique and a graph-based multi-view clustering under-sampling algorithm were first proposed and incorporated to reduce the proportion of the original training samples, and a multi-label nearest neighbor algorithm is used for classification. MLysPRED achieved an Aiming of 92.21%, Coverage of 94.98%, Accuracy of 89.63%, Absolute-True of 81.46% and Absolute-False of 0.0682 on the independent datasets. Additionally, comparison with five existing predictors indicated that MLysPRED is promising for predicting multiple K-PTMs in proteins. For the convenience of experimental scientists, 'MLysPRED' has been deployed as a user-friendly web-server at http://47.100.136.41:8181.


Subjects
Lysine , Proteins , Algorithms , Cluster Analysis , Computational Biology/methods , Humans , Lysine/metabolism , Normal Distribution , Protein Processing, Post-Translational , Proteins/chemistry
5.
Stat Med ; 43(14): 2783-2810, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38705726

ABSTRACT

Propensity score matching is commonly used to draw causal inference from observational survival data. However, its asymptotic properties have yet to be established, and variance estimation is still open to debate. We derive the statistical properties of the propensity score matching estimator of the marginal causal hazard ratio based on matching with replacement and a fixed number of matches. We also propose a double-resampling technique for variance estimation that takes into account the uncertainty due to propensity score estimation prior to matching.


Subjects
Propensity Score , Proportional Hazards Models , Humans , Survival Analysis , Causality , Computer Simulation , Observational Studies as Topic/statistics & numerical data , Models, Statistical
6.
Stat Med ; 43(9): 1804-1825, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38356231

ABSTRACT

Statistical data simulation is essential in the development of statistical models and methods as well as in their performance evaluation. To capture complex data structures, in particular for high-dimensional data, a variety of simulation approaches have been introduced including parametric and the so-called plasmode simulations. While there are concerns about the realism of parametrically simulated data, it is widely claimed that plasmodes come very close to reality with some aspects of the "truth" known. However, there are no explicit guidelines or state-of-the-art on how to perform plasmode data simulations. In the present paper, we first review existing literature and introduce the concept of statistical plasmode simulation. We then discuss advantages and challenges of statistical plasmodes and provide a step-wise procedure for their generation, including key steps to their implementation and reporting. Finally, we illustrate the concept of statistical plasmodes as well as the proposed plasmode generation procedure by means of a public real RNA data set on breast carcinoma patients.


Subjects
Models, Statistical , Humans , Computer Simulation
7.
Stat Med ; 43(10): 1849-1866, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38402907

ABSTRACT

Several methods in survival analysis are based on the proportional hazards assumption. However, this assumption is very restrictive and often not justifiable in practice. Therefore, effect estimands that do not rely on the proportional hazards assumption are highly desirable in practical applications. One popular example for this is the restricted mean survival time (RMST). It is defined as the area under the survival curve up to a prespecified time point and, thus, summarizes the survival curve into a meaningful estimand. For two-sample comparisons based on the RMST, previous research found the inflation of the type I error of the asymptotic test for small samples and, therefore, a two-sample permutation test has already been developed. The first goal of the present paper is to further extend the permutation test for general factorial designs and general contrast hypotheses by considering a Wald-type test statistic and its asymptotic behavior. Additionally, a groupwise bootstrap approach is considered. Moreover, when a global test detects a significant difference by comparing the RMSTs of more than two groups, it is of interest which specific RMST differences cause the result. However, global tests do not provide this information. Therefore, multiple tests for the RMST are developed in a second step to infer several null hypotheses simultaneously. Hereby, the asymptotically exact dependence structure between the local test statistics is incorporated to gain more power. Finally, the small sample performance of the proposed global and multiple testing procedures is analyzed in simulations and illustrated in a real data example.
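
The definition of the RMST given in this abstract (the area under the survival curve up to a prespecified time point) can be made concrete with a short sketch. It assumes the survival curve is estimated with the Kaplan-Meier estimator, which the abstract does not state explicitly; the function names and inputs are hypothetical, and the permutation, bootstrap, and multiple-testing procedures of the paper are not reproduced.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 for an observed event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last rectangle, from the final step before tau up to tau itself.
    area += prev_s * (tau - prev_t)
    return area
```

Without censoring, the result equals the sample mean of min(T, tau), which is a convenient sanity check.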


Subjects
Research Design , Humans , Survival Rate , Survival Analysis , Proportional Hazards Models
8.
BMC Med Res Methodol ; 24(1): 189, 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39210285

ABSTRACT

BACKGROUND: Accurate prediction of subject recruitment, which is critical to the success of a study, remains an ongoing challenge. Previous prediction models often rely on parametric assumptions which are not always met or may be difficult to implement. We aim to develop a novel method that is less sensitive to model assumptions and relatively easy to implement. METHODS: We create a weighted resampling-based approach to predict enrollment in year two based on recruitment data from year one of the completed GRIPS and PACE clinical trials. Different weight functions accounted for a range of potential enrollment trajectory patterns. Prediction accuracy was measured by Euclidean distance for enrollment sequence in year two, total enrollment over time, and total weeks to enroll a fixed number of subjects, against the actual year two enrollment data. We compare the performance of the proposed method with an existing Bayesian method. RESULTS: Weighted resampling using GRIPS data resulted in closer prediction evidenced by better coverage of observed enrollment with the prediction intervals and smaller Euclidean distance from actual enrollment in year 2, especially when enrollment gaps were filled prior to the weighted resampling. These scenarios also produced more accurate predictions for total enrollment and number of weeks to enroll 50 participants. These same scenarios outperformed an existing Bayesian method for all 3 accuracy measures. In PACE data, using a reduced year 1 enrollment resulted in closer prediction evidenced by better coverage of observed enrollment with the prediction intervals and smaller Euclidean distance from actual enrollment in year 2, with the weighted resampling scenarios better reflecting the seasonal variation seen in year 1. The reduced enrollment scenarios resulted in closer prediction for total enrollment over 6 and 12 months into year 2. These same scenarios also outperformed an existing Bayesian method for relevant accuracy measures. CONCLUSION: The results demonstrate the feasibility and flexibility of a resampling-based, non-parametric approach for prediction of clinical trial recruitment with limited early enrollment data. Application to a wider setting and long-term prediction accuracy require further investigation.
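
A weighted resampling forecast of the kind outlined in this abstract might look like the sketch below. The abstract does not specify the weight functions or how prediction intervals were formed, so the weights argument, the percentile interval, and the function name are assumptions made for illustration only.

```python
import random

def predict_enrollment(year1_weekly, weights, n_weeks=52, n_sims=1000, seed=0):
    """Simulate year-two total enrollment by weighted resampling of year-one weeks.

    year1_weekly: observed enrollment counts per week in year one.
    weights: one non-negative weight per year-one week (hypothetical choice,
             e.g. larger weights for more recent weeks).
    Returns (lower, median, upper) of the simulated totals, using a 95%
    percentile interval.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        # Draw n_weeks weekly counts with replacement, proportional to weights.
        draw = rng.choices(year1_weekly, weights=weights, k=n_weeks)
        totals.append(sum(draw))
    totals.sort()
    lower = totals[int(0.025 * n_sims)]
    upper = totals[int(0.975 * n_sims) - 1]
    median = totals[n_sims // 2]
    return lower, median, upper
```

For example, `weights = [i + 1 for i in range(len(year1_weekly))]` would favor later weeks of year one, which is one way to encode an accelerating enrollment trajectory.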


Subjects
Bayes Theorem , Patient Selection , Randomized Controlled Trials as Topic , Humans , Randomized Controlled Trials as Topic/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Aged , Inpatients/statistics & numerical data , Statistics, Nonparametric , Female
9.
BMC Health Serv Res ; 24(1): 37, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38183029

ABSTRACT

BACKGROUND: No-show to medical appointments has significant adverse effects on healthcare systems and their clients. Using machine learning to predict no-shows allows managers to implement strategies such as overbooking and reminders targeting patients most likely to miss appointments, optimizing the use of resources. METHODS: In this study, we proposed a detailed analytical framework for predicting no-shows while addressing imbalanced datasets. The framework includes a novel use of z-fold cross-validation performed twice during the modeling process to improve model robustness and generalization. We also introduce Symbolic Regression (SR) as a classification algorithm and Instance Hardness Threshold (IHT) as a resampling technique and compared their performance with that of other classification algorithms, such as K-Nearest Neighbors (KNN) and Support Vector Machine (SVM), and resampling techniques, such as Random Undersampling (RUS), Synthetic Minority Oversampling Technique (SMOTE) and NearMiss-1. We validated the framework using two attendance datasets from Brazilian hospitals with no-show rates of 6.65% and 19.03%. RESULTS: From the academic perspective, our study is the first to propose using SR and IHT to predict the no-show of patients. Our findings indicate that SR and IHT presented superior performances compared to other techniques, particularly IHT, which excelled when combined with all classification algorithms and led to low variability in performance metrics results. Our results also outperformed sensitivity outcomes reported in the literature, with values above 0.94 for both datasets. CONCLUSION: This is the first study to use SR and IHT methods to predict patient no-shows and the first to propose performing z-fold cross-validation twice. Our study highlights the importance of not relying on a few validation runs for imbalanced datasets, as doing so may lead to biased results and inadequate analysis of the generalization and stability of the models obtained during the training stage.


Subjects
Algorithms , Benchmarking , Humans , Brazil , Machine Learning , Decision Support Techniques
10.
Sensors (Basel) ; 24(12)2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38931742

ABSTRACT

Corn (Zea mays L.) is the most abundant food/feed crop, making accurate yield estimation a critical data point for monitoring global food production. Sensors with varying spatial/spectral configurations have been used to develop corn yield models from intra-field (0.1 m ground sample distance (GSD)) to regional scales (>250 m GSD). Understanding the spatial and spectral dependencies of these models is imperative for interpreting results and for scaling and deploying models. We leveraged high spatial resolution hyperspectral data collected with an unmanned aerial system mounted sensor (272 spectral bands from 0.4-1 µm at 0.063 m GSD) to estimate silage yield. We subjected our imagery to three band selection algorithms to quantitatively assess the applicability of spectral reflectance features to yield estimation. We then derived 11 spectral configurations, which were spatially resampled to multiple GSDs, and applied to a support vector regression (SVR) yield estimation model. Results indicate that accuracy degrades above 4 m GSD across all configurations, and a seven-band multispectral sensor which samples the red edge and multiple near-infrared bands resulted in higher accuracy in 90% of regression trials. These results bode well for our quest toward a definitive sensor definition for global corn yield modeling, with only temporal dependencies requiring additional investigation.

11.
Behav Res Methods ; 56(2): 750-764, 2024 Feb.
Article in English | MEDLINE | ID: mdl-36814007

ABSTRACT

Mediation analysis in repeated measures studies can shed light on the mechanisms through which experimental manipulations change the outcome variable. However, the literature on interval estimation for the indirect effect in the 1-1-1 single mediator model is sparse. Most simulation studies to date evaluating mediation analysis in multilevel data considered scenarios that do not match the expected numbers of level 1 and level 2 units typically encountered in experimental studies, and no study to date has compared resampling and Bayesian methods for constructing intervals for the indirect effect in this context. We conducted a simulation study to compare statistical properties of interval estimates of the indirect effect obtained using four bootstrap and two Bayesian methods in the 1-1-1 mediation model with and without random effects. Bayesian credibility intervals had coverage closest to the nominal value and no instances of excessive Type I error rates, but lower power than resampling methods. Findings indicated that the pattern of performance for resampling methods often depended on the presence of random effects. We provide suggestions for selecting an interval estimator for the indirect effect depending on the most important statistical property for a given study, as well as code in R for implementing all methods evaluated in the simulation study. Findings and code from this project will hopefully support the use of mediation analysis in experimental research with repeated measures.
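
To show what a resampling interval for an indirect effect involves, the sketch below computes a percentile bootstrap interval for the product a*b in a simplified single-level mediation model (M regressed on X, Y regressed on X and M). This is a deliberate simplification: the 1-1-1 multilevel model with random effects studied in the paper is not implemented, the authors' R code is not reproduced, and all function names are hypothetical.

```python
import random

def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """a*b, where a is the slope of M on X and b the slope of M in Y ~ X + M."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b

def percentile_bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect (resampling cases)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            # Degenerate resample (no variation in X or M); draw again.
            continue
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

In multilevel data the resampling unit matters (level 1 units, level 2 units, or both), which is one reason the paper compares several bootstrap variants rather than a single one.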


Subjects
Mediation Analysis , Models, Statistical , Humans , Bayes Theorem , Computer Simulation , Multilevel Analysis
12.
Entropy (Basel) ; 26(3)2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38539740

ABSTRACT

The knowledge of the causal mechanisms underlying one single system may not be sufficient to answer certain questions. One can gain additional insights from comparing and contrasting the causal mechanisms underlying multiple systems and uncovering consistent and distinct causal relationships. For example, discovering common molecular mechanisms among different diseases can lead to drug repurposing. The problem of comparing causal mechanisms among multiple systems is non-trivial, since the causal mechanisms are usually unknown and need to be estimated from data. If we estimate the causal mechanisms from data generated from different systems and directly compare them (the naive method), the result can be sub-optimal. This is especially true if the data generated by the different systems differ substantially with respect to their sample sizes. In this case, the quality of the estimated causal mechanisms for the different systems will differ, which can in turn affect the accuracy of the estimated similarities and differences among the systems via the naive method. To mitigate this problem, we introduced the bootstrap estimation and the equal sample size resampling estimation method for estimating the difference between causal networks. Both of these methods use resampling to assess the confidence of the estimation. We compared these methods with the naive method in a set of systematically simulated experimental conditions with a variety of network structures and sample sizes, and using different performance metrics. We also evaluated these methods on various real-world biomedical datasets covering a wide range of data designs.

13.
Stud Hist Philos Sci ; 107: 1-10, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39106538

ABSTRACT

We propose that the epistemic functions of replication in science are best understood by relating them to kinds of experimental error/uncertainty. One kind of replication, which we call "direct replications," principally serves to assess the reliability of an experiment through its precision: the presence and degree of random error/statistical uncertainty. The other kind of replication, which we call "conceptual replications," principally serves to assess the validity of an experiment through its accuracy: the presence and degree of systematic errors/uncertainties. To illustrate the aptness of this general view, we examine the Hubble constant controversy in astronomy, showing how astronomers have responded to the concordances and discordances in their results by carrying out the different kinds of replication that we identify, with the aim of establishing a precise, accurate value for the Hubble constant. We contrast our view with Machery's "re-sampling" account of replication, which maintains that replications only assess reliability.


Subjects
Astronomy , Reproducibility of Results , Astronomy/history , Uncertainty , Research Design
14.
Am J Hum Genet ; 106(1): 3-12, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31866045

ABSTRACT

In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, and this can cause inflation of type I error rates. Recently, a saddle point approximation (SPA) based single-variant test has been developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that can adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT- and SKAT-O-type region-based tests; in these tests, the single-variant score statistic is calibrated based on SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p values. In contrast, when the case-control ratio is 1:99, the unadjusted approach has greatly inflated type I error rates (90 times that of exome-wide sequencing α = 2.5 × 10^-6). Additionally, the proposed method has similar computation time to the unadjusted approaches and is scalable for large sample data. In our application, the UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p value < 10^-7, including the associations between JAK2 and myeloproliferative disease, HOXB13 and cancer of prostate, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server, and this availability can help facilitate the identification of the genetic basis of complex diseases.


Subjects
Biological Specimen Banks , Exome Sequencing/methods , Exome/genetics , Genome-Wide Association Study , Phenomics , Polymorphism, Single Nucleotide , Case-Control Studies , Computer Simulation , Humans , Numerical Analysis, Computer-Assisted , Phenotype , United Kingdom
15.
Biostatistics ; 23(4): 1074-1082, 2022 10 14.
Article in English | MEDLINE | ID: mdl-34718422

ABSTRACT

There is a great need for statistical methods for analyzing skewed responses in complex sample surveys. Quantile regression is a logical option in addressing this problem but is often accompanied by incorrect variance estimation. We show how the variance can be estimated correctly by including the survey design in the variance estimation process. In a simulation study, we illustrate that the variance of the median regression estimator has a very small relative bias with appropriate coverage probability. The motivation for our work stems from the National Health and Nutrition Examination Survey where we demonstrate the impact of our results on iodine deficiency in females compared with males adjusting for other covariates.


Subjects
Iodine , Bias , Computer Simulation , Female , Humans , Male , Nutrition Surveys , Surveys and Questionnaires
16.
J Magn Reson Imaging ; 58(2): 403-414, 2023 08.
Article in English | MEDLINE | ID: mdl-36448664

ABSTRACT

BACKGROUND: In magnetic resonance elastography (MRE), the precision of the observed mechanical properties depends on the ratio between mechanical wavelength and spatial resolution. Since the mechanical wavelength may vary with actuation frequency, between patients and depending on position, a unique spatial resolution may not always generate an optimal ratio for multifrequency acquisitions, in patients with varying degrees of disease or in mechanically heterogeneous organs. PURPOSE: To describe an MRE reconstruction algorithm that adjusts the ratio between shear wavelength and pixel size, by locally resampling the matrix of shear displacement, and to assess its performance relative to existing reconstructions in different use cases. STUDY TYPE: Prospective. POPULATION: Four phantoms, 20 healthy volunteers (5 men, median age 34, range 20-56) and 46 patients with nonalcoholic fatty liver disease (37 men, median age 63, range 33-83). FIELD STRENGTH/SEQUENCE: 3 T; gradient-echo elastography sequence with 40 Hz, 60 Hz, and 80 Hz frequencies. ASSESSMENT: For each algorithm, phantom stiffness values were compared against their nominal values, repeatability was calculated in healthy volunteers, and diagnostic performance in detecting advanced liver fibrosis was assessed in 46 patients. STATISTICAL TESTS: Linear regression was used to evaluate the agreement between measured stiffness values and nominal phantom stiffnesses. The Bland-Altman method was used to evaluate repeatability in volunteers. The ability to diagnose advanced fibrosis was assessed by receiver operating characteristic curve analysis (with Youden index thresholds). Significance was considered at a P value of 0.05. RESULTS: From the linear regression, the slope closest to 1 was provided by MARS (40 Hz) and k-MDEV (60 Hz, 80 Hz). Repeatability index was best with MDEV (23%) and poorest with k-MDEV (53%). The best performance in detecting advanced fibrosis was provided by MARS at 40 Hz (area under the receiver operating characteristic curve, AUC = 0.88), k-MDEV and MARS at 60 Hz (AUC = 0.91), and multimodel direct inversion (MMDI) and MARS at 80 Hz (AUC = 0.90). DATA CONCLUSION: MARS shows the best diagnostic performance to detect advanced fibrosis and the second-best results in phantoms after k-MDEV. EVIDENCE LEVEL: 1. TECHNICAL EFFICACY: Stage 2.
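
The Youden index thresholds mentioned under STATISTICAL TESTS can be illustrated as follows. The sketch assumes that higher scores (for example, higher stiffness) indicate disease and that a case is called positive when its score is at or above the threshold; the function name and any input values are hypothetical and unrelated to the study data.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    labels[i] is 1 for diseased and 0 for non-diseased; a case is called
    positive when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= t)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Only observed score values are tried as candidate thresholds, which is sufficient because sensitivity and specificity change only at those values.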


Subjects
Elasticity Imaging Techniques , Non-alcoholic Fatty Liver Disease , Male , Humans , Adult , Middle Aged , Elasticity Imaging Techniques/methods , Prospective Studies , Liver Cirrhosis/diagnostic imaging , Algorithms , Liver/diagnostic imaging , Magnetic Resonance Imaging/methods , Reproducibility of Results
17.
Biometrics ; 79(4): 3549-3563, 2023 12.
Article in English | MEDLINE | ID: mdl-37382567

ABSTRACT

Quantile regression has emerged as a useful and effective tool in modeling survival data, especially for cases where noises demonstrate heterogeneity. Despite recent advancements, non-smooth components involved in censored quantile regression estimators may often yield numerically unstable results, which, in turn, lead to potentially self-contradicting conclusions. We propose an estimating equation-based approach to obtain consistent estimators of the regression coefficients of interest via the induced smoothing technique to circumvent the difficulty. Our proposed estimator can be shown to be asymptotically equivalent to its original unsmoothed version, whose consistency and asymptotic normality can be readily established. Extensions to handle functional covariate data and recurrent event data are also discussed. To alleviate the heavy computational burden of bootstrap-based variance estimation, we also propose an efficient resampling procedure that reduces the computational time considerably. Our numerical studies demonstrate that our proposed estimator provides substantially smoother model parameter estimates across different quantile levels and can achieve better statistical efficiency compared to a plain estimator under various finite-sample settings. The proposed method is also illustrated via four survival datasets, including the HMO (health maintenance organizations) HIV (human immunodeficiency virus) data, the primary biliary cirrhosis (PBC) data, and so forth.


Subjects
HIV , Models, Statistical , Humans , Computer Simulation
18.
BMC Med Res Methodol ; 23(1): 209, 2023 09 19.
Article in English | MEDLINE | ID: mdl-37726680

ABSTRACT

Random Forests are a powerful and frequently applied Machine Learning tool. The permutation variable importance (VIMP) has been proposed to improve the explainability of such a pure prediction model. It describes the expected increase in prediction error after randomly permuting a variable and disturbing its association with the outcome. However, VIMPs measure a variable's marginal influence only, which can make their interpretation difficult or even misleading. In the present work we address the general need for improving the explainability of prediction models by exploring VIMPs in the presence of correlated variables. In particular, we propose to use a variable's residual information for investigating if its permutation importance partially or totally originates from correlated predictors. Hypothesis tests are derived by a resampling algorithm that can further support results by providing test decisions and p-values. In simulation studies we show that the proposed test controls type I error rates. When applying the methods to a Random Forest analysis of post-transplant survival after kidney transplantation, the importance of kidney donor quality for predicting post-transplant survival is shown to be high. However, the transplant allocation policy introduces correlations with other well-known predictors, which raises the concern that the importance of kidney donor quality may simply originate from these predictors. By using the proposed method, this concern is addressed and it is demonstrated that kidney donor quality plays an important role in post-transplant survival, regardless of correlations with other predictors.
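
The permutation variable importance described in this abstract (the expected increase in prediction error after randomly permuting a variable) can be sketched for an arbitrary fitted predictor. The sketch uses mean squared error and a generic `predict` callable as assumptions; it does not use a Random Forest or out-of-bag samples, and it does not implement the residual-based test proposed in the paper.

```python
import random

def mse(predict, X, y):
    """Mean squared prediction error of `predict` on rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, column, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling one column of X.

    X is a list of rows (each a list of feature values); `predict` is any
    already-fitted function mapping a row to a prediction.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the rows with only the chosen column permuted.
        X_perm = [row[:column] + [v] + row[column + 1:]
                  for row, v in zip(X, shuffled)]
        increases.append(mse(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats
```

Because only one column is shuffled at a time, a variable that is merely correlated with a truly influential predictor can still receive a high score, which is the interpretive problem the abstract sets out to address.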


Subjects
Kidney Transplantation , Random Forest , Humans , Algorithms , Computer Simulation , Machine Learning
19.
J Med Internet Res ; 25: e43734, 2023 Feb 07.
Article in English | MEDLINE | ID: mdl-36749620

ABSTRACT

BACKGROUND: Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction that do not consider features as time series have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may perform better in predicting adverse effects. OBJECTIVE: We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning-based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. METHODS: This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods, including oversampling with the borderline-synthesized minority oversampling technique, undersampling-edited nearest neighbor, and over- and undersampling hybrid methods. Model performance was compared based on accuracy, precision, recall, F1-score, geometric mean (G-mean), area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Feature importance was determined by the best model. The decision threshold was readjusted to identify the best cutoff value, and a Kaplan-Meier survival analysis was performed. RESULTS: The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting oversampling machine learning model demonstrated the best predictive outcomes among all 16 models. The accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. The best threshold was able to classify 286 of 2422 patients (11.8%) as high-risk subjects, among which 275 were true-positive patients in the testing set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxine, alkaline phosphatase, and low-density lipoprotein were the most important features. CONCLUSIONS: Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support.
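
The threshold readjustment and the imbalance-aware metrics named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the labels and scores are invented toy values, the function names are hypothetical, and picking the cutoff that maximizes F1 is one plausible reading of "best cutoff value", not necessarily the criterion the study used. The only figures taken from the abstract are the reported precision (0.632) and recall (0.756), which are used to re-derive the reported F1-score.

```python
import math


def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, fn, tn):
    """Geometric mean of recall (sensitivity) and specificity."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(recall * specificity)


def best_f1_threshold(y_true, scores):
    """Scan every observed score as a cutoff and keep the one with highest F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, fn, _ = confusion_counts(y_true, scores, t)
        f1 = f1_score(tp, fp, fn)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy, imbalanced example (assumption): 3 positives among 8 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

default_f1 = f1_score(*confusion_counts(y_true, scores, 0.5)[:3])
default_gmean = g_mean(*confusion_counts(y_true, scores, 0.5))
best_t, best_f1 = best_f1_threshold(y_true, scores)

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_from_precision_recall(0.632, 0.756)
print(best_t, round(best_f1, 3), round(reported_f1, 3))
```

On the toy data, moving the cutoff away from the default 0.5 raises the F1-score, which mirrors the direction of the improvement the abstract reports after readjustment (0.688 to 0.699).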


Subjects
Amiodarone , Drug-Related Side Effects and Adverse Reactions , Humans , Retrospective Studies , Thyroid Gland , University Hospitals , Machine Learning
20.
Sensors (Basel) ; 23(3), 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36772754

ABSTRACT

Trend analysis is one of the most powerful techniques for monitoring the technical condition of individual mechanical components of rotating machinery. It is based on the extraction of characteristic signal components according to the kinetostatic configuration of the machine drivetrain. It has been used for decades and is well understood. However, classical trend analysis rests on assumptions that stem from the limited computational power of embedded systems years ago. This paper examines whether the assumption of a single signal resampling path is valid when calculating signal components generated by shafts with a rational transmission ratio. The study was conducted using an extensive imbalance test on a medium-power test rig. As an original contribution, the paper demonstrates that applying an advanced resampling algorithm does not significantly influence the overall trend increase, but is of utmost importance when trend variance is of interest.
