Results 1 - 20 of 34
1.
Am J Hum Genet ; 109(4): 680-691, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35298919

ABSTRACT

Identification of rare-variant associations is crucial to full characterization of the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real-variant annotation enables in silico comparison of methods, such as rare-variant association tests and polygenic scoring methods, that focus on putative causal variants. Existing simulation methods are either unable to employ real-variant annotation or severely under- or overestimate the number of singletons and doubletons, thereby reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare-variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real-variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.
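As a rough illustration of the distributional check RAREsim is designed to pass, the sketch below (not RAREsim itself; the haplotype matrix and target counts are made-up stand-ins) tallies singletons, doubletons, and other minor-allele-count bins in a simulated 0/1 haplotype matrix and compares them with target counts of the kind one would derive from real sequencing data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for simulated haplotypes: rows = haplotypes, cols = variant sites,
# entries are 0/1 alternate-allele indicators.
n_haplotypes, n_sites = 2000, 5000
haplotypes = rng.binomial(1, rng.beta(0.05, 2.0, size=n_sites), size=(n_haplotypes, n_sites))

# Minor allele count (MAC) per site; the rare-variant distribution is summarized by
# how many sites fall into each MAC bin (singletons, doubletons, ...).
mac = haplotypes.sum(axis=0)
mac = np.minimum(mac, n_haplotypes - mac)          # fold to the minor allele
observed = {"singletons": int((mac == 1).sum()),
            "doubletons": int((mac == 2).sum()),
            "MAC 3-5":    int(((mac >= 3) & (mac <= 5)).sum())}

# Hypothetical target counts that would come from real sequencing data of the same region.
target = {"singletons": 1400, "doubletons": 450, "MAC 3-5": 520}

for bin_name in target:
    print(f"{bin_name}: simulated={observed[bin_name]}, target={target[bin_name]}")
```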


Subjects
Genetic Variation, Research Design, Computer Simulation, Genetic Variation/genetics, Haplotypes/genetics, Humans, Genetic Models, Multifactorial Inheritance
2.
Sensors (Basel) ; 22(14)2022 Jul 09.
Article in English | MEDLINE | ID: mdl-35890832

ABSTRACT

When classifying objects in 3D LiDAR data, it is important to use efficient collection methods and processing algorithms. This paper considers the resolution needed to classify 3D objects accurately and discusses how this resolution is accomplished for the RedTail RTL-450 LiDAR System. We employ VoxNet, a convolutional neural network, to classify the 3D data and test the accuracy using different data resolution levels. The results show that for our data set, if the neural network is trained using higher resolution data, then the accuracy of the classification is above 97%, even for the very sparse testing set (10% of original test data set point density). When the training is done on lower resolution data sets, the classification accuracy remains good but drops off at around 3% of the original test data set point density. These results have implications for determining flight altitude and speed for an unmanned aerial vehicle (UAV) to achieve high accuracy classification. The findings point to the value of high-resolution point clouds both for training the convolutional neural network and for the data collected from a LiDAR sensor.
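The sketch below illustrates, on made-up data, the two preprocessing steps implied by this design: randomly thinning a point cloud to a fraction of its density and voxelizing it into the binary occupancy grid that a VoxNet-style 3D CNN consumes. It is not the RedTail or VoxNet code; sizes and the grid resolution are illustrative assumptions.

```python
import numpy as np

def downsample(points: np.ndarray, keep_fraction: float, rng=np.random.default_rng(0)) -> np.ndarray:
    """Randomly keep a fraction of the points to emulate a sparser LiDAR collection."""
    n_keep = max(1, int(len(points) * keep_fraction))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

def voxelize(points: np.ndarray, grid_size: int = 32) -> np.ndarray:
    """Convert an (N, 3) point cloud into a binary occupancy grid (VoxNet-style input)."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scaled = (points - mins) / np.maximum(maxs - mins, 1e-9)     # normalize to [0, 1]
    idx = np.minimum((scaled * grid_size).astype(int), grid_size - 1)
    grid = np.zeros((grid_size, grid_size, grid_size), dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# Toy point cloud standing in for a single segmented object from a LiDAR scan.
cloud = np.random.default_rng(1).normal(size=(5000, 3))
sparse_cloud = downsample(cloud, keep_fraction=0.10)             # 10% of original density
print(voxelize(cloud).sum(), voxelize(sparse_cloud).sum())       # occupied voxels, dense vs. sparse
```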

3.
Sensors (Basel) ; 22(14)2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35891090

ABSTRACT

The accurate recognition of activities is fundamental for following up on the health progress of people with dementia (PwD), thereby supporting subsequent diagnosis and treatments. When monitoring the activities of daily living (ADLs), it is feasible to detect behaviour patterns, parse out the disease evolution, and consequently provide effective and timely assistance. However, this task is affected by uncertainties derived from the differences in smart home configurations and the way in which each person undertakes the ADLs. One possible pathway is to train a supervised classification algorithm using large datasets; nonetheless, obtaining real-world data is costly and the research recruitment process is challenging. The resulting activity datasets are then small and may not capture each person's intrinsic properties. Simulation approaches have risen as an alternative, efficient choice, but synthetic data can be significantly dissimilar to real data. Hence, this paper proposes the application of Partial Least Squares Regression (PLSR) to approximate the real activity duration of various ADLs based on synthetic observations. First, the real activity duration of each ADL is contrasted with the one derived from an intelligent environment simulator. Following this, different PLSR models were evaluated for estimating real activity duration based on synthetic variables. A case study including eight ADLs was considered to validate the proposed approach. The results revealed that simulated and real observations are significantly different for some ADLs (p-value < 0.05); nevertheless, the synthetic variables can be further modified to predict the real activity duration with high accuracy (predictive R² > 90%).
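A minimal sketch of the core modeling step, using scikit-learn's PLSRegression on made-up data; the synthetic simulator variables and "real" durations below are random stand-ins, not the study's dataset.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical data: each row is one simulated run of an ADL, columns are synthetic
# variables exported by the simulator, and y is the activity duration (in seconds)
# measured in the real smart home.
X_synthetic = rng.normal(size=(200, 6))
y_real = 30 + X_synthetic @ np.array([5.0, 3.0, 0.0, 1.5, 0.0, 2.0]) + rng.normal(scale=2.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_synthetic, y_real, test_size=0.3, random_state=0)

pls = PLSRegression(n_components=3)      # number of latent components is a tuning choice
pls.fit(X_tr, y_tr)
print("R2 on held-out data:", r2_score(y_te, pls.predict(X_te).ravel()))
```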


Subjects
Activities of Daily Living, Dementia, Algorithms, Dementia/diagnosis, Humans, Least-Squares Analysis
4.
BMC Bioinformatics ; 22(1): 266, 2021 May 25.
Article in English | MEDLINE | ID: mdl-34034652

ABSTRACT

BACKGROUND: Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since the beginning. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are short. RESULTS: Here we use simulated benchmarking data that reflect many properties of real data, including polymorphisms, intron signal and non-uniform coverage, allowing for systematic comparative analyses of isoform quantification accuracy and its impact on differential expression analysis. Genome-, transcriptome- and pseudo-alignment-based methods are included, and a simple approach is included as a baseline control. CONCLUSIONS: Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We determine the structural parameters with the greatest impact on quantification accuracy to be length and sequence compression complexity, and not so much the number of isoforms. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification and isoform-level DE should still be employed selectively.


Subjects
Gene Expression Profiling, Transcriptome, Protein Isoforms/genetics, RNA-Seq, RNA Sequence Analysis
5.
Sociol Methods Res ; 50(4): 1725-1762, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34621095

ABSTRACT

Although agent-based models (ABMs) have been increasingly accepted in social sciences as a valid tool to formalize theory, propose mechanisms able to recreate regularities, and guide empirical research, we are not aware of any research using ABMs to assess the robustness of our statistical methods. We argue that ABMs can be extremely helpful to assess models when the phenomena under study are complex. As an example, we create an ABM to evaluate the estimation of selection and influence effects by SIENA, a stochastic actor-oriented model proposed by Tom A. B. Snijders and colleagues. It is a prominent network analysis method that has gained popularity during the last 10 years and been applied to estimate selection and influence for a broad range of behaviors and traits such as substance use, delinquency, violence, health, and educational attainment. However, we know little about the conditions for which this method is reliable or the particular biases it might have. The results from our analysis show that selection and influence are estimated by SIENA asymmetrically and that, with very simple assumptions, we can generate data where selection estimates are highly sensitive to misspecification, suggesting caution when interpreting SIENA analyses.
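The flavor of such an ABM can be sketched in a few lines: agents repeatedly rewire ties toward behaviorally similar agents (selection) and shift their behavior toward their friends' mean (influence). The parameters, network size, and update rules below are illustrative assumptions, not those used in the paper or in SIENA.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_steps = 50, 20

behavior = rng.integers(1, 6, size=n_agents).astype(float)    # e.g., substance-use scale 1-5
ties = (rng.random((n_agents, n_agents)) < 0.05).astype(int)  # sparse directed friendship network
np.fill_diagonal(ties, 0)

selection_strength = 1.0   # tendency to befriend agents with similar behavior
influence_rate = 0.1       # tendency to move toward friends' mean behavior

for _ in range(n_steps):
    # Selection: each agent reconsiders one random potential tie, more likely to keep/form it
    # the closer the other agent's behavior is to its own.
    i = rng.integers(n_agents, size=n_agents)
    j = rng.integers(n_agents, size=n_agents)
    similarity = np.exp(-selection_strength * np.abs(behavior[i] - behavior[j]))
    ties[i, j] = (rng.random(n_agents) < similarity).astype(int)
    np.fill_diagonal(ties, 0)

    # Influence: each agent with at least one friend shifts toward the friends' mean behavior.
    friend_counts = ties.sum(axis=1)
    has_friends = friend_counts > 0
    friend_mean = np.divide(ties @ behavior, friend_counts, out=behavior.copy(), where=has_friends)
    behavior = behavior + influence_rate * has_friends * (friend_mean - behavior)

print("final mean behavior:", behavior.mean())
```

Data generated this way, where the true selection and influence strengths are known, can then be fed to SIENA to check whether the estimates recover them.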

6.
Entropy (Basel) ; 23(9)2021 Aug 31.
Article in English | MEDLINE | ID: mdl-34573765

ABSTRACT

In this article, we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge, we propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study, the objects were smartphone photographs of near-complete Roman terra sigillata pottery vessels from the collection of the Museum of London. Taking the replicated features from published profile drawings of pottery forms allowed the integration of expert knowledge into the process through our synthetic data generator. After this initial training, the model was fine-tuned with data from photographs of real vessels. We show, through exhaustive experiments across several popular deep learning architectures, different test priors, and considering the impact of the photograph viewpoint and excessive damage to the vessels, that the proposed hybrid approach enables the creation of classifiers with appropriate generalisation performance. This performance is significantly better than that of classifiers trained exclusively on the original data, which shows the promise of the approach to alleviate the fundamental issue of learning from small datasets.

7.
Molecules ; 26(1)2020 Dec 23.
Article in English | MEDLINE | ID: mdl-33374492

ABSTRACT

Real-time reverse transcription (RT) PCR is the gold standard for detecting Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), owing to its sensitivity and specificity, thereby meeting the demand for the rising number of cases. The scarcity of trained molecular biologists for analyzing PCR results makes data verification a challenge. Artificial intelligence (AI) was designed to ease verification by detecting atypical profiles in PCR curves caused by contamination or artifacts. Four classes of simulated real-time RT-PCR curves were generated, namely, positive, early, no, and abnormal amplifications. Machine learning (ML) models were generated and tested using small amounts of data from each class. The best model was used for classifying the big data obtained by the Virology Laboratory of Simon Bolivar University from real-time RT-PCR curves for SARS-CoV-2, and the model was retrained and implemented in a software that correlated patient data with test and AI diagnoses. The best strategy for AI included a binary classification model generated from simulated data, in which the data analyzed by the first model were classified as either positive or negative/abnormal. To differentiate between negative and abnormal, the data were reevaluated using a second model. In the first model, the data required preanalysis through a combination of preprocessing steps. The early amplification class was eliminated from the models because the number of such cases in the big data set was negligible. ML models can be created from simulated data using the minimum available information. During analysis, changes or variations can be incorporated by generating simulated data, avoiding the incorporation of large amounts of experimental data encompassing all possible changes. For diagnosing SARS-CoV-2, this type of AI is critical for optimizing PCR tests because it enables rapid diagnosis and reduces false positives. Our method can also be used for other types of molecular analyses.
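A minimal sketch of the simulate-then-classify idea, with the paper's two-stage strategy collapsed into a single multiclass classifier for brevity; the curve shapes, class definitions, and classifier choice are illustrative assumptions, not the laboratory's actual models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
cycles = np.arange(1, 41)                      # 40 PCR cycles

def positive_curve():
    ct = rng.uniform(18, 32)                   # cycle threshold
    plateau = rng.uniform(1.0, 2.0)
    return plateau / (1 + np.exp(-(cycles - ct) / 1.5)) + rng.normal(0, 0.02, cycles.size)

def negative_curve():
    return rng.normal(0, 0.02, cycles.size) + rng.uniform(-0.05, 0.05)

def abnormal_curve():
    # Drifting baseline without a clean sigmoid, standing in for contamination/artifacts.
    return 0.01 * cycles * rng.uniform(0.5, 2.0) + rng.normal(0, 0.05, cycles.size)

X = np.array([positive_curve() for _ in range(300)]
             + [negative_curve() for _ in range(300)]
             + [abnormal_curve() for _ in range(300)])
y = np.array([0] * 300 + [1] * 300 + [2] * 300)   # 0=positive, 1=negative, 2=abnormal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```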


Subjects
Artificial Intelligence, COVID-19 Testing/methods, COVID-19/virology, Biological Models, Real-Time Polymerase Chain Reaction/methods, Reverse Transcriptase Polymerase Chain Reaction/methods, SARS-CoV-2/isolation & purification, Big Data, Humans, Reproducibility of Results, SARS-CoV-2/genetics
8.
Neuroimage ; 200: 511-527, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31247300

ABSTRACT

Although motion artifacts are a major source of noise in infant fNIRS data, how to approach motion correction in this population has only recently started to be investigated. Homer2 offers a wide range of motion correction methods, and previous work on simulated and adult data suggested the use of Spline interpolation and Wavelet filtering as optimal methods for the recovery of trials affected by motion. However, motion artifacts in infant data differ from those in adult data both in amplitude and in frequency of occurrence. Therefore, artifact correction recommendations derived from adult data might not be optimal for infant data. We hypothesized that the combined use of Spline and Wavelet would outperform their individual use on data with complex profiles of motion artifacts. To demonstrate this, we first compared, on infant semi-simulated data, the performance of several motion correction techniques on their own and of the novel combined approach; then, we investigated the performance of Spline and Wavelet alone and in combination on real cognitive data from three datasets collected with infants of different ages (5, 7 and 10 months), with different tasks (auditory, visual and tactile) and with different NIRS systems. To quantitatively estimate and compare the efficacy of these techniques, we adopted four metrics: hemodynamic response recovery error, within-subject standard deviation, between-subjects standard deviation, and the number of trials that survived each correction method. Our results demonstrated that (i) it is always better to correct for motion artifacts than to reject the corrupted trials; and (ii) Wavelet filtering on its own and in combination with Spline interpolation seems to be the most effective approach in reducing the between- and within-subject standard deviations. Importantly, the combination of Spline and Wavelet provided the best performance in semi-simulations at both low and high levels of noise, and also recovered most of the trials affected by motion artifacts across all datasets, a crucial result when working with infant data.
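A toy sketch of the two-step correction idea on a synthetic signal, assuming SciPy and PyWavelets are available; this is not Homer2's implementation, and the artifact segment, spline smoothing factor, and wavelet threshold are illustrative choices.

```python
import numpy as np
import pywt
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
fs = 10.0                                    # sampling rate (Hz), typical order for fNIRS
t = np.arange(0, 60, 1 / fs)
signal = 0.5 * np.sin(2 * np.pi * 0.1 * t) + rng.normal(0, 0.05, t.size)
signal[300:320] += 3.0                       # injected motion artifact (sudden shift)

# Step 1 (spline-style correction): fit a smooth spline to the flagged artifact segment
# and subtract it, pulling the segment back toward the surrounding baseline.
seg = slice(300, 320)
spline = UnivariateSpline(t[seg], signal[seg], s=len(t[seg]))
corrected = signal.copy()
corrected[seg] -= spline(t[seg]) - np.median(signal[280:300])

# Step 2 (wavelet filtering): decompose, soft-threshold detail coefficients, reconstruct.
coeffs = pywt.wavedec(corrected, "db4", level=4)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # robust noise estimate
thresh = sigma * np.sqrt(2 * np.log(corrected.size))
coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[: corrected.size]
print("std before:", signal.std(), "after:", denoised.std())
```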


Subjects
Artifacts, Cerebral Cortex/physiology, Functional Neuroimaging/standards, Computer-Assisted Image Processing/standards, Near-Infrared Spectroscopy/standards, Cerebral Cortex/diagnostic imaging, Female, Head Movements/physiology, Humans, Infant, Male
9.
Sensors (Basel) ; 19(12)2019 Jun 14.
Article in English | MEDLINE | ID: mdl-31207884

ABSTRACT

This paper addresses the problem of interferometric noise reduction in Synthetic Aperture Radar (SAR) interferometry based on sparse and redundant representations over a trained dictionary. The idea is to use a Proximity-based K-SVD (ProK-SVD) algorithm on interferometric data to obtain a suitable dictionary, in order to extract the phase image content effectively. We implemented this strategy on both simulated and real interferometric data to validate our approach. For synthetic data, three different training dictionaries were compared, namely, a dictionary extracted from the data, a dictionary obtained from a uniform random distribution in [-π, π], and a dictionary built from the discrete cosine transform. Further, a similar strategy was applied to real interferograms. We used interferometric data from various SAR sensors, including low-resolution C-band ERS/ENVISAT, medium-resolution L-band ALOS, and high-resolution X-band COSMO-SkyMed, all over an area of Mt. Etna, Italy. On both simulated and real interferometric phase images, the proposed approach shows significant noise reduction within the fringe pattern, without any considerable loss of useful information.
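A rough sketch of the patch-based dictionary-denoising idea on a toy wrapped-phase image, using scikit-learn's MiniBatchDictionaryLearning as an approximate stand-in for ProK-SVD (which is not available in standard libraries); patch size, dictionary size, and sparsity penalty are illustrative choices.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

rng = np.random.default_rng(0)

# Toy wrapped-phase image: smooth fringes plus noise, standing in for an interferogram.
x, y = np.meshgrid(np.linspace(0, 4 * np.pi, 128), np.linspace(0, 4 * np.pi, 128))
phase = np.angle(np.exp(1j * (x + 0.5 * y) + 1j * rng.normal(0, 0.6, x.shape)))

patch_size = (8, 8)
patches = extract_patches_2d(phase, patch_size, max_patches=2000, random_state=0)
data = patches.reshape(patches.shape[0], -1)
data -= data.mean(axis=1, keepdims=True)

dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0, batch_size=64, random_state=0)
dico.fit(data)

# Denoise by sparse-coding every patch over the learned dictionary and averaging overlaps.
all_patches = extract_patches_2d(phase, patch_size)
flat = all_patches.reshape(all_patches.shape[0], -1)
means = flat.mean(axis=1, keepdims=True)
codes = dico.transform(flat - means)
recon = (codes @ dico.components_ + means).reshape(all_patches.shape)
denoised = reconstruct_from_patches_2d(recon, phase.shape)
print("residual std:", float(np.std(denoised - phase)))
```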

10.
Behav Res Methods ; 49(5): 1824-1837, 2017 10.
Article in English | MEDLINE | ID: mdl-28039681

ABSTRACT

This paper discusses power and sample-size computation for likelihood ratio and Wald testing of the significance of covariate effects in latent class models. For both tests, asymptotic distributions can be used; that is, the test statistic can be assumed to follow a central Chi-square under the null hypothesis and a non-central Chi-square under the alternative hypothesis. Power or sample-size computation using these asymptotic distributions requires specification of the non-centrality parameter, which in practice is rarely known. We show how to calculate this non-centrality parameter using a large simulated data set from the model under the alternative hypothesis. A simulation study is conducted evaluating the adequacy of the proposed power analysis methods, determining the key study design factor affecting the power level, and comparing the performance of the likelihood ratio and Wald test. The proposed power analysis methods turn out to perform very well for a broad range of conditions. Moreover, apart from effect size and sample size, an important factor affecting the power is the class separation, implying that when class separation is low, rather large sample sizes are needed to achieve a reasonable power level.
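The power computation described follows directly from the two asymptotic distributions; the sketch below assumes the non-centrality parameter has already been obtained (e.g., from one large simulated dataset under the alternative, as the paper proposes) and that it scales linearly with sample size. The numeric values are illustrative only.

```python
from scipy.stats import chi2, ncx2

def power_from_noncentrality(noncentrality: float, df: int, alpha: float = 0.05) -> float:
    """Power of a likelihood-ratio/Wald test assuming a central chi-square null
    distribution and a non-central chi-square alternative."""
    critical_value = chi2.ppf(1 - alpha, df)
    return ncx2.sf(critical_value, df, noncentrality)

def sample_size_for_power(nc_per_obs: float, df: int, target_power: float = 0.8,
                          alpha: float = 0.05) -> int:
    """Smallest n such that power >= target, assuming the non-centrality parameter
    scales linearly with sample size (nc = n * nc_per_obs), as in large-sample theory."""
    n = 1
    while power_from_noncentrality(n * nc_per_obs, df, alpha) < target_power:
        n += 1
    return n

# nc_per_obs would in practice be derived from the large simulated dataset;
# the values here are made up for illustration.
print(power_from_noncentrality(noncentrality=10.0, df=2))
print(sample_size_for_power(nc_per_obs=0.02, df=2))
```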


Subjects
Statistical Models, Research Design/statistics & numerical data, Sample Size, Humans, Likelihood Functions
11.
Glob Chang Biol ; 22(8): 2651-64, 2016 08.
Article in English | MEDLINE | ID: mdl-26872305

ABSTRACT

Increasing biodiversity loss due to climate change is one of the most vital challenges of the 21st century. To anticipate and mitigate biodiversity loss, models are needed that reliably project species' range dynamics and extinction risks. Recently, several new approaches to model range dynamics have been developed to supplement correlative species distribution models (SDMs), but applications clearly lag behind model development; indeed, no comparative analysis has been performed to evaluate their performance. Here, we build on process-based, simulated data for benchmarking five range (dynamic) models of varying complexity, including classical SDMs, SDMs coupled with simple dispersal or more complex population dynamic models (SDM hybrids), and a hierarchical Bayesian process-based dynamic range model (DRM). We specifically test the effects of demographic and community processes on model predictive performance. Under current climate, DRMs performed best, although only marginally. Under climate change, predictive performance varied considerably, with no clear winners. Yet, all range dynamic models improved predictions under climate change substantially compared to purely correlative SDMs, and the population dynamic models also predicted reasonable extinction risks for most scenarios. When benchmarking data were simulated with more complex demographic and community processes, simple SDM hybrids including only dispersal often proved most reliable. Finally, we found that structural decisions during model building can have a great impact on model accuracy, but prior system knowledge of important processes can reduce these uncertainties considerably. Our results affirm the clear merit of using dynamic approaches for modelling species' responses to climate change but also emphasize several needs for further model and data improvement. We propose and discuss perspectives for improving range projections through the combination of multiple models and for making these approaches operational for large numbers of species.


Subjects
Benchmarking, Climate Change, Ecosystem, Bayes Theorem, Climate, Biological Models, Population Dynamics
12.
Neuroimage ; 109: 341-56, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-25555998

ABSTRACT

Advances in diffusion-weighted magnetic resonance imaging (DW-MRI) have led to many alternative diffusion sampling strategies and analysis methodologies. A common objective among methods is estimation of white matter fiber orientations within each voxel, as doing so permits in-vivo fiber-tracking and the ability to study brain connectivity and networks. Knowledge of how DW-MRI sampling schemes affect fiber estimation accuracy, tractography and the ability to recover complex white-matter pathways, differences between results due to choice of analysis method, and which method(s) perform optimally for specific data sets, all remain important problems, especially as tractography-based studies become common. In this work, we begin to address these concerns by developing sets of simulated diffusion-weighted brain images which we then use to quantitatively evaluate the performance of six DW-MRI analysis methods in terms of estimated fiber orientation accuracy, false-positive (spurious) and false-negative (missing) fiber rates, and fiber-tracking. The analysis methods studied are: 1) a two-compartment "ball and stick" model (BSM) (Behrens et al., 2003); 2) a non-negativity constrained spherical deconvolution (CSD) approach (Tournier et al., 2007); 3) analytical q-ball imaging (QBI) (Descoteaux et al., 2007); 4) q-ball imaging with Funk-Radon and Cosine Transform (FRACT) (Haldar and Leahy, 2013); 5) q-ball imaging within constant solid angle (CSA) (Aganj et al., 2010); and 6) a generalized Fourier transform approach known as generalized q-sampling imaging (GQI) (Yeh et al., 2010). We investigate these methods using 20, 30, 40, 60, 90 and 120 evenly distributed q-space samples of a single shell, and focus on a signal-to-noise ratio (SNR = 18) and diffusion-weighting (b = 1000 s/mm²) common to clinical studies. We found that the BSM and CSD methods consistently yielded the least fiber orientation error and simultaneously the greatest detection rate of fibers. Fiber detection rate was found to be the most distinguishing characteristic between the methods, and a significant factor for complete recovery of tractography through complex white-matter pathways. For example, while all methods recovered similar tractography of prominent white matter pathways with limited fiber crossing, CSD (which had the highest fiber detection rate, especially for voxels containing three fibers) recovered the greatest number of fibers and largest fraction of correct tractography for complex three-fiber crossing regions. The synthetic data sets, ground truth, and tools for quantitative evaluation are publicly available on the NITRC website as the project "Simulated DW-MRI Brain Data Sets for Quantitative Evaluation of Estimated Fiber Orientations" at http://www.nitrc.org/projects/sim_dwi_brain.


Assuntos
Encéfalo/anatomia & histologia , Imagem de Difusão por Ressonância Magnética/métodos , Imagem de Tensor de Difusão/métodos , Substância Branca/anatomia & histologia , Adulto , Algoritmos , Simulação por Computador , Humanos , Processamento de Imagem Assistida por Computador , Masculino
13.
Diagn Progn Res ; 7(1): 7, 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37069621

ABSTRACT

BACKGROUND: The multivariable fractional polynomial (MFP) approach combines variable selection using backward elimination with a function selection procedure (FSP) for fractional polynomial (FP) functions. It is a relatively simple approach which can be easily understood without advanced training in statistical modeling. For continuous variables, a closed test procedure is used to decide between no effect, linear, FP1, or FP2 functions. Influential points (IPs) and small sample sizes can both have a strong impact on a selected function and MFP model. METHODS: We used simulated data with six continuous and four categorical predictors to illustrate approaches which can help to identify IPs with an influence on function selection and the MFP model. The approaches use leave-one-out and leave-two-out analyses and two related techniques for a multivariable assessment. In eight subsamples, we also investigated the effects of sample size and model replicability, the latter by using three non-overlapping subsamples with the same sample size. For better illustration, a structured profile was used to provide an overview of all analyses conducted. RESULTS: The results showed that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP was not able to detect some non-linear functions and the selected model differed substantially from the true underlying model. However, when the sample size was relatively large and regression diagnostics were carefully conducted, MFP selected functions or models that were similar to the underlying true model. CONCLUSIONS: For smaller sample sizes, IPs and low power are important reasons why the MFP approach may not be able to identify underlying functional relationships for continuous variables, and the selected models might differ substantially from the true model. However, for larger sample sizes, a carefully conducted MFP analysis is often a suitable way to select a multivariable regression model which includes continuous variables. In such cases, MFP can be the preferred approach for deriving a multivariable descriptive model.

14.
Genes (Basel) ; 13(12)2022 12 14.
Article in English | MEDLINE | ID: mdl-36553629

ABSTRACT

The ever-growing number of methods for the generation of synthetic bulk and single cell RNA-seq data have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. As there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of an RNA-seq data simulation algorithm and software that are best suited for their specific scientific questions of interest.


Subjects
Algorithms, Software, RNA-Seq, RNA Sequence Analysis/methods, Computer Simulation
15.
J Med Imaging (Bellingham) ; 9(4): 045501, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35818569

ABSTRACT

Purpose: The most frequently used model for simulating multireader multicase (MRMC) data that emulates confidence-of-disease ratings from diagnostic imaging studies has been the Roe and Metz (RM) model, proposed by Roe and Metz in 1997 and later generalized by Hillis (2012), Abbey et al. (2013), and Gallas and Hillis (2014). A problem with these models is that it has been difficult to set model parameters such that the simulated data are similar to MRMC data encountered in practice. To remedy this situation, Hillis (2018) mapped parameters from the RM model to Obuchowski-Rockette (OR) model parameters that describe the distribution of the empirical AUC outcomes computed from the RM model simulated data. We continue that work by providing the reverse mapping, i.e., by deriving an algorithm that expresses RM parameters as functions of the OR empirical AUC distribution parameters. Approach: We solve for the corresponding RM parameters in terms of the OR parameters using numerical methods. Results: An algorithm is developed that results in, at most, one solution of RM parameter values that correspond to inputted OR parameter values. The algorithm can be implemented using an R software function. Examples are provided that illustrate the use of the algorithm. A simulation study validates the algorithm. Conclusions: The resulting algorithm makes it possible to easily determine RM model parameter values such that simulated data emulate a specific real-data study. Thus, MRMC analysis methods can be empirically tested using simulated data similar to that encountered in practice.

16.
J Neurosci Methods ; 371: 109501, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35182604

ABSTRACT

BACKGROUND: The Harvard Automatic Processing Pipeline for Electroencephalography (HAPPE) is a computerized EEG data processing pipeline designed for multiple site analysis of populations with neurodevelopmental disorders. This pipeline has been validated in-house by the developers but external testing using real-world datasets remains to be done. NEW METHOD: Resting and auditory event-related EEG data from 29 children ages 3-6 years with Fragile X Syndrome as well as simulated EEG data was used to evaluate HAPPE's noise reduction techniques, data standardization features, and data integration compared to traditional manualized processing. RESULTS: For the real EEG data, HAPPE pipeline showed greater trials retained, greater variance retained through independent component analysis (ICA) component removal, and smaller kurtosis than the manual pipeline; the manual pipeline had a significantly larger signal-to-noise ratio (SNR). For simulated EEG data, correlation between the pure signal and processed data was significantly higher for manually-processed data compared to HAPPE-processed data. Hierarchical linear modeling showed greater signal recovery in the manual pipeline with the exception of the gamma band signal which showed mixed results. COMPARISON WITH EXISTING METHODS: SNR and simulated signal retention was significantly greater in the manually-processed data than the HAPPE-processed data. Signal reduction may negatively affect outcome measures. CONCLUSIONS: The HAPPE pipeline benefits from less active processing time and artifact reduction without removing segments. However, HAPPE may bias toward elimination of noise at the cost of signal. Recommended implementation of the HAPPE pipeline for neurodevelopmental populations depends on the goals and priorities of the research.


Subjects
Fragile X Syndrome, Algorithms, Artifacts, Child, Preschool Child, Electroencephalography/methods, Humans, Computer-Assisted Signal Processing, Signal-to-Noise Ratio
17.
Mol Ecol Resour ; 21(8): 2689-2705, 2021 Nov.
Article in English | MEDLINE | ID: mdl-33745225

ABSTRACT

Population genetics relies heavily on simulated data for validation, inference and intuition. In particular, since the evolutionary 'ground truth' for real data is always limited, simulated data are crucial for training supervised machine learning methods. Simulation software can accurately model evolutionary processes but requires many hand-selected input parameters. As a result, simulated data often fail to mirror the properties of real genetic data, which limits the scope of methods that rely on it. Here, we develop a novel approach to estimating parameters in population genetic models that automatically adapts to data from any population. Our method, pg-gan, is based on a generative adversarial network that gradually learns to generate realistic synthetic data. We demonstrate that our method is able to recover input parameters in a simulated isolation-with-migration model. We then apply our method to human data from the 1000 Genomes Project and show that we can accurately recapitulate the features of real data.
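A toy sketch of the adversarial fitting idea, with the population-genetic simulator replaced by a one-parameter Gaussian simulator and pg-gan's actual training procedure replaced by a crude grid search; a discriminator that cannot beat chance (accuracy near 0.5) indicates the simulated data are indistinguishable from the "real" data under that parameter value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def simulator(theta: float, n: int = 500) -> np.ndarray:
    """Toy stand-in for a population-genetic simulator with one unknown parameter."""
    return rng.normal(loc=theta, scale=1.0, size=(n, 1))

real_data = rng.normal(loc=2.3, scale=1.0, size=(500, 1))   # "observed" data, true theta = 2.3

best_theta, best_score = None, np.inf
for theta in np.linspace(0, 5, 51):                          # crude grid search over theta
    sim = simulator(theta)
    X = np.vstack([real_data, sim])
    y = np.concatenate([np.ones(len(real_data)), np.zeros(len(sim))])
    acc = cross_val_score(LogisticRegression(), X, y, cv=3, scoring="accuracy").mean()
    if abs(acc - 0.5) < best_score:                          # closest to chance wins
        best_theta, best_score = theta, abs(acc - 0.5)

print("estimated theta:", best_theta)
```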


Subjects
Software, Computer Simulation, Demography, Humans
18.
Nutrients ; 13(10)2021 Sep 29.
Article in English | MEDLINE | ID: mdl-34684473

ABSTRACT

The aim of this study was to unravel the methodological challenges of exploring nutritional inadequacy, in a sample of 608 healthy pregnant women. The usual intake of twenty-one nutrients was recorded by employing a validated FFQ. Simulated datasets of usual intake were generated, with randomly imposed uncertainty. The comparison between the usual intake and the EAR was accomplished with the probability approach and the EAR cut-point method. Point estimates were accompanied by bootstrap confidence intervals. Bootstrap intervals applied to the risk of inadequacy for raw and simulated data tended in most cases to overlap. A detailed statistical analysis aiming to predict the level of inadequacy, as well as the application of the EAR cut-point method along with bootstrap intervals, could effectively be used to assess nutrient inadequacy. However, the final decision on the method used depends on the distribution of the nutrient intake under evaluation. Irrespective of the applied methodology, moderate to high levels of inadequacy, calculated from the FFQ, were identified for certain nutrients (e.g., vitamins C, B6, magnesium, vitamin A), while the highest were recorded for folate and iron. Considering that micronutrient-poor, obesogenic diets are becoming more common, the underlying rationale may help towards unraveling the complexity characterizing nutritional inadequacies, especially in vulnerable populations.
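A minimal sketch of the EAR cut-point method with a percentile bootstrap interval, on made-up intake data; the nutrient, EAR value, and intake distribution below are illustrative, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical usual iron intakes (mg/day) from an FFQ for n pregnant women,
# and an illustrative Estimated Average Requirement (EAR).
intakes = rng.lognormal(mean=np.log(18), sigma=0.4, size=608)
ear = 22.0

def inadequacy_ear_cutpoint(x: np.ndarray, ear: float) -> float:
    """EAR cut-point method: proportion of the group with usual intake below the EAR."""
    return float(np.mean(x < ear))

# Percentile bootstrap confidence interval for the prevalence of inadequacy.
boot = np.array([inadequacy_ear_cutpoint(rng.choice(intakes, size=intakes.size, replace=True), ear)
                 for _ in range(2000)])
point = inadequacy_ear_cutpoint(intakes, ear)
low, high = np.percentile(boot, [2.5, 97.5])
print(f"inadequacy = {point:.2%} (95% bootstrap CI {low:.2%} - {high:.2%})")
```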


Subjects
Nutritional Requirements, Adult, Diet Records, Food Intake, Energy Intake, Female, Humans, Life Style, Micronutrients, Theoretical Models, Pregnancy, Nutritional Recommendations
19.
Ecol Evol ; 10(20): 11699-11712, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33144994

ABSTRACT

Meta-analyses often encounter studies with incompletely reported variance measures (e.g., standard deviation values) or sample sizes, both needed to conduct weighted meta-analyses. Here, we first present a systematic literature survey on the frequency and treatment of missing data in published ecological meta-analyses showing that the majority of meta-analyses encountered incompletely reported studies. We then simulated meta-analysis data sets to investigate the performance of 14 options to treat or impute missing SDs and/or SSs. Performance was thereby assessed using results from fully informed weighted analyses on (hypothetically) complete data sets. We show that the omission of incompletely reported studies is not a viable solution. Unweighted and sample size-based variance approximation can yield unbiased grand means if effect sizes are independent of their corresponding SDs and SSs. The performance of different imputation methods depends on the structure of the meta-analysis data set, especially in the case of correlated effect sizes and standard deviations or sample sizes. In a best-case scenario, which assumes that SDs and/or SSs are both missing at random and are unrelated to effect sizes, our simulations show that the imputation of up to 90% of missing data still yields grand means and confidence intervals that are similar to those obtained with fully informed weighted analyses. We conclude that multiple imputation of missing variance measures and sample sizes could help overcome the problem of incompletely reported primary studies, not only in the field of ecological meta-analyses. Still, caution must be exercised in consideration of potential correlations and pattern of missingness.
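A small sketch contrasting the omission of incomplete studies with a single illustrative imputation of missing SDs; the data are simulated stand-ins, and a full multiple-imputation analysis would repeat the imputation with random draws and pool the resulting estimates rather than use one mean fill-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical meta-analysis data: per-study effect sizes, SDs, and sample sizes,
# with some SDs missing (np.nan) as in incompletely reported primary studies.
effect = rng.normal(0.4, 0.2, size=30)
sd = rng.uniform(0.5, 1.5, size=30)
n = rng.integers(10, 200, size=30).astype(float)
sd[rng.choice(30, size=9, replace=False)] = np.nan        # ~30% missing SDs

def weighted_grand_mean(effect, sd, n):
    var = sd ** 2 / n                                     # variance of each study's effect
    w = 1.0 / var
    return np.sum(w * effect) / np.sum(w)

# Option A: omit incomplete studies (shown in the paper to be a poor choice).
complete = ~np.isnan(sd)
mean_omit = weighted_grand_mean(effect[complete], sd[complete], n[complete])

# Option B: single illustrative imputation, replacing missing SDs with the mean observed SD.
sd_imputed = np.where(np.isnan(sd), np.nanmean(sd), sd)
mean_imputed = weighted_grand_mean(effect, sd_imputed, n)

print(f"omitting incomplete studies: {mean_omit:.3f}, after imputation: {mean_imputed:.3f}")
```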

20.
Appl Spectrosc ; 74(4): 427-438, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31961223

ABSTRACT

Preprocessing of Raman spectra is generally done in three separate steps: (1) cosmic ray removal, (2) signal smoothing, and (3) baseline subtraction. We show that a convolutional neural network (CNN) can be trained using simulated data to handle all steps in one operation. First, synthetic spectra are created by randomly adding peaks, baseline, mixing of peaks and baseline with background noise, and cosmic rays. Second, a CNN is trained on synthetic spectra and known peaks. The results from preprocessing were generally of higher quality than what was achieved using a reference based on standardized methods (second-difference, asymmetric least squares, cross-validation). From 10^5 simulated observations, 91.4% of predictions had smaller absolute error (RMSE), 90.3% had improved quality (SSIM), and 94.5% had reduced signal-to-noise (SNR) power. The CNN preprocessing generated reliable results on measured Raman spectra from polyethylene, paraffin and ethanol with background contamination from polystyrene. The results show a promising proof of concept for the automated preprocessing of Raman spectra.
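A compact sketch of the approach, assuming PyTorch is available: generate synthetic (noisy, clean) spectrum pairs with random peaks, baseline, noise, and occasional cosmic rays, then train a small 1D CNN to map noisy spectra to clean ones. The peak shapes, network size, and training length are illustrative, not the paper's configuration.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
n_points = 512

def synthetic_spectrum():
    """Generate a (noisy, clean) pair: random peaks + baseline + noise + cosmic ray."""
    x = np.linspace(0, 1, n_points)
    clean = np.zeros(n_points)
    for _ in range(rng.integers(3, 8)):                       # random Lorentzian-like peaks
        pos, width, amp = rng.uniform(0, 1), rng.uniform(0.003, 0.02), rng.uniform(0.2, 1.0)
        clean += amp / (1 + ((x - pos) / width) ** 2)
    baseline = rng.uniform(0, 0.5) + rng.uniform(0, 1.0) * x  # slowly varying baseline
    noisy = clean + baseline + rng.normal(0, 0.02, n_points)
    if rng.random() < 0.3:                                    # occasional cosmic-ray spike
        noisy[rng.integers(n_points)] += rng.uniform(2, 5)
    return noisy.astype(np.float32), clean.astype(np.float32)

pairs = [synthetic_spectrum() for _ in range(256)]
noisy = torch.tensor(np.stack([p[0] for p in pairs])).unsqueeze(1)   # shape (N, 1, 512)
clean = torch.tensor(np.stack([p[1] for p in pairs])).unsqueeze(1)

model = nn.Sequential(                       # small 1D CNN mapping noisy -> clean spectrum
    nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 16, 9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 1, 9, padding=4),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):                       # a few epochs, just to show the training loop
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), clean)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```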
