Results 1 - 20 of 34
1.
Diagn Progn Res ; 7(1): 7, 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37069621

ABSTRACT

BACKGROUND: The multivariable fractional polynomial (MFP) approach combines variable selection using backward elimination with a function selection procedure (FSP) for fractional polynomial (FP) functions. It is a relatively simple approach which can be easily understood without advanced training in statistical modeling. For continuous variables, a closed test procedure is used to decide between no effect, linear, FP1, or FP2 functions. Influential points (IPs) and small sample sizes can both have a strong impact on a selected function and on the MFP model. METHODS: We used simulated data with six continuous and four categorical predictors to illustrate approaches which can help to identify IPs with an influence on function selection and on the MFP model. The approaches use leave-one-out and leave-two-out analyses and two related techniques for a multivariable assessment. In eight subsamples, we also investigated the effects of sample size and model replicability, the latter by using three non-overlapping subsamples with the same sample size. For better illustration, a structured profile was used to provide an overview of all analyses conducted. RESULTS: The results showed that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP was not able to detect some non-linear functions and the selected model differed substantially from the true underlying model. However, when the sample size was relatively large and regression diagnostics were carefully conducted, MFP selected functions or models that were similar to the underlying true model. CONCLUSIONS: For smaller sample sizes, IPs and low power are important reasons why the MFP approach may fail to identify underlying functional relationships for continuous variables, and selected models might differ substantially from the true model. However, for larger sample sizes, a carefully conducted MFP analysis is often a suitable way to select a multivariable regression model which includes continuous variables. In such a case, MFP can be the preferred approach to derive a multivariable descriptive model.
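As an illustrative aside (not from the paper), the leave-one-out idea behind these influential-point diagnostics can be sketched in Python: for each deleted observation, refit the candidate FP1 powers and record which one wins. The AIC comparison below is a simplified stand-in for the MFP closed test procedure.

```python
# Minimal leave-one-out sketch of how a single observation can change
# which FP1 power is selected (simplified stand-in for the MFP closed test).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0.2, 5.0, n)                  # positive continuous predictor
y = np.log(x) + rng.normal(0, 0.4, n)         # true effect is log(x)
POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]      # standard FP1 power set

def fp1(x, p):
    return np.log(x) if p == 0 else x ** p

def best_power(x, y):
    aics = {p: sm.OLS(y, sm.add_constant(fp1(x, p))).fit().aic for p in POWERS}
    return min(aics, key=aics.get)

full_choice = best_power(x, y)
loo_choices = [best_power(np.delete(x, i), np.delete(y, i)) for i in range(n)]
changed = [i for i, p in enumerate(loo_choices) if p != full_choice]
print("FP1 power on full data:", full_choice)
print("observations whose deletion changes the selected power:", changed)
```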

2.
Genes (Basel) ; 13(12)2022 12 14.
Article in English | MEDLINE | ID: mdl-36553629

ABSTRACT

The ever-growing number of methods for the generation of synthetic bulk and single cell RNA-seq data have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. As there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of an RNA-seq data simulation algorithm and software that are best suited for their specific scientific questions of interest.
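As a hedged illustration of the kind of comparison such a framework might formalize (not the authors' implementation), one can contrast per-gene distributions and the mean-variance trend of a real and a synthetic count matrix:

```python
# Sketch: compare a real and a synthetic RNA-seq count matrix (genes x samples)
# via per-gene Kolmogorov-Smirnov tests and the mean-variance relationship.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
real = rng.negative_binomial(n=5, p=0.10, size=(200, 30))       # placeholder "real" counts
synthetic = rng.negative_binomial(n=5, p=0.12, size=(200, 30))  # placeholder simulator output

# Fraction of genes whose sample distributions differ (KS test, alpha = 0.05)
pvals = np.array([stats.ks_2samp(real[g], synthetic[g]).pvalue for g in range(real.shape[0])])
print("fraction of genes flagged as different:", np.mean(pvals < 0.05))

# Mean-variance trend: similar slopes suggest the simulator preserves overdispersion
def mean_var_slope(counts):
    m, v = counts.mean(axis=1), counts.var(axis=1)
    return np.polyfit(np.log1p(m), np.log1p(v), 1)[0]

print("real slope:", mean_var_slope(real), "synthetic slope:", mean_var_slope(synthetic))
```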


Subjects
Algorithms, Software, RNA-Seq, RNA Sequence Analysis/methods, Computer Simulation
3.
Sensors (Basel) ; 22(14)2022 Jul 09.
Article in English | MEDLINE | ID: mdl-35890832

ABSTRACT

When classifying objects in 3D LiDAR data, it is important to use efficient collection methods and processing algorithms. This paper considers the resolution needed to classify 3D objects accurately and discusses how this resolution is accomplished for the RedTail RTL-450 LiDAR System. We employ VoxNet, a convolutional neural network, to classify the 3D data and test the accuracy using different data resolution levels. The results show that for our data set, if the neural network is trained using higher resolution data, then the accuracy of the classification is above 97%, even for the very sparse testing set (10% of the original test data set point density). When the training is done on lower resolution data sets, the classification accuracy remains good but drops off at around 3% of the original test data set point density. These results have implications for determining flight altitude and speed for an unmanned aerial vehicle (UAV) to achieve high accuracy classification. The findings point to the value of high-resolution point clouds both for training the convolutional neural network and for the data collected from a LiDAR sensor.
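As a rough sketch (independent of the RedTail sensor or the VoxNet code), point-density experiments like this reduce to voxelizing a point cloud and randomly thinning it before classification:

```python
# Sketch: voxelize a LiDAR point cloud into a fixed occupancy grid and thin it
# to a fraction of the original point density, as in the resolution experiments.
import numpy as np

def voxelize(points, grid=32):
    """Map an (N, 3) point cloud to a grid x grid x grid binary occupancy volume."""
    mins, maxs = points.min(0), points.max(0)
    idx = ((points - mins) / (maxs - mins + 1e-9) * (grid - 1)).astype(int)
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vol

def thin(points, keep_fraction, rng):
    """Randomly keep only a fraction of points to emulate sparser collections."""
    k = max(1, int(len(points) * keep_fraction))
    return points[rng.choice(len(points), size=k, replace=False)]

rng = np.random.default_rng(0)
cloud = rng.normal(size=(5000, 3))          # placeholder object point cloud
for frac in (1.0, 0.10, 0.03):              # full, 10%, 3% of original density
    vol = voxelize(thin(cloud, frac, rng))
    print(f"keep {frac:.0%}: occupied voxels = {int(vol.sum())}")
```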

4.
Sensors (Basel) ; 22(14)2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35891090

ABSTRACT

The accurate recognition of activities is fundamental for following up on the health progress of people with dementia (PwD), thereby supporting subsequent diagnosis and treatments. When monitoring the activities of daily living (ADLs), it is feasible to detect behaviour patterns, parse out the disease evolution, and consequently provide effective and timely assistance. However, this task is affected by uncertainties derived from the differences in smart home configurations and the way in which each person undertakes the ADLs. One possible pathway is to train a supervised classification algorithm using large datasets; nonetheless, obtaining real-world data is costly and involves a challenging participant recruitment process. The resulting activity datasets are then small and may not capture each person's intrinsic properties. Simulation approaches have risen as an efficient alternative, but synthetic data can be significantly dissimilar to real data. Hence, this paper proposes the application of Partial Least Squares Regression (PLSR) to approximate the real activity duration of various ADLs based on synthetic observations. First, the real activity duration of each ADL is contrasted with the one derived from an intelligent environment simulator. Following this, different PLSR models were evaluated for estimating real activity duration based on synthetic variables. A case study including eight ADLs was considered to validate the proposed approach. The results revealed that simulated and real observations are significantly different for some ADLs (p-value < 0.05); nevertheless, synthetic variables can be further modified to predict the real activity duration with high accuracy (R²(pred) > 90%).
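A minimal PLSR sketch of the mapping described above, with made-up simulator features and real durations (the study's variables and preprocessing are not reproduced here):

```python
# Sketch: fit a PLSR model that predicts real ADL durations from
# synthetic-simulator variables, and report predictive R^2 on held-out data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_synth = rng.normal(size=(200, 6))                                   # hypothetical simulator features
y_real = 10 + X_synth @ rng.normal(size=6) + rng.normal(0, 1, 200)    # hypothetical real durations

X_tr, X_te, y_tr, y_te = train_test_split(X_synth, y_real, test_size=0.3, random_state=0)
pls = PLSRegression(n_components=2).fit(X_tr, y_tr)
print("R2(pred):", r2_score(y_te, pls.predict(X_te).ravel()))
```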


Subjects
Activities of Daily Living, Dementia, Algorithms, Dementia/diagnosis, Humans, Least-Squares Analysis
5.
J Med Imaging (Bellingham) ; 9(4): 045501, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35818569

ABSTRACT

Purpose: The most frequently used model for simulating multireader multicase (MRMC) data that emulates confidence-of-disease ratings from diagnostic imaging studies has been the Roe and Metz (RM) model, proposed by Roe and Metz in 1997 and later generalized by Hillis (2012), Abbey et al. (2013), and Gallas and Hillis (2014). A problem with these models is that it has been difficult to set model parameters such that the simulated data are similar to MRMC data encountered in practice. To remedy this situation, Hillis (2018) mapped parameters from the RM model to Obuchowski-Rockette (OR) model parameters that describe the distribution of the empirical AUC outcomes computed from the RM model simulated data. We continue that work by providing the reverse mapping, i.e., by deriving an algorithm that expresses RM parameters as functions of the OR empirical AUC distribution parameters. Approach: We solve for the corresponding RM parameters in terms of the OR parameters using numerical methods. Results: An algorithm is developed that results in, at most, one solution of RM parameter values that correspond to inputted OR parameter values. The algorithm can be implemented using an R software function. Examples are provided that illustrate the use of the algorithm. A simulation study validates the algorithm. Conclusions: The resulting algorithm makes it possible to easily determine RM model parameter values such that simulated data emulate a specific real-data study. Thus, MRMC analysis methods can be empirically tested using simulated data similar to that encountered in practice.
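As a small, generic illustration of the quantity the OR parameters describe (not Hillis' mapping algorithm itself), the empirical AUC for one reader can be computed from confidence-of-disease ratings via the Mann-Whitney statistic:

```python
# Sketch: empirical AUC from confidence-of-disease ratings for one reader,
# using the Mann-Whitney U statistic (AUC = U / (n_diseased * n_nondiseased)).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
nondiseased = rng.normal(0.0, 1.0, size=50)    # latent ratings, nondiseased cases
diseased = rng.normal(1.5, 1.0, size=50)       # latent ratings, diseased cases

u, _ = mannwhitneyu(diseased, nondiseased, alternative="two-sided")
auc = u / (len(diseased) * len(nondiseased))
print("empirical AUC:", round(auc, 3))
```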

6.
Am J Hum Genet ; 109(4): 680-691, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35298919

ABSTRACT

Identification of rare-variant associations is crucial to full characterization of the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real-variant annotation enables in silico comparison of methods, such as rare-variant association tests and polygenic scoring methods, that focus on putative causal variants. Existing simulation methods are either unable to employ real-variant annotation or severely under- or overestimate the number of singletons and doubletons, thereby reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare-variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real-variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.
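A toy sketch of the kind of allele-count distribution such a simulator must reproduce (this is not RAREsim's algorithm): draw per-variant minor allele counts so that the singleton and doubleton proportions match target values.

```python
# Sketch: draw minor allele counts for simulated rare variants so that the
# proportions of singletons and doubletons match target values.
import numpy as np

rng = np.random.default_rng(0)
n_variants = 10_000
# Hypothetical target proportions for minor allele counts 1, 2, 3, 4, 5+
target = {1: 0.50, 2: 0.15, 3: 0.10, 4: 0.08, 5: 0.17}

counts = rng.choice(list(target.keys()), size=n_variants, p=list(target.values()))
observed = {k: round(float(np.mean(counts == k)), 3) for k in target}
print("target:  ", target)
print("observed:", observed)
```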


Subjects
Genetic Variation, Research Design, Computer Simulation, Genetic Variation/genetics, Haplotypes/genetics, Humans, Genetic Models, Multifactorial Inheritance
7.
J Neurosci Methods ; 371: 109501, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35182604

ABSTRACT

BACKGROUND: The Harvard Automatic Processing Pipeline for Electroencephalography (HAPPE) is a computerized EEG data processing pipeline designed for multi-site analysis of populations with neurodevelopmental disorders. This pipeline has been validated in-house by the developers, but external testing using real-world datasets remains to be done. NEW METHOD: Resting and auditory event-related EEG data from 29 children aged 3-6 years with Fragile X Syndrome, as well as simulated EEG data, were used to evaluate HAPPE's noise reduction techniques, data standardization features, and data integration compared to traditional manualized processing. RESULTS: For the real EEG data, the HAPPE pipeline showed more trials retained, greater variance retained through independent component analysis (ICA) component removal, and smaller kurtosis than the manual pipeline; the manual pipeline had a significantly larger signal-to-noise ratio (SNR). For simulated EEG data, the correlation between the pure signal and processed data was significantly higher for manually-processed data than for HAPPE-processed data. Hierarchical linear modeling showed greater signal recovery in the manual pipeline, with the exception of the gamma band signal, which showed mixed results. COMPARISON WITH EXISTING METHODS: SNR and simulated signal retention were significantly greater in the manually-processed data than in the HAPPE-processed data. Signal reduction may negatively affect outcome measures. CONCLUSIONS: The HAPPE pipeline benefits from less active processing time and artifact reduction without removing segments. However, HAPPE may bias toward elimination of noise at the cost of signal. Recommended implementation of the HAPPE pipeline for neurodevelopmental populations depends on the goals and priorities of the research.
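A minimal sketch of the two simulation metrics referenced above, correlation with the pure signal and SNR, assuming the pure and processed traces are available as arrays (neither HAPPE nor the manual pipeline is reproduced here):

```python
# Sketch: compare a processed EEG trace against the known pure simulated signal
# via Pearson correlation and signal-to-noise ratio in dB.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 2, 1000)
pure = np.sin(2 * np.pi * 10 * t)                 # known simulated 10 Hz signal
processed = pure + rng.normal(0, 0.5, t.size)     # stand-in for pipeline output

corr = np.corrcoef(pure, processed)[0, 1]
noise = processed - pure
snr_db = 10 * np.log10(np.sum(pure ** 2) / np.sum(noise ** 2))
print(f"correlation = {corr:.3f}, SNR = {snr_db:.1f} dB")
```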


Subjects
Fragile X Syndrome, Algorithms, Artifacts, Child, Preschool Child, Electroencephalography/methods, Humans, Computer-Assisted Signal Processing, Signal-To-Noise Ratio
8.
Sociol Methods Res ; 50(4): 1725-1762, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34621095

ABSTRACT

Although agent-based models (ABMs) have been increasingly accepted in social sciences as a valid tool to formalize theory, propose mechanisms able to recreate regularities, and guide empirical research, we are not aware of any research using ABMs to assess the robustness of our statistical methods. We argue that ABMs can be extremely helpful to assess models when the phenomena under study are complex. As an example, we create an ABM to evaluate the estimation of selection and influence effects by SIENA, a stochastic actor-oriented model proposed by Tom A. B. Snijders and colleagues. It is a prominent network analysis method that has gained popularity during the last 10 years and been applied to estimate selection and influence for a broad range of behaviors and traits such as substance use, delinquency, violence, health, and educational attainment. However, we know little about the conditions for which this method is reliable or the particular biases it might have. The results from our analysis show that selection and influence are estimated by SIENA asymmetrically and that, with very simple assumptions, we can generate data where selection estimates are highly sensitive to misspecification, suggesting caution when interpreting SIENA analyses.
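A stripped-down sketch of the kind of agent-based data-generating process used to probe selection and influence estimates (not the authors' model, and not SIENA itself): agents rewire ties toward behaviourally similar peers (selection) and shift their behaviour toward their neighbours' mean (influence).

```python
# Sketch: a tiny ABM with a selection step (tie rewiring toward similar agents)
# and an influence step (behaviour moves toward the neighbours' mean).
import numpy as np

rng = np.random.default_rng(0)
n, steps, influence_rate = 30, 50, 0.1
behaviour = rng.normal(size=n)
ties = rng.random((n, n)) < 0.1
np.fill_diagonal(ties, False)

for _ in range(steps):
    # Selection: each agent drops a random tie and links to the most similar non-neighbour.
    for i in range(n):
        current = np.flatnonzero(ties[i])
        if current.size:
            ties[i, rng.choice(current)] = False
        candidates = np.flatnonzero(~ties[i] & (np.arange(n) != i))
        if candidates.size:
            ties[i, candidates[np.argmin(np.abs(behaviour[candidates] - behaviour[i]))]] = True
    # Influence: behaviour drifts toward the mean of current neighbours.
    for i in range(n):
        nbrs = np.flatnonzero(ties[i])
        if nbrs.size:
            behaviour[i] += influence_rate * (behaviour[nbrs].mean() - behaviour[i])

print("behaviour SD after co-evolution:", round(float(behaviour.std()), 3))
```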

9.
Nutrients ; 13(10)2021 Sep 29.
Article in English | MEDLINE | ID: mdl-34684473

ABSTRACT

The aim of this study was to unravel the methodological challenges in exploring nutritional inadequacy, involving 608 healthy pregnant women. The usual intake of twenty-one nutrients was recorded by employing a validated FFQ. Simulated datasets of usual intake were generated, with randomly imposed uncertainty. The comparison between the usual intake and the EAR was accomplished with the probability approach and the EAR cut-point method. Point estimates were accompanied by bootstrap confidence intervals. Bootstrap intervals applied to the risk of inadequacy for raw and simulated data tended in most cases to overlap. A detailed statistical analysis aiming to predict the level of inadequacy, as well as the application of the EAR cut-point method along with bootstrap intervals, could effectively be used to assess nutrient inadequacy. However, the final decision on the method used depends on the distribution of the nutrient intake under evaluation. Irrespective of the applied methodology, moderate to high levels of inadequacy, calculated from the FFQ, were identified for certain nutrients (e.g., vitamins C and B6, magnesium, vitamin A), while the highest were recorded for folate and iron. Considering that micronutrient-poor, obesogenic diets are becoming more common, the underlying rationale may help towards unraveling the complexity characterizing nutritional inadequacies, especially in vulnerable populations.
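A minimal sketch of the EAR cut-point method with a percentile bootstrap confidence interval, using made-up intake values and an assumed EAR (not the study's data):

```python
# Sketch: EAR cut-point estimate of the prevalence of inadequacy with a
# percentile bootstrap confidence interval.
import numpy as np

rng = np.random.default_rng(0)
usual_intake = rng.lognormal(mean=2.1, sigma=0.4, size=608)   # hypothetical intakes
ear = 8.0                                                     # hypothetical EAR

point = np.mean(usual_intake < ear)
boot = [np.mean(rng.choice(usual_intake, usual_intake.size, replace=True) < ear)
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"prevalence of inadequacy = {point:.2%} (95% CI {lo:.2%} - {hi:.2%})")
```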


Subjects
Nutritional Requirements, Adult, Diet Records, Eating, Energy Intake, Female, Humans, Life Style, Micronutrients, Theoretical Models, Pregnancy, Recommended Dietary Allowances
10.
Entropy (Basel) ; 23(9)2021 Aug 31.
Article in English | MEDLINE | ID: mdl-34573765

ABSTRACT

In this article, we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge, we propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study, the objects were smartphone photographs of near-complete Roman terra sigillata pottery vessels from the collection of the Museum of London. Taking the replicated features from published profile drawings of pottery forms allowed the integration of expert knowledge into the process through our synthetic data generator. After this first initial training the model was fine-tuned with data from photographs of real vessels. We show, through exhaustive experiments across several popular deep learning architectures, different test priors, and considering the impact of the photograph viewpoint and excessive damage to the vessels, that the proposed hybrid approach enables the creation of classifiers with appropriate generalisation performance. This performance is significantly better than that of classifiers trained exclusively on the original data, which shows the promise of the approach to alleviate the fundamental issue of learning from small datasets.
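The pretrain-then-fine-tune recipe described above can be sketched with a toy PyTorch model and random tensors standing in for the synthetic and real photograph datasets (the paper's architectures and data are not reproduced):

```python
# Sketch: pretrain a small classifier on abundant synthetic data, then
# fine-tune it at a lower learning rate on a small "real" dataset.
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU(), nn.Linear(64, 5))
loss_fn = nn.CrossEntropyLoss()

def train(x, y, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Stage 1: large synthetic set built from profile drawings (placeholder tensors).
x_synth, y_synth = torch.randn(2000, 1, 32, 32), torch.randint(0, 5, (2000,))
print("synthetic pretraining loss:", train(x_synth, y_synth, lr=1e-3, epochs=20))

# Stage 2: fine-tune on a small set of real photographs (placeholder tensors).
x_real, y_real = torch.randn(100, 1, 32, 32), torch.randint(0, 5, (100,))
print("real fine-tuning loss:", train(x_real, y_real, lr=1e-4, epochs=10))
```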

11.
BMC Bioinformatics ; 22(1): 266, 2021 May 25.
Article in English | MEDLINE | ID: mdl-34034652

ABSTRACT

BACKGROUND: Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since the beginning. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are short. RESULTS: Here we use simulated benchmarking data that reflects many properties of real data, including polymorphisms, intron signal and non-uniform coverage, allowing for systematic comparative analyses of isoform quantification accuracy and its impact on differential expression analysis. Genome, transcriptome and pseudo alignment-based methods are included; and a simple approach is included as a baseline control. CONCLUSIONS: Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We determine the structural parameters with the greatest impact on quantification accuracy to be length and sequence compression complexity and not so much the number of isoforms. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification and isoform level DE should still be employed selectively.
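A minimal sketch of the accuracy comparison such a benchmark performs, given vectors of true and estimated isoform abundances (placeholders here, not the paper's simulated data or quantifiers):

```python
# Sketch: score an isoform quantifier against simulated ground truth with
# Spearman correlation and median relative error.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
true_tpm = rng.lognormal(mean=2, sigma=1.5, size=5000)          # simulated truth
estimated_tpm = true_tpm * rng.lognormal(0, 0.3, size=5000)     # placeholder estimates

rho, _ = spearmanr(true_tpm, estimated_tpm)
rel_err = np.median(np.abs(estimated_tpm - true_tpm) / (true_tpm + 1e-9))
print(f"Spearman rho = {rho:.3f}, median relative error = {rel_err:.3f}")
```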


Subjects
Gene Expression Profiling, Transcriptome, Protein Isoforms/genetics, RNA-Seq, RNA Sequence Analysis
12.
Mol Ecol Resour ; 21(8): 2689-2705, 2021 Nov.
Article in English | MEDLINE | ID: mdl-33745225

ABSTRACT

Population genetics relies heavily on simulated data for validation, inference and intuition. In particular, since the evolutionary 'ground truth' for real data is always limited, simulated data are crucial for training supervised machine learning methods. Simulation software can accurately model evolutionary processes but requires many hand-selected input parameters. As a result, simulated data often fail to mirror the properties of real genetic data, which limits the scope of methods that rely on it. Here, we develop a novel approach to estimating parameters in population genetic models that automatically adapts to data from any population. Our method, pg-gan, is based on a generative adversarial network that gradually learns to generate realistic synthetic data. We demonstrate that our method is able to recover input parameters in a simulated isolation-with-migration model. We then apply our method to human data from the 1000 Genomes Project and show that we can accurately recapitulate the features of real data.
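To illustrate the adversarial intuition only (this is not pg-gan), one can train a classifier to distinguish summary statistics of real from simulated data; accuracy near 0.5 indicates that the simulator parameters produce data that are hard to tell apart from the real data.

```python
# Sketch: use a classifier's ability to separate real from simulated summary
# statistics as a discrepancy measure between a simulator and real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
real_stats = rng.normal(loc=0.0, scale=1.0, size=(500, 10))    # placeholder real summaries
sim_stats = rng.normal(loc=0.2, scale=1.0, size=(500, 10))     # placeholder simulator summaries

X = np.vstack([real_stats, sim_stats])
y = np.concatenate([np.zeros(500), np.ones(500)])
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print("discriminator accuracy (0.5 is ideal):", round(acc, 3))
```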


Subjects
Software, Computer Simulation, Demography, Humans
13.
Molecules ; 26(1)2020 Dec 23.
Article in English | MEDLINE | ID: mdl-33374492

ABSTRACT

Real-time reverse transcription (RT) PCR is the gold standard for detecting Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), owing to its sensitivity and specificity, thereby meeting the demand created by the rising number of cases. The scarcity of trained molecular biologists for analyzing PCR results makes data verification a challenge. Artificial intelligence (AI) was designed to ease verification by detecting atypical profiles in PCR curves caused by contamination or artifacts. Four classes of simulated real-time RT-PCR curves were generated, namely, positive, early, no, and abnormal amplifications. Machine learning (ML) models were generated and tested using small amounts of data from each class. The best model was used for classifying the big data obtained by the Virology Laboratory of Simon Bolivar University from real-time RT-PCR curves for SARS-CoV-2, and the model was retrained and implemented in software that correlated patient data with test and AI diagnoses. The best strategy for AI included a binary classification model generated from simulated data, where data analyzed by the first model were classified as either positive or negative/abnormal. To differentiate between negative and abnormal, the data were reevaluated using the second model. In the first model, the data required preanalysis through a combination of preprocessing steps. The early amplification class was eliminated from the models because the number of such cases in the big data was negligible. ML models can be created from simulated data using the minimum available information. During analysis, changes or variations can be incorporated by generating simulated data, avoiding the incorporation of large amounts of experimental data encompassing all possible changes. For diagnosing SARS-CoV-2, this type of AI is critical for optimizing PCR tests because it enables rapid diagnosis and reduces false positives. Our method can also be used for other types of molecular analyses.
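A toy sketch of the overall idea (simulate amplification curves, extract simple features, train a binary classifier); the laboratory's actual preprocessing, classes, and models are not reproduced here.

```python
# Sketch: simulate positive (sigmoidal) and negative/abnormal RT-PCR curves,
# extract simple curve features, and train a binary classifier on them.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
cycles = np.arange(1, 41)

def positive_curve():
    ct = rng.uniform(18, 32)                       # cycle-threshold-like parameter
    return 1.0 / (1.0 + np.exp(-(cycles - ct))) + rng.normal(0, 0.02, cycles.size)

def negative_curve():
    return rng.normal(0, 0.03, cycles.size) + rng.uniform(-0.05, 0.05)

curves = [positive_curve() for _ in range(300)] + [negative_curve() for _ in range(300)]
labels = np.array([1] * 300 + [0] * 300)
features = np.array([[c.max(), c[-1] - c[0], np.diff(c).max()] for c in curves])

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))
```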


Subjects
Artificial Intelligence, COVID-19 Testing/methods, COVID-19/virology, Biological Models, Real-Time Polymerase Chain Reaction/methods, Reverse Transcriptase Polymerase Chain Reaction/methods, SARS-CoV-2/isolation & purification, Big Data, Humans, Reproducibility of Results, SARS-CoV-2/genetics
14.
Ecol Evol ; 10(20): 11699-11712, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33144994

ABSTRACT

Meta-analyses often encounter studies with incompletely reported variance measures (e.g., standard deviation values) or sample sizes, both needed to conduct weighted meta-analyses. Here, we first present a systematic literature survey on the frequency and treatment of missing data in published ecological meta-analyses, showing that the majority of meta-analyses encountered incompletely reported studies. We then simulated meta-analysis data sets to investigate the performance of 14 options to treat or impute missing SDs and/or SSs. Performance was thereby assessed using results from fully informed weighted analyses on (hypothetically) complete data sets. We show that the omission of incompletely reported studies is not a viable solution. Unweighted and sample size-based variance approximation can yield unbiased grand means if effect sizes are independent of their corresponding SDs and SSs. The performance of different imputation methods depends on the structure of the meta-analysis data set, especially in the case of correlated effect sizes and standard deviations or sample sizes. In a best-case scenario, which assumes that SDs and/or SSs are both missing at random and are unrelated to effect sizes, our simulations show that the imputation of up to 90% of missing data still yields grand means and confidence intervals that are similar to those obtained with fully informed weighted analyses. We conclude that multiple imputation of missing variance measures and sample sizes could help overcome the problem of incompletely reported primary studies, not only in the field of ecological meta-analyses. Still, caution must be exercised in consideration of potential correlations and patterns of missingness.
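A minimal sketch of one of the simpler imputation options such a study might evaluate (filling missing SDs before computing an inverse-variance weighted grand mean); the paper's full simulation and multiple-imputation setup is far more extensive.

```python
# Sketch: impute missing standard deviations with the mean observed SD, then
# compute an inverse-variance weighted grand mean effect size.
import numpy as np

effect = np.array([0.4, 0.1, 0.7, 0.3, 0.5])       # hypothetical study effect sizes
sd = np.array([0.9, np.nan, 1.1, np.nan, 1.0])     # two studies report no SD
n = np.array([30, 25, 40, 22, 35])

sd_filled = np.where(np.isnan(sd), np.nanmean(sd), sd)   # single imputation, for illustration
weights = n / sd_filled ** 2                              # inverse-variance style weights
grand_mean = np.sum(weights * effect) / np.sum(weights)
print("weighted grand mean:", round(float(grand_mean), 3))
```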

15.
PeerJ ; 8: e9382, 2020.
Article in English | MEDLINE | ID: mdl-32612891

ABSTRACT

Joint encounter (JE) models estimate demographic rates using live recapture and dead recovery data. The extent to which limited recapture or recovery data can hinder estimation in JE models is not completely understood. Yet limited data are common in ecological research. We designed a series of simulations using Bayesian multistate JE models that spanned a large range of potential recapture probabilities (0.01-0.90) and two reported mortality probabilities (0.10, 0.19). We calculated bias by comparing estimates against known probabilities of survival, fidelity and reported mortality. We explored whether sparse data (i.e., recapture probabilities <0.02) compromised inference about survival by comparing estimates from dead recovery (DR) and JE models using an 18-year data set from a migratory bird, the lesser snow goose (Anser caerulescens caerulescens). Our simulations showed that bias in probabilities of survival, fidelity and reported mortality was relatively low across a large range of recapture probabilities, except when recapture and reported mortality probabilities were both lowest. While bias in fidelity probability was similar across all recapture probabilities, the root mean square error declined substantially with increased recapture probabilities for reported mortality probabilities of 0.10 or 0.19, as expected. In our case study, annual survival probabilities for adult female snow geese were similar whether estimated with JE or DR models, but more precise from JE models than those from DR models. Thus, our simulated and empirical data suggest acceptably minimal bias in survival, fidelity or reported mortality probabilities estimated from JE models. Even a small amount of recapture information provided adequate structure for JE models, except when reported mortality probabilities were <0.10. Thus, practitioners with limited recapture data should not be discouraged from use of JE models. We recommend that ecologists incorporate other data types as frequently as analytically possible, since precision of focal parameters is improved, and additional parameters of interest can be estimated.
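For reference, the bias and RMSE summaries used to evaluate such simulations reduce to a few lines, given point estimates and the known truth (placeholder values below, not the study's posterior output):

```python
# Sketch: bias and root mean square error of simulated survival estimates
# relative to the known true probability.
import numpy as np

true_survival = 0.85
estimates = np.random.default_rng(0).normal(0.84, 0.03, size=500)  # placeholder estimates

bias = estimates.mean() - true_survival
rmse = np.sqrt(np.mean((estimates - true_survival) ** 2))
print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}")
```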

16.
Appl Spectrosc ; 74(4): 427-438, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31961223

ABSTRACT

Preprocessing of Raman spectra is generally done in three separate steps: (1) cosmic ray removal, (2) signal smoothing, and (3) baseline subtraction. We show that a convolutional neural network (CNN) can be trained using simulated data to handle all steps in one operation. First, synthetic spectra are created by randomly adding peaks, baseline, mixing of peaks and baseline with background noise, and cosmic rays. Second, a CNN is trained on synthetic spectra and known peaks. The results from preprocessing were generally of higher quality than what was achieved using a reference based on standardized methods (second-difference, asymmetric least squares, cross-validation). From 105 simulated observations, 91.4% of predictions had a smaller absolute error (RMSE), 90.3% had improved quality (SSIM), and 94.5% had reduced signal-to-noise (SNR) power. The CNN preprocessing generated reliable results on measured Raman spectra from polyethylene, paraffin, and ethanol with background contamination from polystyrene. The results show a promising proof of concept for the automated preprocessing of Raman spectra.
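A compact sketch of the kind of synthetic-spectrum generator described above (random peaks, a slowly varying baseline, noise, and occasional cosmic-ray spikes); the peak shapes and parameter ranges are illustrative only, not those of the paper.

```python
# Sketch: generate a synthetic Raman-like spectrum from random Gaussian peaks,
# a polynomial baseline, Gaussian noise, and sparse cosmic-ray spikes.
import numpy as np

def synthetic_spectrum(rng, n_points=1000):
    x = np.linspace(0, 1, n_points)
    clean = np.zeros(n_points)
    for _ in range(rng.integers(3, 8)):                     # random peaks
        center, width, height = rng.random(), rng.uniform(0.005, 0.02), rng.uniform(0.2, 1.0)
        clean += height * np.exp(-0.5 * ((x - center) / width) ** 2)
    baseline = 0.5 + 0.8 * x - 0.6 * x ** 2                 # slowly varying baseline
    noisy = clean + baseline + rng.normal(0, 0.02, n_points)
    for _ in range(rng.integers(0, 3)):                     # cosmic ray spikes
        noisy[rng.integers(0, n_points)] += rng.uniform(1.0, 3.0)
    return noisy, clean                                     # CNN input and training target

rng = np.random.default_rng(0)
noisy, clean = synthetic_spectrum(rng)
print(noisy.shape, clean.shape)
```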

17.
Neuroimage ; 200: 511-527, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31247300

ABSTRACT

Although motion artifacts are a major source of noise in infant fNIRS data, how to approach motion correction in this population has only recently started to be investigated. Homer2 offers a wide range of motion correction methods, and previous work on simulated and adult data suggested the use of Spline interpolation and Wavelet filtering as optimal methods for the recovery of trials affected by motion. However, motion artifacts in infant data differ from those in adult data both in amplitude and in frequency of occurrence. Therefore, artifact correction recommendations derived from adult data might not be optimal for infant data. We hypothesized that the combined use of Spline and Wavelet would outperform their individual use on data with complex profiles of motion artifacts. To demonstrate this, we first compared, on infant semi-simulated data, the performance of several motion correction techniques on their own and of the novel combined approach; then, we investigated the performance of Spline and Wavelet alone and in combination on real cognitive data from three datasets collected with infants of different ages (5, 7 and 10 months), with different tasks (auditory, visual and tactile) and with different NIRS systems. To quantitatively estimate and compare the efficacy of these techniques, we adopted four metrics: hemodynamic response recovery error, within-subject standard deviation, between-subjects standard deviation, and number of trials that survived each correction method. Our results demonstrated that (i) it is always better to correct for motion artifacts than to reject the corrupted trials; (ii) Wavelet filtering on its own and in combination with Spline interpolation seems to be the most effective approach in reducing the between- and within-subject standard deviations. Importantly, the combination of Spline and Wavelet was the approach providing the best performance in semi-simulation at both low and high levels of noise, also recovering most of the trials affected by motion artifacts across all datasets, a crucial result when working with infant data.
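For orientation only, wavelet-based artifact suppression of a single channel can be sketched with PyWavelets; the soft-thresholding rule below is generic and is not Homer2's Wavelet or Spline implementation.

```python
# Sketch: generic wavelet denoising of a motion-contaminated 1-D signal
# (decompose, soft-threshold detail coefficients, reconstruct).
import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)
clean = np.sin(2 * np.pi * 0.5 * t)
contaminated = clean + rng.normal(0, 0.1, t.size)
contaminated[900:912] += 4.0                                # spike-like motion artifact

coeffs = pywt.wavedec(contaminated, "db4", level=5)
threshold = 3 * np.median(np.abs(coeffs[-1])) / 0.6745      # MAD-based noise estimate
denoised = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
corrected = pywt.waverec(denoised, "db4")[:t.size]

print("max deviation before:", round(float(np.abs(contaminated - clean).max()), 2),
      "after:", round(float(np.abs(corrected - clean).max()), 2))
```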


Subjects
Artifacts, Cerebral Cortex/physiology, Functional Neuroimaging/standards, Computer-Assisted Image Processing/standards, Near-Infrared Spectroscopy/standards, Cerebral Cortex/diagnostic imaging, Female, Head Movements/physiology, Humans, Infant, Male
18.
Sensors (Basel) ; 19(12)2019 Jun 14.
Article in English | MEDLINE | ID: mdl-31207884

ABSTRACT

This paper addresses the problem of interferometric noise reduction in Synthetic Aperture Radar (SAR) interferometry based on sparse and redundant representations over a trained dictionary. The idea is to use a Proximity-based K-SVD (ProK-SVD) algorithm on interferometric data to obtain a suitable dictionary, in order to extract the phase image content effectively. We implemented this strategy on both simulated and real interferometric data for the validation of our approach. For synthetic data, three different training dictionaries were compared, namely, a dictionary extracted from the data, a dictionary obtained from a uniform random distribution in [-π, π], and a dictionary built from the discrete cosine transform. Further, a similar strategy was applied to real interferograms. We used interferometric data from various SAR sensors, including low resolution C-band ERS/ENVISAT, medium L-band ALOS, and high resolution X-band COSMO-SkyMed, all over an area of Mt. Etna, Italy. On both simulated and real interferometric phase images, the proposed approach shows significant noise reduction within the fringe pattern, without any considerable loss of useful information.
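As a generic illustration of sparse coding over a learned patch dictionary (scikit-learn's dictionary learning, not the ProK-SVD algorithm), a noisy fringe-like image can be reconstructed patch by patch:

```python
# Sketch: learn a patch dictionary from a noisy image and reconstruct it from
# sparse codes (generic dictionary learning, not ProK-SVD).
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

rng = np.random.default_rng(0)
xx, yy = np.meshgrid(np.linspace(0, 4 * np.pi, 64), np.linspace(0, 4 * np.pi, 64))
image = np.sin(xx + 0.5 * yy)                          # fringe-like pattern
noisy = image + rng.normal(0, 0.3, image.shape)

patches = extract_patches_2d(noisy, (8, 8))
X = patches.reshape(len(patches), -1)
mean = X.mean(axis=1, keepdims=True)
dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0, random_state=0).fit(X - mean)
codes = dico.transform(X - mean)
denoised_patches = (codes @ dico.components_ + mean).reshape(patches.shape)
denoised = reconstruct_from_patches_2d(denoised_patches, image.shape)

print("RMSE before:", round(float(np.sqrt(np.mean((noisy - image) ** 2))), 3),
      "after:", round(float(np.sqrt(np.mean((denoised - image) ** 2))), 3))
```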

19.
Mutat Res Rev Mutat Res ; 779: 114-125, 2019.
Article in English | MEDLINE | ID: mdl-31097148

ABSTRACT

Copy number variants (CNVs) are intermediate-scale structural variants containing copy number changes involving DNA fragments of between 1 kb and 5 Mb. Although known to account for a significant proportion of the genetic burden in human disease, the role of CNVs (especially small CNVs) is often underestimated, as they are undetectable by traditional Sanger sequencing. Since the development of next-generation sequencing (NGS) technologies, several research groups have compared depth of coverage (DoC) patterns between samples, an approach that may facilitate effective CNV detection. Most CNV detection tools based on DoC comparisons are designed to work with whole-genome sequencing (WGS) or whole-exome sequencing (WES) data. However, few methods developed to date are designed for custom/commercial targeted NGS (tg-NGS) panels, the assays most commonly used for diagnostic purposes. Moreover, the development and evaluation of these tools is hindered by (i) the scarcity of thoroughly annotated data containing CNVs and (ii) a dearth of simulation tools for WES and tg-NGS that mimic the errors and biases encountered in these data. Here, we review DoC-based CNV detection methods described in the current literature, assess their performance with simulated tg-NGS data, and discuss their strengths and weaknesses when integrated into the daily laboratory workflow. Our findings suggest that the best methods for CNV detection in tg-NGS panels are DECoN, ExomeDepth, and ExomeCNV. Regardless of the method used, there is a need to make these programs more user-friendly to enable their use by diagnostic laboratory staff who lack bioinformatics training.
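At its core, a DoC comparison reduces to normalizing per-target read depths and flagging targets whose test/reference log-ratio exceeds a threshold. A bare-bones sketch follows; the normalization and threshold are illustrative, not those of DECoN, ExomeDepth, or ExomeCNV.

```python
# Sketch: bare-bones depth-of-coverage CNV calling on targeted panel data by
# comparing median-normalized depths of a test sample against a reference set.
import numpy as np

rng = np.random.default_rng(0)
n_targets = 200
reference = rng.poisson(lam=200, size=(10, n_targets)).astype(float)   # 10 normal samples
test = rng.poisson(lam=200, size=n_targets).astype(float)
test[40:43] *= 0.5        # simulate a heterozygous deletion over three targets

# Median-normalize each sample, then compare the test depth to the reference mean.
reference /= np.median(reference, axis=1, keepdims=True)
test /= np.median(test)
log2_ratio = np.log2(test / reference.mean(axis=0))

calls = np.flatnonzero(np.abs(log2_ratio) > 0.4)      # ~1.3-fold change threshold (illustrative)
print("targets flagged as possible CNVs:", calls)
```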


Subjects
DNA Copy Number Variations/genetics, High-Throughput Nucleotide Sequencing/methods, Computational Biology/methods, Exome/genetics, Genetic Testing/methods, Humans, DNA Sequence Analysis/methods
20.
Stud Health Technol Inform ; 257: 319-324, 2019.
Article in English | MEDLINE | ID: mdl-30741217

ABSTRACT

This paper presents a framework for addressing data access challenges associated with secondary use of high-dimensional transactional datasets that have been extracted from electronic health/medical records (EHRs). These datasets are subject to the data de-identification "curse of dimensionality" [1] which manifests as substantial challenges to preserving analytical integrity of data contents when high-dimensional datasets must be de-identified and deemed free of Personal Information (PI) prior to disclosure. A large array of methods can achieve this objective - for low dimensional datasets. However, these methods have not been scaled up to the types of high-dimensional data that must be sourced from the transactional EHR if the objective is specifically to generate products that can inform point-of-care clinical decision-making. The Applied Clinical Research Unit (ACRU) in Island Health is implementing a process that addresses key privacy challenges inherent in disclosures of high-dimensional transactional health data. This paper presents a schematic and abbreviated rendering of key principles and processes on which the ACRU approach is based.
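As a generic illustration of why dimensionality is the core obstacle (this is not the ACRU process), one can watch equivalence-class sizes shrink as more quasi-identifier columns are combined, which is what makes de-identification of high-dimensional data so hard.

```python
# Sketch: as more quasi-identifier columns are combined, equivalence classes
# shrink and more records become unique, illustrating the de-identification
# "curse of dimensionality" for high-dimensional transactional data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({f"attr{i}": rng.integers(0, 20, n) for i in range(6)})  # hypothetical attributes

for k in range(1, 7):
    cols = [f"attr{i}" for i in range(k)]
    sizes = df.groupby(cols).size()
    unique_records = int((sizes == 1).sum())
    print(f"{k} quasi-identifiers: smallest class = {sizes.min()}, unique records = {unique_records}")
```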


Subjects
Data Anonymization, Electronic Health Records, Privacy, Data Analysis, Disclosure