Results 1 - 20 of 106
1.
BMC Bioinformatics ; 25(1): 99, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38448819

ABSTRACT

BACKGROUND: Cancer, a disease with high morbidity and mortality rates, poses a significant threat to human health. Driver genes, which harbor mutations accountable for the initiation and progression of tumors, play a crucial role in cancer development. Identifying driver genes stands as a paramount objective in cancer research and precision medicine. RESULTS: In the present work, we propose a method for identifying driver genes using a Generalized Linear Regression Model (GLM) with Shrinkage and double-Weighted strategies based on Functional Impact, which is named GSW-FI. First, an estimation model based on GLM is proposed for assessing the background functional impacts of genes, utilizing gene features as predictors. Second, the shrinkage and double-weighted strategies are integrated as two revising approaches to ensure the rationality of the identified driver genes. Last, a statistical hypothesis test is designed to identify driver genes by leveraging the estimated background functional impacts. Experimental results on 31 The Cancer Genome Atlas datasets demonstrate that GSW-FI outperforms ten other prediction methods in terms of the overlap fraction with well-known databases and consensus predictions among different methods. CONCLUSIONS: GSW-FI presents a novel approach that efficiently identifies driver genes with functional impact mutations using computational methods, thereby advancing the development of precision medicine for cancer.


Subject(s)
Neoplasms , Oncogenes , Humans , Mutation , Cognition , Consensus , Databases, Factual , Neoplasms/genetics
2.
Crit Rev Environ Sci Technol ; 53(7): 827-846, 2023.
Article in English | MEDLINE | ID: mdl-37138645

ABSTRACT

The concept of the exposome encompasses the totality of exposures from a variety of external and internal sources across an individual's life course. The wealth of existing spatial and contextual data makes it appealing to characterize individuals' external exposome to advance our understanding of environmental determinants of health. However, the spatial and contextual exposome is very different from other exposome factors measured at the individual level, as spatial and contextual exposome data are more heterogeneous, with unique correlation structures and various spatiotemporal scales. These distinctive characteristics lead to multiple unique methodological challenges across different stages of a study. This article provides a review of the existing resources, methods, and tools in the new and developing field of spatial and contextual exposome-health studies, focusing on four areas: (1) data engineering, (2) spatiotemporal data linkage, (3) statistical methods for exposome-health association studies, and (4) machine- and deep-learning methods to use spatial and contextual exposome data for disease prediction. A critical analysis of the methodological challenges involved in each of these areas is performed to identify knowledge gaps and address future research needs.

3.
Graefes Arch Clin Exp Ophthalmol ; 260(9): 2877-2885, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35895106

ABSTRACT

PURPOSE: To assess the accuracy of the Kane formula for intraocular lens (IOL) power calculation in the pediatric population. METHODS: The charts of pediatric patients who underwent cataract surgery with in-the-bag IOL implantation with one of two IOL models (SA60AT or MA60AC) between 2012 and 2018 at The Hospital for Sick Children, Toronto, Ontario, Canada, were retrospectively reviewed. The accuracy of IOL power calculation with the Kane formula was evaluated in comparison with the Barrett Universal II (BUII), Haigis, Hoffer Q, Holladay 1, and Sanders-Retzlaff-Kraff Theoretical (SRK/T) formulas. RESULTS: Sixty-two eyes of 62 patients aged 6.2 (IQR 3.2-9.2) years were included. The SD values of the prediction error obtained by Kane (1.38) were comparable with those by BUII (1.34), Hoffer Q (1.37), SRK/T (1.40), Holladay 1 (1.41), and Haigis (1.50), all p > 0.05. A significant difference was observed between the Hoffer Q and Haigis formulas (p = 0.039). No differences in the median and mean absolute errors were found between the Kane formula (0.54 D and 0.91 ± 1.04 D) and BUII (0.50 D and 0.88 ± 1.00 D), Hoffer Q (0.48 D and 0.88 ± 1.05 D), SRK/T (0.72 D and 0.97 ± 1.00 D), Holladay 1 (0.63 D and 0.94 ± 1.05 D), and Haigis (0.57 D and 0.98 ± 1.13 D), p = 0.099. CONCLUSION: This is the first study to investigate the Kane formula in pediatric cataract surgery. Our results place the Kane formula among the noteworthy IOL power calculation formulas in this age group, offering an additional means for improving IOL calculation in pediatric cataract surgery. The heteroscedastic statistical method was implemented for the first time to evaluate formulas' predictability in children.


Subject(s)
Cataract , Lenses, Intraocular , Phacoemulsification , Biometry , Child , Humans , Optics and Photonics , Refraction, Ocular , Retrospective Studies
4.
Biotechnol Lett ; 44(10): 1217-1230, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36057882

ABSTRACT

Ergosterol, a primary metabolite and precursor of vitamin D2, is the most plentiful mycosterol in the fungal cell membrane. Process optimization to increase the yield and productivity of biological products is a topic of interest. Ultrasonic waves have many applications in biotechnology, such as cell disruption and enhancement of primary and secondary metabolite production. This study identified an optimal condition for ultrasound-assisted production (UAP) of ergosterol from Penicillium brevicompactum MUCL 19011 using the L9 Taguchi statistical method. The intensity (IS), time of sonication (TS), treatment frequency (TF), and number of days of treatment (DT) were chosen as factors to study the effects of ultrasound on ergosterol production. The results were analyzed using Minitab version 19. The maximum ergosterol, 11 mg/g cell dry weight (CDW), was produced on the tenth day with all factors at the low level. The days of treatment, with a contribution of 45.48%, was the most significant factor for ergosterol production. For the first time, this study revealed the positive effect of ultrasound on the production of ergosterol. Ergosterol production increased by 73% (4.63 mg/g CDW) after process optimization. Finally, a mathematical model of the ultrasound factors with a regression coefficient of R² = 0.978 was obtained for ergosterol production during ultrasound treatment.


Subject(s)
Biological Products , Penicillium , Biological Products/metabolism , Ergocalciferols/metabolism , Ergosterol/metabolism , Penicillium/genetics , Penicillium/metabolism
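The L9 Taguchi design named in entry 4 can be sketched as follows. This is only an illustration of the standard L9 orthogonal array (9 runs, four factors at three levels); the assignment of IS, TS, TF and DT to particular columns and the helper names are assumptions, since the abstract does not give the layout or the raw responses.

```python
from itertools import combinations

# Standard L9 orthogonal array: 9 runs, 4 factors, 3 levels each (coded 1-3).
L9 = [
    (1, 1, 1, 1),
    (1, 2, 2, 2),
    (1, 3, 3, 3),
    (2, 1, 2, 3),
    (2, 2, 3, 1),
    (2, 3, 1, 2),
    (3, 1, 3, 2),
    (3, 2, 1, 3),
    (3, 3, 2, 1),
]

# Factor abbreviations come from the abstract; the column order is assumed.
FACTORS = ("IS", "TS", "TF", "DT")


def is_orthogonal(array):
    """True if every pair of columns contains each level pair exactly once."""
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        pairs = {(row[a], row[b]) for row in array}
        if len(pairs) != len(array):
            return False
    return True


def level_means(array, responses, column):
    """Mean response at each level of one factor (a main-effects table)."""
    means = {}
    for level in (1, 2, 3):
        vals = [y for row, y in zip(array, responses) if row[column] == level]
        means[level] = sum(vals) / len(vals)
    return means
```

With measured yields for the nine runs, `level_means(L9, yields, FACTORS.index("DT"))` would give the average yield at each level of the days-of-treatment factor, which is the kind of summary a Taguchi analysis ranks factors by.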
5.
Acta Biochim Biophys Sin (Shanghai) ; 54(6): 864-873, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35713313

ABSTRACT

High-throughput sequencing of the B cell receptor (BCR) repertoire provides useful insights into the adaptive immune system. With the continuous development of BCR-seq technology, many efforts have been made to develop methods for analyzing the ever-increasing BCR repertoire data. In this review, we comprehensively outline different BCR repertoire library preparation protocols and summarize three major steps of BCR-seq data analysis, i.e., V(D)J sequence annotation, clonal phylogenetic inference, and BCR repertoire profiling and mining. Different from other reviews in this field, we emphasize the background intuition and the statistical principle of each method to help biologists better understand it. Finally, we discuss data mining problems for BCR-seq data, with a highlight on recently emerging multiple-sample analysis.


Subject(s)
High-Throughput Nucleotide Sequencing , Receptors, Antigen, B-Cell , Cells, Cultured , High-Throughput Nucleotide Sequencing/methods , Phylogeny , Receptors, Antigen, B-Cell/genetics
6.
Int Arch Occup Environ Health ; 94(7): 1537-1547, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33847787

ABSTRACT

OBJECTIVE: To present a sensitivity analysis of the most widely used means of estimating lifetime occupational exposure proportion (LOEP) and their respective impacts on LOEP and population-attributable fraction (PAF) estimates. METHODS: A French population-based sample with full job history (N = 10,010) was linked with four Matgéné job-exposure matrices: flour, cement, silica and benzene. LOEP and the 95% confidence interval were estimated using four methods: the maximum exposure probability during the career (Proba_max), two methods subdividing careers into job-periods (job-period_M1, job-period_M2) and one into job-years (job-year). To quantify differences between methods, percentages of variation were calculated for proportion values and PAF, and compared with published results for France using cross-sectional proportion multiplied by a factor. RESULTS: For each agent, LOEP estimated from the maximum probability during the career (Proba_max) was consistently lower than the proportions taking account of job-periods or job-years. LOEP on Proba_max for flour, cement, silica and benzene were, respectively, 4.4% (95% CI 4.0-4.7), 4.3% (3.9-4.6), 6.1% (5.7-6.5) and 3.9% (3.6-4.2). Percentage of variation ranged from 0 to 55.8% according to the agent. The number of cancer cases varied by a twofold factor for exposure to silica and lung cancer and by a fourfold factor for exposure to benzene and acute myeloid leukemia. CONCLUSIONS: The present study provides a description of several LOEP estimation methods based on exposure assessment over the entire career and describes their impact on PAF. For health monitoring purposes, we recommend reporting a range of LOEP with low and high estimates obtained using job-periods (job-period_M1 and job-period_M2).


Subject(s)
Leukemia, Myeloid, Acute/epidemiology , Lung Neoplasms/epidemiology , Occupational Exposure/analysis , Adult , Aged , Benzene , Construction Materials , Female , Flour , France/epidemiology , Humans , Male , Middle Aged , Occupational Exposure/statistics & numerical data , Silicon Dioxide
7.
BMC Bioinformatics ; 20(1): 12, 2019 Jan 07.
Article in English | MEDLINE | ID: mdl-30616521

ABSTRACT

BACKGROUND: Reverse engineering approaches to infer gene regulatory networks using computational methods are of great importance to annotate gene functionality and identify hub genes. Although various statistical algorithms have been proposed, development of computational tools to integrate results from different methods and user-friendly online tools is still lagging. RESULTS: We developed a web server that efficiently constructs gene networks from expression data. It allows the user to use ten different network construction methods (such as partial correlation-, likelihood-, Bayesian- and mutual information-based methods) and integrates the resulting networks from multiple methods. Hub gene information, if available, can be incorporated to enhance performance. CONCLUSIONS: GeNeCK is an efficient and easy-to-use web application for gene regulatory network construction. It can be accessed at http://lce.biohpc.swmed.edu/geneck .


Subject(s)
Computational Biology/methods , Computer Graphics , Gene Regulatory Networks , Internet , Protein Interaction Maps , Software , Algorithms , Bayes Theorem , Humans , Transcriptome
8.
J Nutr ; 149(9): 1667-1673, 2019 09 01.
Article in English | MEDLINE | ID: mdl-31172188

ABSTRACT

BACKGROUND: To estimate usual intake distributions of dietary components, collection of nonconsecutive repeated 24-h dietary recalls is recommended, but resource limitations sometimes restrict data collection to single-day dietary data per person. OBJECTIVES: We developed a new statistical method, the NCI 1-d method, which uses single-day dietary data and an external within-person to between-person variance ratio to estimate population distributions of usual intake of nearly-daily consumed foods and nutrients. METHODS: We used NHANES 2011-2014 data for men (n = 4938 and n = 4293 for the first and second 24-h recalls) to compare nutrient intake distributions of vitamin A, magnesium, folate, and vitamin E generated by the 1-d method (with use of only the first recall per person) with those from the NCI amount-only method (with use of all days of dietary intake per person). The within-person to between-person variance ratio from the amount-only model was used as the unbiased "external" estimate for the 1-d method. We also examined the effect of mis-specification of variance ratios on usual intake distributions. RESULTS: The amount-only and 1-d methods estimated statistically equivalent median (25p, 75p): 647 (459, 890) compared with 648 (461, 886) µg retinol activity equivalents/d, 338 (268, 420) compared with 334 (266, 417) mg magnesium/d, 595 (458, 762) compared with 589 (456, 758) µg dietary folate equivalents/d, and 9.7 (7.3, 12.6) compared with 9.6 (7.3, 12.7) mg vitamin E/d. As the external variance ratios increased from 25% to 200% of the unbiased ratios, the prevalence of inadequate intake ranged from 53% to 43% for vitamin A, 57% to 55% for magnesium, 16% to 2% for folate, and 70% to 73% for vitamin E. CONCLUSIONS: The 1-d method is a viable statistical method for estimating usual intakes of nearly-daily consumed dietary components when the variance ratio is unbiased. 
Results are sensitive to variance ratio selection, so researchers should still collect replicate data where possible.


Subject(s)
Diet , Energy Intake , Nutrients/administration & dosage , Folic Acid/administration & dosage , Humans , Magnesium/administration & dosage , Statistics as Topic , Vitamin A/administration & dosage , Vitamin E/administration & dosage
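The role of the within-person to between-person variance ratio in entry 8 can be illustrated with a deliberately simplified sketch. It is not the NCI 1-d method itself, whose modelling details are not given in the abstract. It assumes untransformed data and simply shrinks single-day values toward the group mean so that the remaining spread equals the total variance divided by (1 + ratio); the function names and the cutoff are hypothetical.

```python
from statistics import mean, pvariance


def shrink_to_usual(single_day, variance_ratio):
    """Shrink single-day intakes toward the mean.

    Assumption: total variance = between-person + within-person variance,
    and variance_ratio = within / between, so the between-person variance
    is total / (1 + variance_ratio).
    """
    m = mean(single_day)
    factor = (1.0 / (1.0 + variance_ratio)) ** 0.5
    return [m + factor * (x - m) for x in single_day]


def prevalence_below(values, cutoff):
    """Fraction of values strictly below a (hypothetical) requirement cutoff."""
    return sum(v < cutoff for v in values) / len(values)


intakes = [2, 4, 6, 8, 10]                     # made-up single-day intakes
usual = shrink_to_usual(intakes, variance_ratio=3)
```

A larger assumed ratio narrows the estimated distribution, which moves the share of people below a fixed cutoff; that is the sensitivity to variance ratio selection the abstract reports.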
9.
Public Health Nutr ; 22(4): 757-763, 2019 03.
Article in English | MEDLINE | ID: mdl-30253818

ABSTRACT

OBJECTIVE: Unequal obesity distributions among adult populations have been reported in low- and middle-income countries, but mainly based on data of women of reproductive age. Moreover, incorporation of ever-changing skewed BMI distributions in analyses has been a challenge. Our study aimed to assess magnitude and rates of change in BMI distributions by age and sex. DESIGN: Shapes of BMI distributions were estimated for 2005 and 2010, and their changes were assessed, using the generalized additive model for location, scale and shape (GAMLSS) and assuming BMI follows a Box-Cox power exponential (BCPE) distribution. SETTING: Nationally representative, repeated cross-sectional health surveys conducted between 2005 and 2013 in Mexico, Colombia and Peru. SUBJECTS: Adult men and non-pregnant women aged 20-69 years. RESULTS: Whereas women had more right-shifted and wider BMI distributions than men in almost all age groups across the countries in 2010, men in their 30s-40s experienced more rapid increases in BMI between 2005 and 2010, notably in Peru. The highest increase in overweight and obesity prevalence was observed among Peruvian men of 35-39 years, with a 5-year increase of 21 percentage points. CONCLUSIONS: The BCPE-GAMLSS method is an alternative to analyse measurements with time-varying distributions visually, in addition to conventional indicators such as means and prevalences. Consideration of differences in BMI distributions and their changes by sex and age would provide vital information in tailoring relevant policies and programmes to reach target populations effectively. Increases in BMI portend increases of obesity-associated diseases, for which preventive and preparative actions are urgent.


Subject(s)
Body Mass Index , Obesity/epidemiology , Adult , Age Distribution , Aged , Colombia/epidemiology , Cross-Sectional Studies , Female , Health Surveys , Humans , Male , Mexico/epidemiology , Middle Aged , Overweight/epidemiology , Peru/epidemiology , Sex Distribution , Time Factors , Young Adult
10.
Molecules ; 24(5)2019 Mar 06.
Article in English | MEDLINE | ID: mdl-30845684

ABSTRACT

The prediction of protein subcellular localization is critical for inferring protein functions, gene regulations and protein-protein interactions. With the advances of high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which enables us to computationally predict yeast protein subcellular localization. However, widely used protein sequence representation techniques, such as amino acid composition and Chou's pseudo amino acid composition (PseAAC), have difficulty extracting adequate information about the interactions between residues and the position distribution of each residue. Therefore, there is still an urgent need to develop novel sequence representations. In this study, we present two novel protein sequence representation techniques: Generalized Chaos Game Representation (GCGR), based on the frequency and distributions of the residues in the protein primary sequence, and novel statistics and information theory (NSI), reflecting local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is simply represented by a 5-dimensional feature vector, while other popular methods like PseAAC and dipeptide composition adopt features with hundreds of dimensions. In practice, the feature representation is highly efficient in predicting protein subcellular localization. Even without using machine learning-based classifiers, a simple model based on the feature vector can achieve prediction accuracies of 0.8825 and 0.7736 for the CL317 and ZW225 datasets, respectively. To further evaluate the effectiveness of the proposed encoding schemes, we introduce a multi-view features-based method to combine the two above-mentioned features with other well-known features including PseAAC and dipeptide composition, and use a support vector machine as the classifier to predict protein subcellular localization. This novel model achieves prediction accuracies of 0.927 and 0.871 for the CL317 and ZW225 datasets, respectively, better than other existing methods in the jackknife tests. The results suggest that the GCGR and NSI features are useful complements to popular protein sequence representations in predicting yeast protein subcellular localization. Finally, we validate a few newly predicted protein subcellular localizations with evidence from published articles in authoritative journals and books.


Subject(s)
Computational Biology , Proteins/chemistry , Proteomics/methods , Amino Acid Sequence , Amino Acids/chemistry , Cell Line , Databases, Protein , Dipeptides/chemistry , Support Vector Machine
11.
AAPS PharmSciTech ; 20(7): 268, 2019 Jul 26.
Article in English | MEDLINE | ID: mdl-31350676

ABSTRACT

Chemoinformatics is emerging as a new trend in drug discovery that relates chemical structure to biological function. Its main aims are analyzing the similarity among molecules, searching for molecules in structural databases, and finding potential drug molecules and their properties. One of the key fields in chemoinformatics is quantitative structure-property relationship (QSPR) modeling, an alternative process for predicting various physicochemical and biopharmaceutical properties. This methodology expresses molecules via numerical values or properties (descriptors), which encode the structural characteristics of the molecules and are then used to calculate their physicochemical properties. An established QSPR model can be used to predict the properties of compounds that have already been measured or are still unknown, which ultimately accelerates the development of a new molecule or product. Formulation characteristics (drug release, transportability, bioavailability) can be predicted by integrating the QSPR approach. QSPR modeling is therefore an emerging trend for bypassing the conventional drug and formulation development process. The current review highlights the overall process involved in applying the QSPR approach to formulation development.


Subject(s)
Drug Compounding , Drug Discovery , Drug Liberation , Quantitative Structure-Activity Relationship
12.
Zhongguo Zhong Yao Za Zhi ; 44(8): 1724-1728, 2019 Apr.
Article in Chinese | MEDLINE | ID: mdl-31090341

ABSTRACT

This study aims to explore the evaluation model for proficiency testing of heavy metal and harmful element residues in pharmaceuticals, and to provide a reference for proficiency testing programs and results in the field of residue analysis. The proficiency test results for cadmium determination in honeysuckle were taken as an example. Algorithm A, NIQR, and the Horwitz function were used to calculate the assigned value and the standard deviation, from which Z was obtained. If |Z| ≤ 2, the result is satisfactory. If 2 < |Z| < 3, the result is questionable. If |Z| ≥ 3, the result is unsatisfactory. In addition, the median was taken as the assigned value and the percentage deviation (D%) was used: if D% is not more than 16%, the result is satisfactory; if D% is more than 16%, the result is unsatisfactory. The analysis showed that, among the laboratories rated questionable or unsatisfactory by algorithm A and NIQR, the deviation of some data was within the range allowed by the standard, whereas among the laboratories rated satisfactory by the Horwitz function, the deviation of some data far exceeded the standard range. The evaluation results based on D% met the requirements. Based on trace analysis methods for heavy metals and harmful elements, this study is the first to apply D% to the evaluation of the detection capability for heavy metals and harmful elements in pharmaceuticals. This method makes the evaluation results more reasonable and has important reference significance for the evaluation of other proficiency test results.


Subject(s)
Cadmium/analysis , Laboratory Proficiency Testing , Pharmaceutical Preparations/standards , Trace Elements/analysis , Laboratories , Lonicera/chemistry , Plant Preparations/standards
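The scoring rules stated in entry 12 translate directly into code. The sketch below is a minimal illustration: the thresholds (2, 3 and 16%) come from the abstract, while the function names, the use of the absolute value of D% and the way the assigned value and standard deviation are supplied are assumptions. It does not implement algorithm A, NIQR or the Horwitz function.

```python
def z_score(result, assigned_value, sigma):
    """Z = (laboratory result - assigned value) / standard deviation."""
    return (result - assigned_value) / sigma


def classify_z(z):
    """Apply the |Z| bands quoted in the abstract."""
    a = abs(z)
    if a <= 2:
        return "satisfactory"
    if a < 3:
        return "questionable"
    return "unsatisfactory"


def percent_deviation(result, assigned_value):
    """D% relative to the assigned value (the median in the abstract)."""
    return 100.0 * (result - assigned_value) / assigned_value


def classify_d(d_percent, limit=16.0):
    """Satisfactory if D% is within the limit; using |D%| is an assumption."""
    return "satisfactory" if abs(d_percent) <= limit else "unsatisfactory"
```

For example, a laboratory reporting 110 against an assigned value of 100 has D% = 10 and is rated satisfactory under the 16% limit, regardless of which standard deviation estimate the Z-based schemes would have used.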
13.
Genet Epidemiol ; 41(7): 587-598, 2017 11.
Article in English | MEDLINE | ID: mdl-28726280

ABSTRACT

Increasing evidence has shown that genes may cause prenatal, neonatal, and pediatric diseases depending on their parental origins. Statistical models that incorporate parent-of-origin effects (POEs) can improve the power of detecting disease-associated genes and help explain the missing heritability of diseases. In many studies, children have been sequenced for genome-wide association testing, but it may become unaffordable to sequence their parents and evaluate POEs. Motivated by this reality, we proposed a budget-friendly study design of sequencing children and only genotyping their parents through single nucleotide polymorphism arrays. We developed a powerful likelihood-based method, which takes into account both sequence reads and linkage disequilibrium to infer the parental origins of children's alleles and estimate their POEs on the outcome. We evaluated the performance of our proposed method and compared it with an existing method using only genotypes, through extensive simulations. Our method showed higher power than the genotype-based method. When either the mean read depth or the pair-end length was reasonably large, our method achieved ideal power. When single parents' genotypes were unavailable or parental genotypes at the testing locus were not typed, both methods lost power compared with when complete data were available, but the power loss from our method was smaller than that from the genotype-based method. We also extended our method to accommodate mixed genotype, low-, and high-coverage sequence data from children and their parents. In the presence of sequence errors, low-coverage parental sequence data may lead to lower power than parental genotype data.


Subject(s)
DNA Mutational Analysis/methods , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study/methods , Models, Genetic , Research Design , Alleles , Child , Computer Simulation , DNA Mutational Analysis/economics , Female , Genome-Wide Association Study/economics , Humans , Likelihood Functions , Linkage Disequilibrium , Male , Nuclear Family , Pedigree
14.
Emerg Infect Dis ; 24(3): 573-575, 2018 03.
Article in English | MEDLINE | ID: mdl-29460749

ABSTRACT

We previously reported use of genotype surveillance data to predict outbreaks among incident tuberculosis clusters. We propose a method to detect possible outbreaks among endemic tuberculosis clusters. We detected 15 possible outbreaks, of which 10 had epidemiologic data or whole-genome sequencing results. Eight outbreaks were corroborated.


Subject(s)
Disease Outbreaks , Models, Statistical , Mycobacterium tuberculosis , Tuberculosis/epidemiology , Cluster Analysis , Genome, Bacterial , Genomics/methods , Genotype , Humans , Incidence , Molecular Epidemiology , Mycobacterium tuberculosis/genetics , Polymorphism, Single Nucleotide , Prevalence , Tuberculosis/diagnosis , Tuberculosis/microbiology , United States
15.
Molecules ; 23(12)2018 Nov 27.
Article in English | MEDLINE | ID: mdl-30486378

ABSTRACT

Coptis plants (Ranunculaceae) have played an important role in the prevention and treatment of human diseases in Chinese history. In this study, a multi-level strategy based on metabolic and molecular genetic methods was used to characterize four Coptis herbs (C. chinensis, C. deltoidea, C. omeiensis and C. teeta) using high performance liquid chromatography-ultraviolet (HPLC-UV) and restriction site-associated DNA sequencing (RAD-seq) techniques. Protoberberine alkaloids including berberine, palmatine, coptisine, epiberberine, columbamine, jatrorrhizine, magnoflorine and groenlandicine in rhizomes were identified and determined based on the HPLC-UV method. Among them, berberine was shown to be the most abundant compound in these plants. RAD-seq was applied to discover single nucleotide polymorphism (SNP) data. A total of 44,747,016 reads were generated and 2,443,407 SNPs were identified across the four plants. Additionally, given the complexity of the metabolic and SNP data, the multivariate statistical methods of principal component analysis (PCA) and hierarchical cluster analysis (HCA) were successively applied to interpret the structural characteristics. The metabolic variation and genetic relationship among different Coptis plants were successfully illustrated based on data visualization. In summary, this comprehensive strategy proved to be a reliable and effective approach to characterizing Coptis plants, which can provide additional information for their quality assessment.


Subject(s)
Berberine Alkaloids/analysis , Coptis , Drugs, Chinese Herbal/analysis , Metabolomics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Chromatography, High Pressure Liquid , Coptis/chemistry , Coptis/genetics , Humans
16.
Entropy (Basel) ; 20(11)2018 Nov 17.
Article in English | MEDLINE | ID: mdl-33266608

ABSTRACT

The main purpose of the present study is to apply three classification models, namely, the index of entropy (IOE) model, the logistic regression (LR) model, and the support vector machine (SVM) model with a radial basis function (RBF) kernel, to produce landslide susceptibility maps for Fugu County, Shaanxi Province, China. Firstly, landslide locations were extracted from field investigation and aerial photographs, and a total of 194 landslide polygons were transformed into points to produce a landslide inventory map. Secondly, the landslide points were randomly split into two groups (70/30) for training and validation purposes, respectively. Then, 10 landslide explanatory variables, namely slope aspect, slope angle, altitude, lithology, mean annual precipitation, distance to roads, distance to rivers, distance to faults, land use, and normalized difference vegetation index (NDVI), were selected, and potential multicollinearity problems between these factors were checked using the Pearson Correlation Coefficient (PCC), the variance inflation factor (VIF), and tolerance (TOL). Subsequently, the landslide susceptibility maps for the study region were obtained using the IOE model, the LR-IOE model, and the SVM-IOE model. Finally, the performance of these three models was verified and compared using the receiver operating characteristics (ROC) curve. The success rate results showed that the LR-IOE model has the highest accuracy (90.11%), followed by the IOE model (87.43%) and the SVM-IOE model (86.53%). The AUC values for prediction accuracy showed the same ranking, with the LR-IOE model having the highest accuracy (81.84%), followed by the IOE model (76.86%) and the SVM-IOE model (76.61%). Thus, the landslide susceptibility map (LSM) for the study region can provide an effective reference for the Fugu County government to properly address land planning and mitigate landslide risk.
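The multicollinearity check mentioned in entry 16 can be sketched in a few lines. The example relies on two standard identities: the VIF of predictor j is the j-th diagonal element of the inverse correlation matrix, and tolerance is 1/VIF. The function names and toy data are hypothetical, and the small Gauss-Jordan routine is only meant for a handful of predictors; it is not the authors' implementation.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        standardized.append([(v - m) / s for v in col])
    p = len(columns)
    return [[sum(a * b for a, b in zip(standardized[i], standardized[j])) / n
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    p = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(p)]
           for i, row in enumerate(matrix)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[p:] for row in aug]


def vif_and_tolerance(columns):
    """Return a (VIF, TOL) pair for each predictor column."""
    inv = invert(correlation_matrix(columns))
    return [(inv[j][j], 1.0 / inv[j][j]) for j in range(len(columns))]
```

Two toy predictors with correlation 0.8, such as `[1, 2, 3, 4, 5]` and `[2, 1, 4, 3, 5]`, give a tolerance of 0.36 and a VIF of about 2.78; uncorrelated predictors give a VIF of 1.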

17.
J Stat Comput Simul ; 89(2): 249-271, 2018.
Article in English | MEDLINE | ID: mdl-30962669

ABSTRACT

There are no practical and effective mechanisms to share high-dimensional data including sensitive information in fields like health, financial intelligence, or socioeconomics without either compromising the utility of the data or exposing private personal or secure organizational information. Excessive scrambling or encoding of the information makes it less useful for modelling or analytical processing. Insufficient preprocessing may compromise sensitive information and introduce a substantial risk of re-identification of individuals by various stratification techniques. To address this problem, we developed a novel statistical obfuscation method (DataSifter) for on-the-fly de-identification of structured and unstructured sensitive high-dimensional data such as clinical data from electronic health records (EHR). DataSifter provides complete administrative control over the balance between the risk of data re-identification and the preservation of the data information. Simulation results suggest that DataSifter can provide privacy protection while maintaining data utility for different types of outcomes of interest. The application of DataSifter to a large autism dataset provides a realistic demonstration of its promise for practical applications.

18.
BMC Bioinformatics ; 18(1): 45, 2017 Jan 19.
Article in English | MEDLINE | ID: mdl-28103803

ABSTRACT

BACKGROUND: The detection of rare single nucleotide variants (SNVs) is important for understanding genetic heterogeneity using next-generation sequencing (NGS) data. Various computational algorithms have been proposed to detect variants at the single nucleotide level in mixed samples. Yet, the noise inherent in the biological processes involved in NGS technology necessitates the development of statistically accurate methods to identify true rare variants. RESULTS: We propose a Bayesian statistical model and a variational expectation maximization (EM) algorithm to estimate non-reference allele frequency (NRAF) and identify SNVs in heterogeneous cell populations. We demonstrate that our variational EM algorithm has comparable sensitivity and specificity compared with a Markov Chain Monte Carlo (MCMC) sampling inference algorithm, and is more computationally efficient on tests of relatively low coverage (27× and 298×) data. Furthermore, we show that our model with a variational EM inference algorithm has higher specificity than many state-of-the-art algorithms. In an analysis of a directed evolution longitudinal yeast data set, we are able to identify a time-series trend in non-reference allele frequency and detect novel variants that have not yet been reported. Our model also detects the emergence of a beneficial variant earlier than was previously shown, and a pair of concomitant variants. CONCLUSIONS: We developed a variational EM algorithm for a hierarchical Bayesian model to identify rare variants in heterogeneous next-generation sequencing data. Our algorithm is able to identify variants in a broad range of read depths and non-reference allele frequencies with high sensitivity and specificity.


Subject(s)
High-Throughput Nucleotide Sequencing , Models, Theoretical , Sequence Analysis, DNA , Algorithms , Bayes Theorem , Gene Frequency , Markov Chains , Monte Carlo Method
19.
Genet Epidemiol ; 40(8): 678-688, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27390122

ABSTRACT

The identification of gene-gene and gene-environment interaction in human traits and diseases is an active area of research that generates high expectations and most often leads to high disappointment. This is partly explained by a misunderstanding of the inherent characteristics of standard regression-based interaction analyses. Here, I revisit and untangle major theoretical aspects of interaction tests in the special case of linear regression; in particular, I discuss variable coding schemes, interpretation of effect estimates, statistical power, and estimation of variance explained with regard to various hypothetical interaction patterns. Linking these components, it appears first that the simplest biological interaction models, in which the magnitude of a genetic effect depends on a common exposure, are among the most difficult to identify. Second, I highlight the shortcomings of the current strategy for evaluating the contribution of interaction effects to the variance of quantitative outcomes and argue for the use of new approaches to overcome this issue. Finally, I explore the advantages and limitations of multivariate interaction models, when testing for interaction between multiple SNPs and/or multiple exposures, over univariate approaches. Together, these new insights can be leveraged for future method development and to improve our understanding of the genetic architecture of multifactorial traits.
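
The point about variable coding and the interpretation of effect estimates can be illustrated with a minimal sketch. This is not the author's analysis: the data are hypothetical and noise-free, and the helper names (`solve`, `ols`, `fit_interaction`) are invented for this example. It fits the linear model y = b0 + b1*G + b2*E + b3*G*E twice, once with the exposure E as coded and once with E mean-centered.

```python
from itertools import product


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_interaction(g, e, y):
    """Fit y = b0 + b1*g + b2*e + b3*g*e and return [b0, b1, b2, b3]."""
    design = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]
    return ols(design, y)


# Hypothetical data: genotype G coded 0/1/2, exposure E on a 0..3 scale,
# outcome generated without noise so the fitted coefficients are exact.
pairs = list(product([0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 3.0]))
g = [p[0] for p in pairs]
e = [p[1] for p in pairs]
y = [1.0 + 0.5 * gi + 2.0 * ei + 0.3 * gi * ei for gi, ei in zip(g, e)]

mean_e = sum(e) / len(e)
e_centered = [ei - mean_e for ei in e]

raw = fit_interaction(g, e, y)
centered = fit_interaction(g, e_centered, y)

print("raw coding      [b0, b1, b2, b3]:", [round(v, 6) for v in raw])
print("centered coding [b0, b1, b2, b3]:", [round(v, 6) for v in centered])
```

With centering, the interaction coefficient b3 is unchanged, while the coefficient of G moves from the effect of G at E = 0 to the effect of G at the mean exposure (b1 + b3 × mean(E)). The "main effect" in an interaction model therefore depends on how the exposure is coded.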


Subject(s)
Gene-Environment Interaction , Genetic Association Studies/methods , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Body Mass Index , Exercise , Humans , Linear Models , Phenotype , Regression Analysis
20.
J Am Soc Nephrol ; 27(11): 3253-3265, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27486138

ABSTRACT

Mendelian randomization refers to an analytic approach to assess the causality of an observed association between a modifiable exposure or risk factor and a clinically relevant outcome. It is a valuable tool, especially when randomized controlled trials to examine causality are not feasible and observational studies provide biased associations because of confounding or reverse causality. These issues are addressed by using genetic variants as instrumental variables for the tested exposure: the alleles of this exposure-associated genetic variant are randomly allocated and not subject to reverse causation. This, together with the wide availability of published genetic associations to screen for suitable genetic instrumental variables, makes Mendelian randomization a time- and cost-efficient approach and contributes to its increasing popularity for assessing and screening for potentially causal associations. An observed association between the genetic instrumental variable and the outcome supports the hypothesis that the exposure in question is causally related to the outcome. This review provides an overview of the Mendelian randomization method, addresses assumptions and implications, and includes illustrative examples. We also discuss special issues in nephrology, such as inverse risk factor associations in advanced disease, and outline opportunities to design Mendelian randomization studies around kidney function and disease.
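
The instrumental-variable logic described above can be sketched with the Wald ratio, a standard single-instrument estimator that the abstract does not itself name. The data below are hypothetical and noise-free, and the function names `slope` and `wald_ratio` are invented for this illustration. An unmeasured confounder U affects both exposure and outcome but is, by construction, uncorrelated with the genetic variant G.

```python
from itertools import product


def slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def wald_ratio(g, exposure, outcome):
    """Wald ratio: (effect of G on outcome) / (effect of G on exposure)."""
    return slope(g, outcome) / slope(g, exposure)


# Hypothetical population: every combination of genotype (0/1/2) and a
# binary confounder U, so that G and U are exactly uncorrelated.
grid = list(product([0.0, 1.0, 2.0], [-1.0, 1.0]))
g = [p[0] for p in grid]
u = [p[1] for p in grid]

TRUE_EFFECT = 0.25  # assumed causal effect of exposure on outcome
exposure = [0.4 * gi + ui for gi, ui in zip(g, u)]
outcome = [TRUE_EFFECT * xi + 2.0 * ui for xi, ui in zip(exposure, u)]

naive = slope(exposure, outcome)         # confounded observational estimate
iv = wald_ratio(g, exposure, outcome)    # instrumental-variable estimate

print("naive regression slope:", round(naive, 4))
print("Wald ratio estimate:   ", round(iv, 4))
```

In this constructed example the naive regression of outcome on exposure is inflated by the confounder, whereas the Wald ratio recovers the assumed causal effect. That recovery relies on the variant being associated with the exposure, independent of the confounder, and affecting the outcome only through the exposure.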


Subject(s)
Causality , Mendelian Randomization Analysis , Cardiovascular Diseases/genetics , Humans , Observational Studies as Topic , Renal Insufficiency, Chronic/genetics