1.
BMC Bioinformatics ; 19(1): 343, 2018 Sep 29.
Article in English | MEDLINE | ID: mdl-30268091

ABSTRACT

BACKGROUND: Targeted amplicon sequencing of the 16S ribosomal RNA gene is one of the key tools for studying microbial diversity. The accuracy of this approach strongly depends on the choice of primer pairs and, in particular, on the balance between efficiency, specificity and sensitivity in the amplification of the different bacterial 16S sequences contained in a sample. There is thus the need for computational methods to design optimal bacterial 16S primers able to take into account the knowledge provided by the new sequencing technologies. RESULTS: We propose here a computational method for optimizing the choice of primer sets, based on multi-objective optimization, which simultaneously: 1) maximizes efficiency and specificity of target amplification; 2) maximizes the number of different bacterial 16S sequences matched by at least one primer; 3) minimizes the differences in the number of primers matching each bacterial 16S sequence. Our algorithm can be applied to any desired amplicon length without affecting computational performance. The source code of the developed algorithm is released as the mopo16S software tool (Multi-Objective Primer Optimization for 16S experiments) under the GNU General Public License and is available at http://sysbiobig.dei.unipd.it/?q=Software#mopo16S . CONCLUSIONS: Results show that our strategy is able to find better primer pairs than the ones available in the literature according to all three optimization criteria. We also experimentally validated three of the primer pairs identified by our method on multiple bacterial species, belonging to different genera and phyla. Results confirm the predicted efficiency and the ability to maximize the number of different bacterial 16S sequences matched by primers.
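The abstract names three objectives but does not say how candidate primer sets are compared. One common way to compare candidates under several objectives is Pareto dominance, and the following is a minimal sketch of that idea only, not the mopo16S implementation. The function names and the scores are hypothetical, and all three objectives are expressed so that larger is better.

```python
from typing import List, Tuple

# Each candidate primer set is scored on three objectives, all expressed so
# that larger is better (the variability term is negated for that reason).
Score = Tuple[float, float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of candidates not dominated by any other candidate."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores: (efficiency, coverage, -coverage variability).
candidates = [
    (0.90, 0.80, -0.30),
    (0.85, 0.95, -0.20),
    (0.80, 0.75, -0.40),  # dominated by the first candidate
    (0.95, 0.70, -0.10),
]
print(pareto_front(candidates))  # [0, 1, 3]
```

Keeping every non-dominated candidate avoids collapsing the three criteria into a single weighted score, at the cost of returning a set of trade-offs rather than one answer.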


Subjects
Bacteria/genetics , DNA Primers/standards , Polymerase Chain Reaction/standards , Bacterial RNA/genetics , 16S Ribosomal RNA/genetics , Software , DNA Primers/genetics
2.
Bioinformatics ; 33(8): 1250-1252, 2017 04 15.
Article in English | MEDLINE | ID: mdl-28003263

ABSTRACT

Motivation: A Bayesian Network is a probabilistic graphical model that encodes probabilistic dependencies between a set of random variables. We introduce bnstruct, an open source R package to (i) learn the structure and the parameters of a Bayesian Network from data in the presence of missing values and (ii) perform reasoning and inference on the learned Bayesian Networks. To the best of our knowledge, there is no other open source software that provides methods for all of these tasks, particularly the manipulation of missing data, which is a common situation in practice. Availability and Implementation: The software is implemented in R and C and is available on CRAN under a GPL licence. Contact: francesco.sambo@unipd.it. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Statistical Models , Software , Bayes Theorem , Humans
3.
J Am Soc Nephrol ; 28(2): 557-574, 2017 02.
Article in English | MEDLINE | ID: mdl-27647854

ABSTRACT

Diabetes is the leading cause of ESRD. Despite evidence for a substantial heritability of diabetic kidney disease, efforts to identify genetic susceptibility variants have had limited success. We extended previous efforts in three dimensions, examining a more comprehensive set of genetic variants in larger numbers of subjects with type 1 diabetes characterized for a wider range of cross-sectional diabetic kidney disease phenotypes. In 2843 subjects, we estimated that the heritability of diabetic kidney disease was 35% (P=6.4×10⁻³). Genome-wide association analysis and replication in 12,540 individuals identified no single variants reaching stringent levels of significance and, despite excellent power, provided little independent confirmation of previously published associated variants. Whole-exome sequencing in 997 subjects failed to identify any large-effect coding alleles of lower frequency influencing the risk of diabetic kidney disease. However, sets of alleles increasing body mass index (P=2.2×10⁻⁵) and the risk of type 2 diabetes (P=6.1×10⁻⁴) associated with the risk of diabetic kidney disease. We also found genome-wide genetic correlation between diabetic kidney disease and failure at smoking cessation (P=1.1×10⁻⁴). Pathway analysis implicated ascorbate and aldarate metabolism (P=9.0×10⁻⁶), and pentose and glucuronate interconversions (P=3.0×10⁻⁶) in pathogenesis of diabetic kidney disease. These data provide further evidence for the role of genetic factors influencing diabetic kidney disease in those with type 1 diabetes and highlight some key pathways that may be responsible. Altogether these results reveal important biology behind the major cause of kidney disease.


Subjects
Type 1 Diabetes Mellitus/complications , Type 1 Diabetes Mellitus/genetics , Diabetic Nephropathies/genetics , Adolescent , Adult , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Young Adult
4.
Bioinformatics ; 30(21): 3078-85, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25064564

ABSTRACT

MOTIVATION: The increasing interest in rare genetic variants and epistatic genetic effects on complex phenotypic traits is currently pushing genome-wide association study design towards datasets of increasing size, both in the number of studied subjects and in the number of genotyped single nucleotide polymorphisms (SNPs). This, in turn, is leading to a compelling need for new methods for compression and fast retrieval of SNP data. RESULTS: We present a novel algorithm and file format for compressing and retrieving SNP data, specifically designed for large-scale association studies. Our algorithm is based on two main ideas: (i) compress linkage disequilibrium blocks in terms of differences with a reference SNP and (ii) compress reference SNPs exploiting information on their call rate and minor allele frequency. Tested on two SNP datasets and compared with several state-of-the-art software tools, our compression algorithm is shown to be competitive in terms of compression rate and to outperform all tools in terms of time to load compressed data. AVAILABILITY AND IMPLEMENTATION: Our compression and decompression algorithms are implemented in a C++ library, are released under the GNU General Public License and are freely downloadable from http://www.dei.unipd.it/~sambofra/snpack.html.
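Idea (i) in the abstract, storing a SNP as its differences from a reference SNP in the same linkage disequilibrium block, can be illustrated with a short sketch. This is not the file format or the C++ library described above: the function names, the 0/1/2 genotype coding and the dictionary representation are assumptions made for illustration.

```python
from typing import Dict, List

Genotypes = List[int]  # one value per subject: 0, 1 or 2 copies of the minor allele


def encode_diff(reference: Genotypes, snp: Genotypes) -> Dict[int, int]:
    """Store only the subjects whose genotype differs from the reference SNP."""
    if len(reference) != len(snp):
        raise ValueError("both SNPs must cover the same subjects")
    return {i: g for i, (r, g) in enumerate(zip(reference, snp)) if r != g}


def decode_diff(reference: Genotypes, diff: Dict[int, int]) -> Genotypes:
    """Rebuild the original SNP from the reference and the stored differences."""
    restored = list(reference)
    for i, g in diff.items():
        restored[i] = g
    return restored


# Hypothetical genotypes for two SNPs in strong linkage disequilibrium.
reference = [0, 1, 2, 0, 0, 1, 0, 2]
linked = [0, 1, 2, 0, 1, 1, 0, 2]

diff = encode_diff(reference, linked)
print(diff)  # {4: 1}
print(decode_diff(reference, diff) == linked)  # True
```

The stronger the correlation between a SNP and its reference, the fewer entries the difference map holds, which is where the saving comes from in this simplified picture.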


Subjects
Algorithms , Data Compression/methods , Single Nucleotide Polymorphism , Gene Frequency , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Software
5.
Bioinformatics ; 30(3): 384-91, 2014 Feb 01.
Article in English | MEDLINE | ID: mdl-24292361

ABSTRACT

MOTIVATION: In the past years, both sequencing and microarray have been widely used to search for relations between genetic variations and predisposition to complex pathologies such as diabetes or neurological disorders. These studies, however, have been able to explain only a small fraction of disease heritability, possibly because complex pathologies cannot be referred to few dysfunctional genes, but are rather heterogeneous and multicausal, as a result of a combination of rare and common variants possibly impairing multiple regulatory pathways. Rare variants, though, are difficult to detect, especially when the effects of causal variants are in different directions, i.e. with protective and detrimental effects. RESULTS: Here, we propose ABACUS, an Algorithm based on a BivAriate CUmulative Statistic to identify single nucleotide polymorphisms (SNPs) significantly associated with a disease within predefined sets of SNPs such as pathways or genomic regions. ABACUS is robust to the concurrent presence of SNPs with protective and detrimental effects and of common and rare variants; moreover, it is powerful even when few SNPs in the SNP-set are associated with the phenotype. We assessed ABACUS performance on simulated and real data and compared it with three state-of-the-art methods. When ABACUS was applied to type 1 and 2 diabetes data, besides observing a wide overlap with already known associations, we found a number of biologically sound pathways, which might shed light on diabetes mechanism and etiology. AVAILABILITY AND IMPLEMENTATION: ABACUS is available at http://www.dei.unipd.it/∼dicamill/pagine/Software.html.


Subjects
Algorithms , Genome-Wide Association Study/methods , Single Nucleotide Polymorphism , Statistical Data Interpretation , Gene Frequency , Genotype , Genotyping Techniques , Humans , Phenotype
6.
J Biomed Inform ; 57: 369-76, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26325295

ABSTRACT

The increasing prevalence of diabetes and its related complications is raising the need for effective methods to predict patient evolution and for stratifying cohorts in terms of risk of developing diabetes-related complications. In this paper, we present a novel approach to the simulation of a type 1 diabetes population, based on Dynamic Bayesian Networks, which combines literature knowledge with data mining of a rich longitudinal cohort of type 1 diabetes patients, the DCCT/EDIC study. In particular, in our approach we simulate the patient health state and complications through discretized variables. Two types of models are presented, one entirely learned from the data and the other partially driven by literature derived knowledge. The whole cohort is simulated for fifteen years, and the simulation error (i.e. for each variable, the percentage of patients predicted in the wrong state) is calculated every year on independent test data. For each variable, the population predicted in the wrong state is below 10% on both models over time. Furthermore, the distributions of real vs. simulated patients greatly overlap. Thus, the proposed models are viable tools to support decision making in type 1 diabetes.
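The simulation error defined in the abstract (for each variable, the percentage of patients predicted in the wrong state) is simple to compute once real and simulated states are aligned per patient. The sketch below assumes that alignment; the variable names and state labels are hypothetical and are not taken from the DCCT/EDIC data.

```python
from typing import Dict, List


def simulation_error(
    real: Dict[str, List[str]], simulated: Dict[str, List[str]]
) -> Dict[str, float]:
    """Per variable, percentage of patients whose simulated state differs from the real one."""
    errors = {}
    for variable, real_states in real.items():
        sim_states = simulated[variable]
        if len(sim_states) != len(real_states):
            raise ValueError(f"patient count mismatch for {variable}")
        wrong = sum(r != s for r, s in zip(real_states, sim_states))
        errors[variable] = 100.0 * wrong / len(real_states)
    return errors


# Hypothetical discretized states for four patients at one simulated year.
real = {
    "hba1c": ["low", "high", "high", "low"],
    "nephropathy": ["no", "no", "yes", "no"],
}
simulated = {
    "hba1c": ["low", "high", "low", "low"],
    "nephropathy": ["no", "no", "yes", "no"],
}
errors = simulation_error(real, simulated)
print(errors)  # {'hba1c': 25.0, 'nephropathy': 0.0}
```

Repeating this calculation for each simulated year on held-out patients gives the per-variable error curves the abstract summarizes as staying below 10%.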


Subjects
Bayes Theorem , Computer Simulation , Data Mining , Diabetes Complications , Type 1 Diabetes Mellitus , Humans
7.
Diabetologia ; 57(8): 1611-22, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24871321

ABSTRACT

AIMS/HYPOTHESIS: Diabetic nephropathy is a major diabetic complication, and diabetes is the leading cause of end-stage renal disease (ESRD). Family studies suggest a hereditary component for diabetic nephropathy. However, only a few genes have been associated with diabetic nephropathy or ESRD in diabetic patients. Our aim was to detect novel genetic variants associated with diabetic nephropathy and ESRD. METHODS: We exploited a novel algorithm, 'Bag of Naive Bayes', whose marker selection strategy is complementary to that of conventional genome-wide association models based on univariate association tests. The analysis was performed on a genome-wide association study of 3,464 patients with type 1 diabetes from the Finnish Diabetic Nephropathy (FinnDiane) Study and subsequently replicated with 4,263 type 1 diabetes patients from the Steno Diabetes Centre, the All Ireland-Warren 3-Genetics of Kidneys in Diabetes UK collection (UK-Republic of Ireland) and the Genetics of Kidneys in Diabetes US Study (GoKinD US). RESULTS: Five genetic loci (WNT4/ZBTB40-rs12137135, RGMA/MCTP2-rs17709344, MAPRE1P2-rs1670754, SEMA6D/SLC24A5-rs12917114 and SIK1-rs2838302) were associated with ESRD in the FinnDiane study. An association between ESRD and rs17709344, tagging the previously identified rs12437854 and located between the RGMA and MCTP2 genes, was replicated in independent case-control cohorts. rs12917114 near SEMA6D was associated with ESRD in the replication cohorts under the genotypic model (p < 0.05), and rs12137135 upstream of WNT4 was associated with ESRD in Steno. CONCLUSIONS/INTERPRETATION: This study supports the previously identified findings on the RGMA/MCTP2 region and suggests novel susceptibility loci for ESRD. This highlights the importance of applying complementary statistical methods to detect novel genetic variants in diabetic nephropathy and, in general, in complex diseases.


Subjects
Diabetic Nephropathies/genetics , Genetic Loci , Genetic Predisposition to Disease , Chronic Kidney Failure/genetics , Adult , Bayes Theorem , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Single Nucleotide Polymorphism , White People/genetics
8.
BMC Bioinformatics ; 13 Suppl 14: S2, 2012.
Article in English | MEDLINE | ID: mdl-23095127

ABSTRACT

BACKGROUND: Multifactorial diseases arise from complex patterns of interaction between a set of genetic traits and the environment. To fully capture the genetic biomarkers that jointly explain the heritability component of a disease, thus, all SNPs from a genome-wide association study should be analyzed simultaneously. RESULTS: In this paper, we present Bag of Naïve Bayes (BoNB), an algorithm for genetic biomarker selection and subjects classification from the simultaneous analysis of genome-wide SNP data. BoNB is based on the Naïve Bayes classification framework, enriched by three main features: bootstrap aggregating of an ensemble of Naïve Bayes classifiers, a novel strategy for ranking and selecting the attributes used by each classifier in the ensemble and a permutation-based procedure for selecting significant biomarkers, based on their marginal utility in the classification process. BoNB is tested on the Wellcome Trust Case-Control study on Type 1 Diabetes and its performance is compared with the ones of both a standard Naïve Bayes algorithm and HyperLASSO, a penalized logistic regression algorithm from the state-of-the-art in simultaneous genome-wide data analysis. CONCLUSIONS: The significantly higher classification accuracy obtained by BoNB, together with the significance of the biomarkers identified from the Type 1 Diabetes dataset, prove the effectiveness of BoNB as an algorithm for both classification and biomarker selection from genome-wide SNP data. AVAILABILITY: Source code of the BoNB algorithm is released under the GNU General Public Licence and is available at http://www.dei.unipd.it/~sambofra/bonb.html.
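The first of the three features listed in the abstract, bootstrap aggregating of Naïve Bayes classifiers, can be sketched with the standard library alone. This is a simplified illustration and not the BoNB source code: it omits the attribute ranking and the permutation-based biomarker selection, and the function names, the add-one smoothing, the 0/1/2 genotype coding and the toy data are all assumptions.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (genotypes coded 0/1/2, class label 0 or 1)


def train_nb(data: List[Sample]):
    """Count class and per-feature genotype frequencies for a categorical Naive Bayes."""
    n_features = len(data[0][0])
    class_counts = Counter(label for _, label in data)
    value_counts = Counter()
    for features, label in data:
        for j, value in enumerate(features):
            value_counts[(label, j, value)] += 1
    return n_features, class_counts, value_counts


def log_posterior(model, features: Sequence[int], label: int) -> float:
    """Unnormalized log posterior of one class, with add-one smoothing."""
    n_features, class_counts, value_counts = model
    total = sum(class_counts.values())
    # Smoothed prior, so a class missing from a bootstrap sample stays finite.
    score = math.log((class_counts[label] + 1) / (total + 2))
    for j in range(n_features):
        count = value_counts[(label, j, features[j])]
        score += math.log((count + 1) / (class_counts[label] + 3))  # 3 genotype values
    return score


def predict_nb(model, features: Sequence[int]) -> int:
    return max((0, 1), key=lambda label: log_posterior(model, features, label))


def bagged_predict(data: List[Sample], features: Sequence[int],
                   n_models: int = 25, seed: int = 0) -> int:
    """Majority vote of Naive Bayes models trained on bootstrap resamples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        votes[predict_nb(train_nb(bootstrap), features)] += 1
    return votes.most_common(1)[0][0]


# Toy data: the first SNP separates cases (1) from controls (0), the second is noise.
data = [
    ((2, 0), 1), ((2, 1), 1), ((2, 2), 1), ((2, 0), 1),
    ((0, 0), 0), ((0, 1), 0), ((0, 2), 0), ((0, 1), 0),
]
print(bagged_predict(data, (2, 1)))
print(bagged_predict(data, (0, 1)))
```

Each bootstrap model sees a slightly different sample, so the vote is less sensitive to any single subject than one classifier trained on the full data.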


Subjects
Algorithms , Bayes Theorem , Type 1 Diabetes Mellitus/genetics , Genetic Markers , Genome-Wide Association Study , Single Nucleotide Polymorphism , Case-Control Studies , Female , Humans , Logistic Models
9.
Eur J Endocrinol ; 178(4): 331-341, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29371336

ABSTRACT

OBJECTIVE: Type 2 diabetes arises from the interaction of physiological and lifestyle risk factors. Our objective was to develop a model for predicting the risk of T2D, which could use various amounts of background information. RESEARCH DESIGN AND METHODS: We trained a survival analysis model on 8483 people from three large Finnish and Spanish data sets, to predict the time until incident T2D. All studies included anthropometric data, fasting laboratory values, an oral glucose tolerance test (OGTT) and information on co-morbidities and lifestyle habits. The variables were grouped into three sets reflecting different degrees of information availability. Scenario 1 included background and anthropometric information; Scenario 2 added routine laboratory tests; Scenario 3 also added results from an OGTT. Predictive performance of these models was compared with FINDRISC and Framingham risk scores. RESULTS: The three models predicted T2D risk with an average integrated area under the ROC curve equal to 0.83, 0.87 and 0.90, respectively, compared with 0.80 and 0.75 obtained using the FINDRISC and Framingham risk scores. The results were validated on two independent cohorts. Glucose values and particularly 2-h glucose during OGTT (2h-PG) had the highest predictive value. Smoking, marital and professional status, waist circumference, blood pressure, age and gender were also predictive. CONCLUSIONS: Our models provide an estimate of a patient's risk over time and outperform the traditional FINDRISC and Framingham scores for prediction of T2D risk. Of note, the models developed in Scenarios 1 and 2 exploited only variables easily available at general patient visits.


Subjects
Blood Glucose/metabolism , Type 2 Diabetes Mellitus/blood , Type 2 Diabetes Mellitus/diagnosis , Statistics as Topic/standards , Adult , Type 2 Diabetes Mellitus/epidemiology , Female , Finland/epidemiology , Follow-Up Studies , Humans , Male , Middle Aged , Theoretical Models , Predictive Value of Tests , Prospective Studies , Spain/epidemiology , Statistics as Topic/methods
10.
Educ Psychol Meas ; 77(5): 792-815, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29795931

ABSTRACT

The clinical assessment of mental disorders can be a time-consuming and error-prone procedure, consisting of a sequence of diagnostic hypothesis formulation and testing aimed at restricting the set of plausible diagnoses for the patient. In this article, we propose a novel computerized system for the adaptive testing of psychological disorders. The proposed system combines a mathematical representation of psychological disorders, known as the "formal psychological assessment," with an algorithm designed for the adaptive assessment of an individual's knowledge. The assessment algorithm is extended and adapted to the new application domain. Testing the system on a real sample of 4,324 healthy individuals, screened for obsessive-compulsive disorder, we demonstrate the system's ability to support clinical testing, both by identifying the correct critical areas for each individual and by reducing the number of posed questions with respect to a standard written questionnaire.

11.
J Diabetes Sci Technol ; 10(1): 119-24, 2015 Jul 31.
Article in English | MEDLINE | ID: mdl-26232371

ABSTRACT

BACKGROUND: Abnormal glucose variability (GV) is a risk factor for diabetes complications, and tens of indices for its quantification from continuous glucose monitoring (CGM) time series have been proposed. However, the information carried by these indices is redundant, and a parsimonious description of GV can be obtained through sparse principal component analysis (SPCA). We have recently shown that a set of 10 metrics selected by SPCA is able to describe more than 60% of the variance of 25 GV indicators in type 1 diabetes (T1D). Here, we want to extend the application of SPCA to type 2 diabetes (T2D). METHODS: A data set of CGM time series collected in 13 T2D subjects was considered. The 25 GV indices considered for T1D were evaluated. SPCA was used to select a subset of indices able to describe the majority of the original variance. RESULTS: A subset of 10 indicators was selected and allowed to describe 83% of the variance of the original pool of 25 indices. Four metrics sufficient to describe 67% of the original variance turned out to be shared by the parsimonious sets of indices in T1D and T2D. CONCLUSIONS: Starting from a pool of 25 indices assessed from CGM time series in T2D subjects, reduced subsets of metrics virtually providing the same information content can be determined by SPCA. The fact that these indices also appear in the parsimonious description of GV in T1D may indicate that they could be particularly informative of GV in diabetes, regardless of the specific type of disease.


Subjects
Blood Glucose/analysis , Type 2 Diabetes Mellitus/blood , Principal Component Analysis , Adult , Blood Glucose Self-Monitoring , Humans , Male
12.
Article in English | MEDLINE | ID: mdl-26736707

ABSTRACT

In order to better understand the relations between different risk factors in the predisposition to type 2 diabetes, we present a Bayesian Network analysis of a large dataset, composed of three European population studies. Our results show, together with a key role of metabolic syndrome and of glucose after 2 hours of an Oral Glucose Tolerance Test, the importance of education, measured as the number of years of study, in the predisposition to type 2 diabetes.


Subjects
Type 2 Diabetes Mellitus/etiology , Metabolic Syndrome/complications , Statistical Models , Bayes Theorem , Factual Databases , Finland , Glucose Tolerance Test , Humans , Male , Risk Factors , Spain , White People
13.
Article in English | MEDLINE | ID: mdl-22585141

ABSTRACT

The systematic perturbation of the components of a biological system has been proven among the most informative experimental setups for the identification of causal relations between the components. In this paper, we present Systematic Perturbation-Qualitative Reasoning (SPQR), a novel Qualitative Reasoning approach to automate the interpretation of the results of systematic perturbation experiments. Our method is based on a qualitative abstraction of the experimental data: for each perturbation experiment, measured values of the observed variables are modeled as lower, equal or higher than the measurements in the wild type condition, when no perturbation is applied. The algorithm exploits a set of IF-THEN rules to infer causal relations between the variables, analyzing the patterns of propagation of the perturbation signals through the biological network, and is specifically designed to minimize the rate of false positives among the inferred relations. Tested on both simulated and real perturbation data, SPQR indeed exhibits a significantly higher precision than the state of the art.
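The qualitative abstraction described in the abstract maps each measurement to lower, equal or higher than its wild-type value. A minimal sketch of that mapping follows. The abstract does not say how "equal" is decided, so the absolute tolerance used here is an assumption, as are the function names and the example values; the IF-THEN inference rules of SPQR are not reproduced.

```python
from typing import Dict


def qualitative_state(value: float, wild_type: float, tolerance: float = 0.1) -> int:
    """Return -1, 0 or +1 for lower than, equal to or higher than the wild type."""
    delta = value - wild_type
    if abs(delta) <= tolerance:  # assumed rule for calling two measurements equal
        return 0
    return 1 if delta > 0 else -1


def abstract_experiment(
    measured: Dict[str, float], wild_type: Dict[str, float], tolerance: float = 0.1
) -> Dict[str, int]:
    """Abstract one perturbation experiment into a qualitative state per variable."""
    return {
        name: qualitative_state(value, wild_type[name], tolerance)
        for name, value in measured.items()
    }


# Hypothetical expression levels: wild type versus a knockout of geneA.
wild_type = {"geneA": 1.0, "geneB": 2.0, "geneC": 0.5}
knockout_a = {"geneA": 0.05, "geneB": 2.6, "geneC": 0.52}

states = abstract_experiment(knockout_a, wild_type)
print(states)  # {'geneA': -1, 'geneB': 1, 'geneC': 0}
```

In this toy case the knockout lowers geneA, raises geneB and leaves geneC unchanged, which is the kind of pattern a rule-based step could then interpret as a candidate causal relation.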


Subjects
Algorithms , Biological Models , Signal Transduction
14.
Article in English | MEDLINE | ID: mdl-22837424

ABSTRACT

Reverse engineering is the problem of inferring the structure of a network of interactions between biological variables from a set of observations. In this paper, we propose an optimization algorithm, called MORE, for the reverse engineering of biological networks from time series data. The model inferred by MORE is a sparse system of nonlinear differential equations, complex enough to realistically describe the dynamics of a biological system. MORE tackles separately the discrete component of the problem, the determination of the biological network topology, and the continuous component of the problem, the strength of the interactions. This approach allows us both to enforce system sparsity, by globally constraining the number of edges, and to integrate a priori information about the structure of the underlying interaction network. Experimental results on simulated and real-world networks show that the mixed discrete/continuous optimization approach of MORE significantly outperforms standard continuous optimization and that MORE is competitive with the state of the art in terms of accuracy of the inferred networks.


Subjects
Algorithms , Biological Models , Nonlinear Dynamics
15.
PLoS One ; 7(3): e32200, 2012.
Article in English | MEDLINE | ID: mdl-22403633

ABSTRACT

MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are imputable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, both on simulated (through an in silico regulation network model) and real clinical datasets, the consistency of candidate biomarkers provided by a number of different methods. METHODS: We extensively simulated the effect of heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling evolution of a pool of subjects; then, a subset of them underwent alterations in regulatory mechanisms so as to mimic the disease state. RESULTS: The simulated data allowed us to outline advantages and drawbacks of different methods across multiple studies and varying number of samples and to evaluate precision of feature selection on a benchmark with known biomarkers. Although comparable classification accuracy was reached by different methods, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.


Subjects
Computational Biology/methods , Algorithms , Analysis of Variance , Biomarkers/metabolism , Statistical Data Interpretation , Discriminant Analysis , Disease/genetics , Humans , Oligonucleotide Array Sequence Analysis , Regression Analysis , Sample Size , Support Vector Machine