Results 1 - 6 of 6
1.
BMC Bioinformatics ; 16: 375, 2015 Nov 09.
Article in English | MEDLINE | ID: mdl-26552868

ABSTRACT

BACKGROUND: Statistical modeling of transcription factor binding sites is one of the classical fields in bioinformatics. The position weight matrix (PWM) model, which assumes statistical independence among all nucleotides in a binding site, has been the standard model for this task for more than three decades, but its simple assumptions are increasingly being questioned. Recent high-throughput sequencing methods have provided data sets of sufficient size and quality for studying the benefits of more complex models. However, learning more complex models typically entails the danger of overfitting, and although model classes that dynamically adapt the model complexity to the data have been developed, effective model selection has to date only been possible for fully observable data, not, e.g., within de novo motif discovery. RESULTS: To address this issue, we propose a stochastic algorithm for performing robust model selection in a latent variable setting. This algorithm yields a solution without relying on hyperparameter tuning via massive cross-validation or other computationally expensive resampling techniques. Using this algorithm for learning inhomogeneous parsimonious Markov models, we study the degree of putative higher-order intra-motif dependencies for transcription factor binding sites inferred via de novo motif discovery from ChIP-seq data. We find that intra-motif dependencies are prevalent and not limited to first-order dependencies among directly adjacent nucleotides, but that second-order models appear to be the significantly better choice. CONCLUSIONS: The traditional PWM model does indeed appear to be insufficient for inferring realistic sequence motifs, as it is on average outperformed by more complex models that take intra-motif dependencies into account. Moreover, using such models together with an appropriate model selection procedure does not lead to a significant performance loss in comparison with the PWM model for any of the studied transcription factors. Hence, we recommend that any modern motif discovery algorithm should attempt to take intra-motif dependencies into account.


Subject(s)
Algorithms , Chromatin Immunoprecipitation/methods , Computational Biology/methods , DNA/metabolism , High-Throughput Nucleotide Sequencing/methods , Nucleotide Motifs/genetics , Transcription Factors/metabolism , Binding Sites , DNA/chemistry , DNA/genetics , Humans , Models, Theoretical , Position-Specific Scoring Matrices , Protein Binding
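The abstract contrasts the PWM, which treats every position of a binding site as independent, with models that capture intra-motif dependencies. As a rough illustration only, the sketch below fits a plain PWM and a plain inhomogeneous first-order Markov model (one transition table per position) to already aligned sites and scores sequences under each. It is not the authors' method: it uses fixed pseudocounts, no parsimonious context trees, no model selection, and no de novo motif discovery. The function names, the pseudocount of 0.5, and the toy sites are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_pwm(sites, pseudocount=0.5):
    """PWM: one independent nucleotide distribution per motif position."""
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        total = len(sites) + pseudocount * len(ALPHABET)
        pwm.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return pwm


def fit_markov1(sites, pseudocount=0.5):
    """Inhomogeneous first-order Markov model: the nucleotide at position i
    depends on the nucleotide at i - 1, with a separate table per position."""
    start = fit_pwm(sites, pseudocount)[0]  # distribution of the first nucleotide
    trans = [None]  # no transition into position 0
    for i in range(1, len(sites[0])):
        table = {}
        for prev in ALPHABET:
            nxt = Counter(site[i] for site in sites if site[i - 1] == prev)
            total = sum(nxt.values()) + pseudocount * len(ALPHABET)
            table[prev] = {a: (nxt[a] + pseudocount) / total for a in ALPHABET}
        trans.append(table)
    return start, trans


def pwm_log_likelihood(seq, pwm):
    """Sum of per-position log probabilities (independence assumption)."""
    return sum(math.log(pwm[i][c]) for i, c in enumerate(seq))


def markov1_log_likelihood(seq, model):
    """Log probability under the first-order model: start term plus transitions."""
    start, trans = model
    ll = math.log(start[seq[0]])
    for i in range(1, len(seq)):
        ll += math.log(trans[i][seq[i - 1]][seq[i]])
    return ll


if __name__ == "__main__":
    # Hypothetical toy sites with a perfect dependency: A is followed by C, G by T.
    sites = ["AC", "GT"] * 10
    pwm, mm = fit_pwm(sites), fit_markov1(sites)
    for s in ("AC", "AT"):
        print(s, pwm_log_likelihood(s, pwm), markov1_log_likelihood(s, mm))
```

On these toy sites the PWM assigns "AC" and the never-observed "AT" the same probability, because both positions are individually 50/50 between two nucleotides; the first-order model scores "AC" higher and "AT" much lower, which is the kind of dependency the PWM cannot express.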
2.
Prev Med Rep ; 38: 102607, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38298822

ABSTRACT

Smoking, alcohol consumption, obesity, and physical inactivity are key lifestyle risk factors for cancer. Previously, these have mostly been examined singly or combined as an index, assuming independent and equivalent effects on cancer risk. The aim of our study was to systematically examine the joint pairwise and interactive effects of these lifestyle factors on the risk of a first solid primary cancer in a multi-cohort prospective setting. We used pooled data from seven Finnish health survey studies conducted during 1972-2015, comprising 197,551 participants, among whom 16,373 solid malignant primary tumors were diagnosed during follow-up. Incidence of any cancer was analyzed separately, both excluding and including lung cancer, using Poisson regression with main and interaction effects of key lifestyle factors. When excluding lung cancer, the highest risk of any cancer in men was observed for smokers with a BMI of ≥25 kg/m² (HR 1.36, 95% CI 1.25-1.48) and in women for smokers consuming alcohol (HR 1.22, 95% CI 1.14-1.30). No statistically significant interactions were observed between any of the studied risk factor pairs. When including lung cancer, the highest HRs among men were observed for smokers who consumed alcohol (HR 1.72, 95% CI 1.57-1.89) and among women for smokers who were physically inactive (HR 1.38, 95% CI 1.27-1.49). Smoking combined with other lifestyle factors at any exposure level resulted in the highest pairwise risks in both men and women. These results highlight the importance of smoking prevention, and also of preventing obesity and reducing alcohol consumption.
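The abstract mentions Poisson regression "with main and interaction effects" for pairs of lifestyle factors but does not give the exact parameterization or covariate set. Assuming the usual log-linear form with a person-years offset, and omitting adjustment covariates, a pairwise model for two binary exposures might look as follows. Here $Y_i$ is the number of incident cancers, $T_i$ the person-years at risk, and $x_{1i}, x_{2i}$ are 0/1 exposure indicators (e.g. smoker, BMI ≥ 25 kg/m²); all symbols are introduced for illustration.

```latex
\log \mathbb{E}[Y_i] = \log T_i + \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i}

\mathrm{RR}_{10} = e^{\beta_1}, \qquad
\mathrm{RR}_{01} = e^{\beta_2}, \qquad
\mathrm{RR}_{11} = e^{\beta_1 + \beta_2 + \beta_3}, \qquad
e^{\beta_3} = \frac{\mathrm{RR}_{11}}{\mathrm{RR}_{10}\,\mathrm{RR}_{01}}
```

Under this sketch, the joint-exposure estimate (both factors present versus neither) corresponds to $e^{\beta_1+\beta_2+\beta_3}$, and "no interaction" on the multiplicative scale means $\beta_3 = 0$, i.e. the joint rate ratio equals the product of the two single-exposure rate ratios. Whether the study assessed interaction on this scale is an assumption here.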

3.
Cancer Rep (Hoboken) ; 5(11): e1612, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35243812

ABSTRACT

BACKGROUND: Several lifestyle factors are associated with an increased risk of colorectal cancer (CRC). Although lifestyle factors co-occur, most previous studies have focused on a single risk factor or assumed independent effects between risk factors. AIM: To examine the pairwise effects and interactions of smoking, alcohol consumption, physical inactivity, and body mass index (BMI) on the risk of subsequent CRC. METHODS AND RESULTS: We used METCA cohort data (pooled data from seven population-based Finnish health behavior survey studies conducted during 1972-2015) comprising 171,063 women and men. Participants' smoking, alcohol consumption, physical inactivity, and BMI were recorded, and participants were categorized as exposed or not exposed. The incidence of CRC was modeled by Poisson regression with main and interaction effects of key lifestyle factors. Cohort members were followed up through register linkage to the Finnish Cancer Registry for a first primary CRC until the end of 2015. Follow-up time was 1,715,690 person-years. The highest pairwise CRC risk was among male smokers with overweight (BMI ≥ 25 kg/m²) (HR 1.75, 95% CI 1.36-2.26) and among women with overweight who consumed alcohol (HR 1.45, 95% CI 1.14-1.85). Overall, the association between lifestyle factors and CRC risk was stronger among men than among women. In men, both overweight and smoking, when combined with any other adverse lifestyle factor, increased CRC risk. Among women, elevated CRC risks were observed for those who were physically inactive and who also consumed alcohol or had overweight. No statistically significant interactions were detected between pairs of lifestyle factors. CONCLUSIONS: This study strengthens the evidence for overweight, smoking, and alcohol consumption as CRC risk factors. Substantial protective benefits in CRC risk can be achieved by preventing smoking, maintaining BMI below 25 kg/m², and not consuming alcohol.


Subject(s)
Colorectal Neoplasms , Overweight , Male , Humans , Female , Overweight/epidemiology , Overweight/complications , Prospective Studies , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/etiology , Colorectal Neoplasms/prevention & control , Life Style , Body Mass Index
4.
Neural Netw ; 133: 123-131, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33212359

ABSTRACT

Many applications, especially in physics and other sciences, call for easily interpretable and robust machine learning techniques. We propose a fully gradient-based technique for training radial basis function networks with an efficient and scalable open-source implementation. We derive novel closed-form optimization criteria for pruning the models for continuous as well as binary data, both of which arise in a challenging real-world materials physics problem. The pruned models are optimized to provide compact and interpretable versions of larger models based on informed assumptions about the data distribution. Visualizations of the pruned models provide insight into the atomic configurations that determine atom-level migration processes in solid matter; these results may inform future research on designing more suitable descriptors for use with machine learning algorithms.


Subject(s)
Algorithms , Machine Learning , Neural Networks, Computer , Physics/methods , Humans
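The abstract describes training radial basis function (RBF) networks entirely with gradients, i.e. learning the basis centers and widths together with the output weights rather than fixing them beforehand. As a rough illustration only, the NumPy sketch below fits a 1-D Gaussian RBF network by full-batch gradient descent on mean squared error, updating all four parameter groups at once. The function names, the log-width parameterization, the initialization, and the learning rate are assumptions for this sketch; it is not the authors' open-source implementation, and their closed-form pruning criteria are not reproduced.

```python
import numpy as np


def rbf_forward(x, centers, log_widths, weights, bias):
    """1-D Gaussian RBF network: f(x) = sum_k w_k exp(-(x - c_k)^2 / (2 s_k^2)) + b."""
    widths = np.exp(log_widths)                 # s_k > 0 via log parameterization
    diff = x[:, None] - centers[None, :]        # shape (N, K)
    sq = diff ** 2
    phi = np.exp(-sq / (2.0 * widths ** 2))     # basis activations, shape (N, K)
    return phi @ weights + bias, phi, diff, sq, widths


def train_rbf(x, y, n_basis=8, lr=0.1, steps=500):
    """Full-batch gradient descent on MSE over weights, bias, centers and widths."""
    centers = np.linspace(x.min(), x.max(), n_basis)
    log_widths = np.zeros(n_basis)
    weights = np.zeros(n_basis)
    bias = 0.0
    losses = []
    for _ in range(steps):
        pred, phi, diff, sq, widths = rbf_forward(x, centers, log_widths, weights, bias)
        resid = pred - y
        losses.append(float(np.mean(resid ** 2)))
        g = 2.0 * resid / len(x)                # dLoss/dpred, shape (N,)
        # All gradients are computed from the current parameters before any update.
        grad_w = phi.T @ g
        grad_b = g.sum()
        # dpred/dc_k = w_k * phi_k * (x - c_k) / s_k^2
        grad_c = (g[:, None] * phi * diff).sum(axis=0) * weights / widths ** 2
        # dpred/dlog s_k = w_k * phi_k * (x - c_k)^2 / s_k^2
        grad_ls = (g[:, None] * phi * sq).sum(axis=0) * weights / widths ** 2
        weights -= lr * grad_w
        bias -= lr * grad_b
        centers -= lr * grad_c
        log_widths -= lr * grad_ls
    return (centers, log_widths, weights, bias), losses


if __name__ == "__main__":
    x = np.linspace(-3.0, 3.0, 50)
    params, losses = train_rbf(x, np.sin(x))
    print("initial MSE:", losses[0], "final MSE:", losses[-1])
```

Because the weights start at zero, the first steps only move the output layer; centers and widths begin to move once the weights are nonzero, since their gradients are scaled by `w_k`. Training the widths in log-space is one simple way to keep them positive without constrained optimization.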
5.
PLoS One ; 11(4): e0152656, 2016.
Article in English | MEDLINE | ID: mdl-27035667

ABSTRACT

The maximum parsimony (MP) method for inferring phylogenies is widely used, but little is known about its limitations in non-asymptotic situations. This study employs large-scale computations with simulated phylogenetic data to estimate the probability that MP succeeds in finding the true phylogeny for up to twelve taxa and 256 characters. The set of candidate phylogenies is taken to be the unrooted binary trees; for each simulated data set, the tree lengths of all (2n - 5)!! candidates for n taxa are computed to evaluate quantities related to the performance of MP, such as the probability of finding the true phylogeny, the probability that the tree with the shortest length is unique, the probability that the true phylogeny has the shortest tree length, and the expected inverse of the number of trees sharing the shortest length. The tree length distributions are also used to evaluate and extend the skewness test of Hillis for distinguishing between random and phylogenetic data. The results indicate, for example, that the critical point beyond which MP achieves a success probability of at least 0.9 is roughly 128 characters. The skewness test is found to perform well on simulated data, and the study extends its scope to up to twelve taxa.


Subject(s)
Models, Theoretical , Phylogeny , Probability
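The abstract relies on two computable quantities: the number of candidate unrooted binary trees, (2n - 5)!! for n taxa, and the parsimony length of each candidate tree. The sketch below computes the first directly and the second with Fitch's small-parsimony algorithm, summed over characters. It does not enumerate the candidate trees, and the study's own implementation is not described in the abstract, so the function names and the nested-tuple tree encoding are assumptions for illustration. For twelve taxa the count is 19!! = 654,729,075 trees, which indicates why exhaustive evaluation is limited to about that many taxa.

```python
from math import prod


def num_unrooted_binary_trees(n):
    """Number of distinct unrooted binary trees on n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return prod(range(1, 2 * n - 4, 2))  # 1 * 3 * 5 * ... * (2n - 5)


def fitch_length(tree, states):
    """Fitch small-parsimony count of state changes for a single character.

    tree: rooted binary tree as nested 2-tuples of leaf names. For an unrooted
    tree any rooting gives the same length. states: dict leaf name -> state.
    """
    def visit(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # Disjoint child state sets force one extra change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Parsimony tree length: sum of Fitch lengths over all characters (columns)."""
    n_chars = len(next(iter(alignment.values())))
    return sum(
        fitch_length(tree, {taxon: seq[j] for taxon, seq in alignment.items()})
        for j in range(n_chars)
    )


if __name__ == "__main__":
    print([num_unrooted_binary_trees(n) for n in range(3, 13)])
    alignment = {"A": "00", "B": "01", "C": "10", "D": "11"}  # hypothetical toy data
    print(tree_length((("A", "B"), ("C", "D")), alignment))
```

Under MP, the tree(s) with the smallest `tree_length` over all candidates would be selected; collecting `tree_length` for every candidate gives the tree length distribution on which a skewness statistic can be computed.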