Results 1 - 20 of 601
1.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37539831

ABSTRACT

Duplex sequencing technology has been widely used in the detection of low-frequency mutations in circulating tumor deoxyribonucleic acid (DNA), but how to determine the sequencing depth and other experimental parameters to ensure the stable detection of low-frequency mutations is still an urgent problem to be solved. The mutation detection rules of duplex sequencing constrain not only the number of mutated templates but also the number of mutation-supportive reads corresponding to each forward and reverse strand of the mutated templates. To tackle this problem, we proposed a Depth Estimation model for stable detection of Low-Frequency MUTations in duplex sequencing (DELFMUT), which models the identity correspondence and quantitative relationships between templates and reads using the zero-truncated negative binomial distribution without considering the sequences composed of bases. The results of DELFMUT were verified by real duplex sequencing data. In the case of known mutation frequency and mutation detection rule, DELFMUT can recommend the combinations of DNA input and sequencing depth to guarantee the stable detection of mutations, and it has a great application value in guiding the experimental parameter setting of duplex sequencing technology.
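As a rough illustration of the distribution named above, the following Python sketch evaluates a zero-truncated negative binomial probability mass function. It is not DELFMUT's implementation: the function names, the (r, p) parametrization, and the reading that an observed template always has at least one read (hence the truncation at zero) are assumptions made for this example.

```python
import math


def nb_pmf(k: int, r: float, p: float) -> float:
    """Negative binomial pmf: probability of k failures before the r-th
    success, with success probability p (0 < p < 1, r > 0)."""
    # Work on the log scale so large k does not overflow the gamma function.
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def ztnb_pmf(k: int, r: float, p: float) -> float:
    """Zero-truncated negative binomial pmf, defined for k >= 1.

    The mass at zero is removed and the remaining probabilities are
    rescaled so that they sum to one.
    """
    if k < 1:
        return 0.0
    return nb_pmf(k, r, p) / (1.0 - nb_pmf(0, r, p))
```

For instance, `ztnb_pmf(1, 1.0, 0.5)` evaluates to 0.5 (up to floating-point rounding): the untruncated probability of one count is 0.25, and the probability of a non-zero count is 1 - 0.5.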


Subjects
High-Throughput Nucleotide Sequencing, Neoplasms, Humans, High-Throughput Nucleotide Sequencing/methods, Mutation, Neoplasms/genetics, Mutation Rate, DNA
2.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37507115

ABSTRACT

Single cell RNA-sequencing (scRNA-seq) technology has significantly advanced the understanding of transcriptomic signatures. Although various statistical models have been used to describe the distribution of gene expression across cells, a comprehensive assessment of the different models is missing. Moreover, the growing number of features associated with scRNA-seq datasets creates new challenges for analytical accuracy and computing speed. Here, we developed a Python-based package (TensorZINB) to solve the zero-inflated negative binomial (ZINB) model using the TensorFlow deep learning framework. We used a sequential initialization method to solve the numerical stability issues associated with hurdle and zero-inflated models. A recursive feature selection protocol was used to optimize feature selections for data processing and downstream differentially expressed gene (DEG) analysis. We proposed a class of hybrid models combining nested models to further improve the model's performance. Additionally, we developed a new method to convert a continuous distribution to its equivalent discrete form, so that statistical models can be fairly compared. Finally, we showed that the proposed TensorFlow algorithm (TensorZINB) was numerically stable and that its computing speed and performance were superior to those of existing ZINB solvers. Moreover, we implemented seven hurdle and zero-inflated statistical models in Python and systematically assessed their performance using a real scRNA-seq dataset. We demonstrated that the ZINB model achieved the lowest Akaike information criterion compared with other models tested. Taken together, TensorZINB was accurate, efficient and scalable for the implementation of ZINB and for large-scale scRNA-seq data analysis with DEG identification.
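For readers unfamiliar with the model, one common way to write the ZINB probability mass function is sketched below. The symbols (π for the zero-inflation probability, μ for the negative binomial mean, θ for its size parameter) are conventional choices assumed here and need not match TensorZINB's own notation. The last line restates the standard definition of the Akaike information criterion, with p the number of fitted parameters and L-hat the maximized likelihood.

```latex
\begin{aligned}
P(Y = 0) &= \pi + (1 - \pi)\left(\frac{\theta}{\theta + \mu}\right)^{\theta}, \\
P(Y = y) &= (1 - \pi)\,\frac{\Gamma(y + \theta)}{\Gamma(\theta)\, y!}
            \left(\frac{\theta}{\theta + \mu}\right)^{\theta}
            \left(\frac{\mu}{\theta + \mu}\right)^{y},
            \qquad y = 1, 2, \dots \\
\mathrm{AIC} &= 2p - 2 \ln \hat{L}
\end{aligned}
```

Setting π = 0 recovers the ordinary negative binomial distribution, which is why the two models are nested and can be compared on the same data.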


Subjects
Gene Expression Profiling, Statistical Models, Poisson Distribution, Gene Expression Profiling/methods, RNA, RNA Sequence Analysis/methods
3.
Methods ; 226: 61-70, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38631404

ABSTRACT

As the most abundant mRNA modification, m6A controls and influences many aspects of mRNA metabolism, including mRNA stability and degradation. However, the role of specific m6A sites in regulating gene expression remains unclear. In addition, the multicollinearity problem caused by the correlation of methylation levels across multiple m6A sites in each gene could influence prediction performance. To address these challenges, we propose an elastic-net regularized negative binomial regression model (called m6Aexpress-enet) to predict which m6A sites could potentially regulate the expression of their genes. Comprehensive evaluations on simulated datasets demonstrate that m6Aexpress-enet could achieve the top prediction performance. Applying m6Aexpress-enet to real MeRIP-seq data from human lymphoblastoid cell lines, we have uncovered the complex regulatory pattern of predicted m6A sites and the unique pathway enrichment of the constructed co-methylation modules. m6Aexpress-enet proves itself to be a powerful tool that enables biologists to discover the mechanisms by which m6A regulates gene expression. Furthermore, the source code and a step-by-step implementation of m6Aexpress-enet are freely accessible at https://github.com/tengzhangs/m6Aexpress-enet.


Subjects
Gene Expression Regulation, Messenger RNA, Humans, Messenger RNA/genetics, Messenger RNA/metabolism, Gene Expression Regulation/genetics, Computational Biology/methods, Methylation, Software, Adenosine/metabolism, Adenosine/genetics, Adenosine/analogs & derivatives, Regression Analysis
4.
Biostatistics ; 2023 May 31.
Article in English | MEDLINE | ID: mdl-37257175

ABSTRACT

In complex tissues containing cells that are difficult to dissociate, single-nucleus RNA-sequencing (snRNA-seq) has become the preferred experimental technology over single-cell RNA-sequencing (scRNA-seq) to measure gene expression. To accurately model these data in downstream analyses, previous work has shown that droplet-based scRNA-seq data are not zero-inflated, but whether droplet-based snRNA-seq data follow the same probability distributions has not been systematically evaluated. Using pseudonegative control data from nuclei in mouse cortex sequenced with the 10x Genomics Chromium system and mouse kidney sequenced with the DropSeq system, we found that droplet-based snRNA-seq data follow a negative binomial distribution, suggesting that parametric statistical models applied to scRNA-seq are transferable to snRNA-seq. Furthermore, we found that the quantification choices in adapting quantification mapping strategies from scRNA-seq to snRNA-seq can play a significant role in downstream analyses and biological interpretation. In particular, reference transcriptomes that do not include intronic regions result in significantly smaller library sizes and incongruous cell type classifications. We also confirmed the presence of a gene length bias in snRNA-seq data, which we show is present in both exonic and intronic reads, and investigate potential causes for the bias.

5.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39073775

ABSTRACT

Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications.


Subjects
Bayes Theorem, Computer Simulation, Gene Expression Profiling, Cluster Analysis, Gene Expression Profiling/methods, Gene Expression Profiling/statistics & numerical data, Humans, Transcriptome, Markov Chains, Statistical Models, Statistical Data Interpretation
6.
Stat Med ; 43(6): 1153-1169, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38221776

ABSTRACT

Wastewater-based surveillance has become an important tool for research groups and public health agencies investigating and monitoring the COVID-19 pandemic and other public health emergencies including other pathogens and drug abuse. While there is an emerging body of evidence exploring the possibility of predicting COVID-19 infections from wastewater signals, there remain significant challenges for statistical modeling. Longitudinal observations of viral copies in municipal wastewater can be influenced by noisy datasets and missing values with irregular and sparse samplings. We propose an integrative Bayesian framework to predict daily positive cases from weekly wastewater observations with missing values via functional data analysis techniques. In a unified procedure, the proposed analysis models severe acute respiratory syndrome coronavirus-2 RNA wastewater signals as a realization of a smooth process with error and combines the smooth process with COVID-19 cases to evaluate the prediction of positive cases. We demonstrate that the proposed framework can achieve these objectives with high predictive accuracies through simulated and observed real data.


Subjects
COVID-19, Humans, Bayes Theorem, COVID-19/epidemiology, Pandemics, Viral RNA/genetics, SARS-CoV-2/genetics, Wastewater
7.
BMC Infect Dis ; 24(1): 1006, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39300391

ABSTRACT

BACKGROUND: It is difficult to detect outbreaks of emerging infectious diseases with the existing surveillance system. Here we investigate the utility of the Baidu Search Index, an indicator of how much of Baidu's search volume a keyword accounts for, for early warning and prediction of the epidemic trend of COVID-19. METHODS: The daily number of cases and the Baidu Search Index of 8 keywords (weighted by population) from December 1, 2019 to March 15, 2020 were collected and analyzed with time series and Spearman correlation at different time lags. To predict the daily number of COVID-19 cases using the Baidu Search Index, zero-inflated negative binomial regression was used in phase 1 and a negative binomial regression model was used in phases 2 and 3, based on the characteristics of the independent variable. RESULTS: The Baidu Search Index of all keywords in Wuhan was significantly higher than in Hubei (excluding Wuhan) and China (excluding Hubei). Before the causative pathogen was identified, the search volume of "Influenza" and "Pneumonia" in Wuhan increased with the number of new onset cases, with correlation coefficients of 0.69 and 0.59, respectively. After the pathogen was made public but before COVID-19 was classified as a notifiable disease, the search volume of "SARS", "Pneumonia" and "Coronavirus" in all study areas increased with the number of new onset cases, with correlation coefficients of 0.69 to 0.89, while "Influenza" became negatively correlated (rs: -0.56 to -0.64). After COVID-19 was closely monitored, the Baidu Search Index of "COVID-19", "Pneumonia", "Coronavirus", "SARS" and "Mask" could predict the epidemic trend with lead times of 15 days, 5 days and 6 days in Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei), respectively. From 21 January to 9 February, the predicted number of cases exceeded the actual number of cases by 1.84-fold and 4.81-fold in Wuhan and Hubei (excluding Wuhan), respectively. CONCLUSION: The Baidu Search Index could be used for early warning and prediction of the epidemic trend of COVID-19, but the relevant search keywords changed between periods. Considering the time lag from onset to diagnosis, especially in areas with a shortage of medical resources, internet search data can be a highly effective supplement to the existing surveillance system.


Subjects
COVID-19, Disease Outbreaks, Epidemiological Monitoring, Statistical Models, Regression Analysis, Search Engine, Humans, COVID-19/epidemiology, China/epidemiology, Time Factors, SARS-CoV-2/physiology
8.
BMC Infect Dis ; 24(1): 262, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38408924

ABSTRACT

BACKGROUND: Widespread human-to-human transmission of the severe acute respiratory syndrome coronavirus two (SARS-CoV-2) stems from a strong affinity for the cellular receptor angiotensin converting enzyme two (ACE2). We investigate the relationship between a patient's nasopharyngeal ACE2 transcription and secondary transmission within a series of concurrent hospital associated SARS-CoV-2 outbreaks in British Columbia, Canada. METHODS: Epidemiological case data from the outbreak investigations were merged with public health laboratory records and viral lineage calls, from whole genome sequencing, to reconstruct the concurrent outbreaks using infection tracing transmission network analysis. ACE2 transcription and RNA viral load were measured by quantitative real-time polymerase chain reaction. The transmission network was resolved to calculate the number of potential secondary cases. Bivariate and multivariable analyses using Poisson and Negative Binomial regression models were performed to estimate the association between ACE2 transcription and the number of SARS-CoV-2 secondary cases. RESULTS: The infection tracing transmission network provided n = 76 potential transmission events across n = 103 cases. Bivariate comparisons found that on average ACE2 transcription did not differ between patients and healthcare workers (P = 0.86). High ACE2 transcription was observed in 98.6% of transmission events: either the primary or the secondary case had above average ACE2. Multivariable analysis found that the association between ACE2 transcription (log2 fold-change) and the number of secondary transmission events differs between patients and healthcare workers. In health care workers, Negative Binomial regression estimated that a one-unit change in ACE2 transcription decreases the number of secondary cases (β = -0.132; 95% CI: -0.255 to -0.0181), adjusting for RNA viral load. Conversely, in patients a one-unit change in ACE2 transcription increases the number of secondary cases (β = 0.187; 95% CI: 0.0101 to 0.370), adjusting for RNA viral load. Sensitivity analysis found no significant relationship between ACE2 and secondary transmission in health care workers and confirmed the positive association among patients. CONCLUSION: Our study suggests that ACE2 transcription has a positive association with SARS-CoV-2 secondary transmission in admitted inpatients, but not in health care workers, in concurrent hospital associated outbreaks, and it should be further investigated as a risk factor for viral transmission.
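To make the reported coefficients easier to read, the short Python sketch below converts them to rate ratios. It assumes the negative binomial models used the usual log link, which the abstract does not state; under that assumption exp(β) is the multiplicative change in the expected number of secondary cases per one-unit increase in log2 fold-change (that is, per doubling of ACE2 transcription). The function name and the dictionary layout are illustrative only.

```python
import math


def rate_ratio(beta: float) -> float:
    """Convert a log-link regression coefficient to a rate ratio."""
    return math.exp(beta)


# (estimate, lower 95% limit, upper 95% limit) as quoted in the abstract.
estimates = {
    "health care workers": (-0.132, -0.255, -0.0181),
    "patients": (0.187, 0.0101, 0.370),
}

for group, (beta, lower, upper) in estimates.items():
    print(
        f"{group}: rate ratio {rate_ratio(beta):.3f} "
        f"(95% CI {rate_ratio(lower):.3f} to {rate_ratio(upper):.3f})"
    )
```

A rate ratio below 1 corresponds to fewer expected secondary cases and a ratio above 1 to more; an interval that excludes 1 matches a coefficient interval that excludes 0.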


Subjects
COVID-19, SARS-CoV-2, Humans, Angiotensin-Converting Enzyme 2, British Columbia/epidemiology, COVID-19/epidemiology, Disease Outbreaks, Hospitals, RNA, SARS-CoV-2/genetics
9.
J Urban Health ; 101(3): 571-583, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38831155

ABSTRACT

Mass shootings (incidents with four or more people shot in a single event, not including the shooter) are becoming more frequent in the United States, posing a significant threat to public health and safety in the country. In the current study, we analyzed the impact of state-level prevalence of gun ownership on both the frequency and the severity of mass shootings. We applied a negative binomial generalized linear mixed model to investigate the association between gun ownership rate, as measured by a proxy (i.e., the proportion of suicides committed with firearms to total suicides), and population-adjusted rates of mass shooting incidents and fatalities at the state level from 2013 to 2022. Gun ownership was found to be significantly associated with the rate of mass shooting fatalities. Specifically, our model indicated that for every 1-SD increase (that is, every 12.5% increase) in gun ownership, the rate of mass shooting fatalities increased by 34% (p value < 0.001). However, no significant association was found between gun ownership and the rate of mass shooting incidents. These findings suggest that restricting gun ownership (and therefore reducing the availability of guns) may not decrease the number of mass shooting events, but it may save lives when these events occur.


Subjects
Firearms, Mass Casualty Incidents, Ownership, Suicide, Humans, Firearms/statistics & numerical data, United States/epidemiology, Ownership/statistics & numerical data, Mass Casualty Incidents/statistics & numerical data, Suicide/statistics & numerical data, Gunshot Wounds/epidemiology, Gunshot Wounds/mortality, Mass Shooting Events
10.
Pharmacoepidemiol Drug Saf ; 33(2): e5750, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38362649

ABSTRACT

PURPOSE: Outcome variables that are assumed to follow a negative binomial distribution are frequently used in both clinical and epidemiological studies. Epidemiological studies, particularly those performed by pharmaceutical companies, often aim to describe a population rather than compare treatments. Such descriptive studies are often analysed using confidence intervals. While precision calculations and sample size calculations are not always performed in these settings, they have the important role of setting expectations of what results the study may generate. Current methods for precision calculations for the negative binomial rate are based on plugging parameter values into the confidence interval formulae. This method has the downside of ignoring the randomness of the confidence interval limits. To enable better practice for precision calculations, methods are needed that address the randomness. METHODS: Using the well-known delta method, we develop a method for calculating the precision probability, that is, the probability of achieving a certain confidence interval width. We assess the performance of the method in smaller samples through simulations. RESULTS: The method for the precision probability performs well in small to medium sample sizes, and the usefulness of the method is demonstrated through an example. CONCLUSIONS: We have developed a simple method for calculating the precision probability for negative binomial rates. This method can be used when planning epidemiological studies in, for example, asthma, while correctly taking the randomness of confidence intervals into account.


Subjects
Statistical Models, Humans, Sample Size, Probability, Binomial Distribution, Confidence Intervals
11.
Bull Math Biol ; 86(11): 131, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39311987

ABSTRACT

In this work, we obtained a general formulation for the mating probability and fertile egg production in helminth parasites, focusing on the reproductive behavior of polygamous parasites and its implications for transmission dynamics. By exploring various reproductive variables in parasites with density-dependent fecundity, such as helminth parasites, we departed from the traditional assumptions of Poisson and negative binomial distributions to adopt an arbitrary distribution model. Our analysis considered critical factors such as mating probability, fertile egg production, and the distribution of female and male parasites among hosts, whether they are distributed together or separately. We show that the distribution of parasites within hosts significantly influences transmission dynamics, with implications for parasite persistence and, therefore, with implications in parasite control. Using statistical models and empirical data from Monte Carlo simulations, we provide insights into the complex interplay of reproductive variables in helminth parasites, enhancing our understanding of parasite dynamics and the transmission of parasitic diseases.


Subjects
Helminths, Host-Parasite Interactions, Mathematical Concepts, Biological Models, Monte Carlo Method, Animals, Female, Helminths/physiology, Male, Host-Parasite Interactions/physiology, Fertility/physiology, Computer Simulation, Reproduction/physiology, Animal Sexual Behavior/physiology, Probability, Ovum/physiology, Humans
12.
BMC Public Health ; 24(1): 135, 2024 01 09.
Article in English | MEDLINE | ID: mdl-38195488

ABSTRACT

BACKGROUND: It is believed that the COVID-19 pandemic might disrupt routine healthcare services. A vulnerable group such as cross-border migrants is of critical concern if the pandemic affects their service utilisation. In this study, we aim to explore the impact of COVID-19 and other related factors on non-COVID-19 service utilisation amongst cross-border migrants in Thailand. METHODS: We conducted an ecological time-series cross-sectional analysis using secondary data from 2019 to 2022, focusing on insured and non-insured migrants in a unit of a provincial monthly quarter. We obtained data on registered migrants from the Ministry of Labour and inpatient visits from the Ministry of Public Health (MOPH). Our analysis involved descriptive statistics and a random-effects negative binomial regression, considering variables such as COVID-19 cases, number of hospital beds, registered regions, and COVID-19 waves. We assessed inpatient utilisation number and rate as our primary outcomes. RESULTS: The admission numbers for insured and non-insured migrants in all regions increased 1.3-2.1 times after 2019 despite a decrease in the numbers of registered migrants. The utilisation of services for selected communicable and non-communicable diseases and obstetric conditions remained consistent throughout 2019-2022. The admission numbers and rates were not associated with an increase in incident COVID-19 cases but increased significantly over time compared with the pre-COVID-19 period (44.5-77.0% for insured migrants and 15.0-26.4% for non-insured migrants). Greater Bangkok saw the lowest admission rate amongst insured migrants, reflected by the incidence rate ratio of 5.7-27.5 relative to other regions. CONCLUSION: The admission numbers and rates for non-COVID-19 healthcare services remained stable regardless of COVID-19 incidence. The later pandemic waves (Delta and Omicron variants) were related to an increase in admission numbers and rates, possibly due to disruptions in outpatient care, leading to more severe cases seeking hospitalisation. Lower admission rates in Greater Bangkok may be linked to the fragmentation of the primary care network in major cities and the disintegration of service utilisation data between private facilities and the MOPH. Future research should explore migrant healthcare-seeking behaviour at an individual level, using both quantitative and qualitative methods for deeper insights.


Subjects
COVID-19, Transients and Migrants, Female, Pregnancy, Humans, Public Facilities, Thailand/epidemiology, Cross-Sectional Studies, Pandemics, COVID-19/epidemiology, Delivery of Health Care
13.
Parasitol Res ; 123(9): 329, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39316149

ABSTRACT

Aggregation is a fundamental feature of parasite distribution in the host population, but the biological implications of the aggregation indices used to describe the relationships between the populations of parasites and hosts are not evident. It is speculated that the form of distribution in each case is predicated on the host's varying resistance to the infection, which is hard to control, making it difficult to adequately interpret the index values. This paper examines several cases from trout farms in Russian Karelia to explore the monogenean Gyrodactylus spp. infection in rainbow trout of varying ages. The genetic homogeneity of cage-reared fish and the direct life cycle of the helminths make the relationship between the species more lucid than in natural host-parasite systems. The results give no ground to speak of any specific patterns: as in natural systems, the infection rates in trout vary widely, i.e., the helminth distribution has not become more uniform; the observed distributions in all cases are adequately approximated by the negative binomial model; the positive abundance-occupancy relationships (AORs) and abundance-variance relationships (AVRs) common for parasitic systems apply to the basic infection parameters. The form of the negative binomial distribution is shaped by two parameters, k and θ: the former is a metric of the infection variability, which depends on the host's individual resistance, and the latter represents the parasites' reproduction and establishment success rates. A rise in the parameter k indicates increased aggregation and a higher parameter θ points to a more uniform frequency distribution. These parameters can be used as a representative tool for monitoring the parasite communities in salmonid fishes, including in aquaculture.


Subjects
Fish Diseases, Host-Parasite Interactions, Oncorhynchus mykiss, Trematoda, Trematode Infections, Animals, Oncorhynchus mykiss/parasitology, Fish Diseases/parasitology, Trematode Infections/veterinary, Trematode Infections/parasitology, Trematoda/physiology, Trematoda/genetics, Trematoda/classification, Trematoda/isolation & purification, Russia, Platyhelminths/physiology, Platyhelminths/genetics, Platyhelminths/classification
14.
Pharm Stat ; 23(1): 46-59, 2024.
Article in English | MEDLINE | ID: mdl-38267827

ABSTRACT

Count outcomes are collected in clinical trials for new drug development in several therapeutic areas and the event rate is commonly used as a single primary endpoint. Count outcomes whose variance is greater than the mean are termed overdispersed; such outcomes are therefore assumed to follow a negative binomial distribution. However, in clinical trials for treating asthma and chronic obstructive pulmonary disease (COPD), a regulatory agency has suggested that a continuous endpoint related to lung function must be evaluated as a primary endpoint in addition to the event rate. The two co-primary endpoints that need to be evaluated include overdispersed count and continuous outcomes. Some researchers have proposed sample size calculation methods in the context of co-primary endpoints for various outcome types. However, methodologies for sample size calculation in trials with two co-primary endpoints, including overdispersed count and continuous outcomes, required when planning clinical trials for treating asthma and COPD, remain to be proposed. In this study, we aimed to develop a hypothesis-testing method and a corresponding sample size calculation method with two co-primary endpoints including overdispersed count and continuous outcomes. In a simulation, we demonstrated that the proposed sample size calculation method has adequate power accuracy. In addition, we illustrated an application of the proposed sample size calculation method to a placebo-controlled Phase 3 trial for patients with COPD.
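Overdispersion means that the variance of the counts exceeds their mean. The minimal Python sketch below shows how the negative binomial distribution accommodates this, using the common mean-dispersion parametrization in which the variance is μ + αμ². The function name and the symbol α are assumptions for illustration and are not taken from the paper.

```python
def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial count with the given mean and
    dispersion alpha, under the parametrization Var = mean + alpha * mean**2.

    With dispersion = 0 the variance equals the mean, as for a Poisson count.
    """
    if mean < 0 or dispersion < 0:
        raise ValueError("mean and dispersion must be non-negative")
    return mean + dispersion * mean ** 2


# An annual event rate of 2 with dispersion 0.5 has variance 4,
# twice what a Poisson model with the same mean would imply.
print(nb_variance(2.0, 0.5))
```

Because sample size formulas for count endpoints depend on the variance, ignoring the extra αμ² term would tend to understate the number of patients required.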


Subjects
Asthma, Chronic Obstructive Pulmonary Disease, Humans, Sample Size, Asthma/drug therapy, Chronic Obstructive Pulmonary Disease/diagnosis, Chronic Obstructive Pulmonary Disease/drug therapy, Binomial Distribution, Computer Simulation
15.
Sichuan Da Xue Xue Bao Yi Xue Ban ; 55(4): 918-924, 2024 Jul 20.
Article in Chinese | MEDLINE | ID: mdl-39170018

ABSTRACT

Objective: To construct a model for predicting recidivism in violence in community-based schizophrenia spectrum disorder patients (SSDP) by adopting a joint modeling method. Methods: Based on the basic data on severe mental illness in Southwest China between January 2017 and June 2018, 4565 community-based SSDP with baseline violent behaviors were selected as the research subjects. We used a growth mixture model (GMM) to identify patterns of medication adherence and social functioning. We then fitted the joint model using a zero-inflated negative binomial regression model and compared it with traditional static models. Finally, we used a 10-fold training-test cross-validation framework to evaluate the models' fitting and predictive performance. Results: A total of 157 patients (3.44%) experienced recidivism in violence. Medication compliance and social functioning were fitted into four patterns. In the count model, age, marital status, educational attainment, economic status, historical types of violence, and medication compliance patterns were predictive factors for the frequency of recidivism in violence (P<0.05). In the zero-inflated model, age, adverse drug reactions, historical types of violence, medication compliance patterns, and social functioning patterns were predictive factors for recidivism in violence (P<0.05). For the joint model, the average value of the Akaike information criterion (AIC) for the training set was 776.5±9.4, the average value of root mean squared error (RMSE) for the testing set was 0.168±0.013, and the average value of mean absolute error (MAE) for the testing set was 0.131±0.018, which were all lower than those of the traditional static models. Conclusion: Joint modeling is an effective statistical strategy for identifying and processing dynamic variables, exhibiting better predictive performance than that of the traditional static models. It can provide new ideas for promoting the construction of comprehensive intervention systems.


Subjects
Recidivism, Schizophrenia, Violence, Humans, Schizophrenia/drug therapy, China, Violence/statistics & numerical data, Recidivism/statistics & numerical data, Female, Male, Medication Adherence/statistics & numerical data, Adult, Middle Aged
16.
BMC Bioinformatics ; 24(1): 318, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37608264

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Due to the heterogeneity in environmental exposure and genetic background across subjects, subject effect is a major source of variation in scRNA-seq data with multiple subjects, which severely confounds cell type specific differential expression (DE) analysis. Moreover, dropout events are prevalent in scRNA-seq data, leading to an excessive number of zeros in the data, which further aggravates the challenge in DE analysis. RESULTS: We developed iDESC to detect cell type specific DE genes between two groups of subjects in scRNA-seq data. iDESC uses a zero-inflated negative binomial mixed model to consider both subject effect and dropouts. The prevalence of dropout events (dropout rate) was demonstrated to be dependent on gene expression level, which is modeled by pooling information across genes. Subject effect is modeled as a random effect in the log-mean of the negative binomial component. We evaluated and compared the performance of iDESC with eleven existing DE analysis methods. Using simulated data, we demonstrated that iDESC had well-controlled type I error and higher power compared to the existing methods. Applications of those methods with well-controlled type I error to three real scRNA-seq datasets from the same tissue and disease showed that the results of iDESC achieved the best consistency between datasets and the best disease relevance. CONCLUSIONS: iDESC was able to achieve more accurate and robust DE analysis results by separating subject effect from disease effect with consideration of dropouts to identify DE genes, suggesting the importance of considering subject effect and dropouts in the DE analysis of scRNA-seq data with multiple subjects.


Subjects
Statistical Models , Transcriptome , Humans , RNA Sequence Analysis
17.
BMC Bioinformatics ; 24(1): 187, 2023 May 08.
Article in English | MEDLINE | ID: mdl-37158829

ABSTRACT

BACKGROUND: The spectrum of mutations in a collection of cancer genomes can be described by a mixture of a few mutational signatures. The mutational signatures can be found using non-negative matrix factorization (NMF). To extract the mutational signatures, we have to assume a distribution for the observed mutational counts and a number of mutational signatures. In most applications, the mutational counts are assumed to be Poisson distributed, and the rank is chosen by comparing the fit of several models with the same underlying distribution and different values for the rank using classical model selection procedures. However, the counts are often overdispersed, and thus the Negative Binomial distribution is more appropriate. RESULTS: We propose a Negative Binomial NMF with a patient-specific dispersion parameter to capture the variation across patients and derive the corresponding update rules for parameter estimation. We also introduce a novel model selection procedure inspired by cross-validation to determine the number of signatures. Using simulations, we study the influence of the distributional assumption on our method together with other classical model selection procedures. We also present a simulation study with a method comparison in which we show that state-of-the-art methods substantially overestimate the number of signatures when overdispersion is present. We apply our proposed analysis to a wide range of simulated data and to two real data sets from breast and prostate cancer patients. On the real data we describe a residual analysis to investigate and validate the model choice. CONCLUSIONS: With our results on simulated and real data we show that our model selection procedure is more robust at determining the correct number of signatures under model misspecification. We also show that our model selection procedure is more accurate than the available methods in the literature for finding the true number of signatures. Lastly, the residual analysis clearly emphasizes the overdispersion in the mutational count data. The code for our model selection procedure and Negative Binomial NMF is available in the R package SigMoS and can be found at https://github.com/MartaPelizzola/SigMoS.


Subjects
Algorithms , Breast , Male , Humans , Mutation , Binomial Distribution , Computer Simulation
18.
BMC Genomics ; 24(1): 349, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37365517

ABSTRACT

T cell receptor repertoires can be profiled using next generation sequencing (NGS) to measure and monitor adaptive dynamical changes in response to disease and other perturbations. Genomic DNA-based bulk sequencing is cost-effective but necessitates multiplex target amplification using multiple primer pairs with highly variable amplification efficiencies. Here, we utilize an equimolar primer mixture and propose a single statistical normalization step that efficiently corrects for amplification bias post sequencing. Using samples analyzed by both our open protocol and a commercial solution, we show high concordance between bulk clonality metrics. This approach is an inexpensive and open-source alternative to commercial solutions.


Subjects
High-Throughput Nucleotide Sequencing , T-Lymphocytes , Base Sequence , Chromosome Mapping , High-Throughput Nucleotide Sequencing/methods , Alpha-Beta T-Cell Antigen Receptors/genetics
19.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33152752

ABSTRACT

Time-course RNAseq experiments, where tissues are repeatedly collected from the same subjects (e.g. humans or animals) over time or under several different experimental conditions, are becoming more popular as sequencing costs fall. Such designs offer great potential to identify genes that change over time or progress differently in time across experimental groups. Modelling of the longitudinal gene expression in such time-course RNAseq data is complicated by serial correlations, missing values due to subject dropout or sequencing errors, long follow-up with potentially non-linear progression in time, and a low number of subjects. Negative Binomial mixed models can address all these issues. However, such models under the maximum likelihood (ML) approach are less popular for RNAseq data due to convergence issues (see, e.g. [1]). We argue in this paper that it is the use of an inaccurate numerical integration method, in combination with the typically small sample sizes, that causes such mixed models to fail for a large portion of tested genes. We show that when we use the accurate adaptive Gaussian quadrature approach to approximate the integrals over the random-effects terms, we can successfully estimate the model parameters with the maximum likelihood method. Moreover, we show that the bootstrap method can be used to preserve the type I error rate in small sample settings. We evaluate empirically the small sample properties of the test statistics and compare with state-of-the-art approaches. The method is applied to a longitudinal mouse experiment to study the dynamics in Duchenne Muscular Dystrophy. Contact: s.tsonaka@lumc.nl. Roula Tsonaka is an assistant professor at the Medical Statistics, Department of Biomedical Data Sciences, Leiden University Medical Center. Her research focuses on statistical methods for longitudinal omics data. Pietro Spitali is an assistant professor at the Department of Human Genetics, Leiden University Medical Center. His research focuses on the identification of biomarkers for neuromuscular disorders.
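The adaptive Gaussian quadrature idea mentioned in this abstract can be illustrated with a small sketch. It is not the authors' code: it assumes a single random intercept, a fixed 3-point Gauss-Hermite rule, a finite-difference Newton search for the mode, and (in the usage lines) a Poisson likelihood standing in for the Negative Binomial one to keep the example short. All function names are hypothetical.

```python
import math

# 3-point Gauss-Hermite rule for integrals of the form exp(-x^2) * g(x).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6,
              2 * math.sqrt(math.pi) / 3,
              math.sqrt(math.pi) / 6)


def gauss_hermite(log_f, center, scale):
    """Approximate the integral of exp(log_f(b)) over b, placing the nodes
    at center + sqrt(2) * scale * x."""
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        b = center + math.sqrt(2.0) * scale * x
        total += w * math.exp(x * x + log_f(b))
    return math.sqrt(2.0) * scale * total


def mode_and_scale(log_f, start=0.0, h=1e-4, max_iter=50):
    """Newton search (finite differences) for the mode of log_f, plus the
    curvature-based scale 1 / sqrt(-log_f''(mode))."""
    b = start
    for _ in range(max_iter):
        grad = (log_f(b + h) - log_f(b - h)) / (2 * h)
        curv = (log_f(b + h) - 2 * log_f(b) + log_f(b - h)) / (h * h)
        step = grad / curv
        b -= step
        if abs(step) < 1e-8:
            break
    curv = (log_f(b + h) - 2 * log_f(b) + log_f(b - h)) / (h * h)
    return b, 1.0 / math.sqrt(-curv)


def adaptive_gauss_hermite(log_f):
    """Adaptive rule: center and rescale the nodes around the integrand's
    own mode instead of around zero."""
    center, scale = mode_and_scale(log_f)
    return gauss_hermite(log_f, center, scale)


# Illustrative use: marginal likelihood of one subject's counts, integrating
# out a N(0, sigma^2) random intercept b (Poisson used as a simplification).
counts = [3, 5, 4]

def log_integrand(b, beta0=1.0, sigma=1.0):
    loglik = sum(y * (beta0 + b) - math.exp(beta0 + b) - math.lgamma(y + 1)
                 for y in counts)
    logprior = -0.5 * (b / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return loglik + logprior

marginal = adaptive_gauss_hermite(log_integrand)
```

The design point is the recentring: with few observations per subject, the integrand can be concentrated far from zero, so a rule whose nodes sit around zero may miss most of the mass, while the adaptive rule moves the same nodes to where the integrand actually lives.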


Subjects
Gene Expression Regulation , Genetic Models , Duchenne Muscular Dystrophy , RNA-Seq , Animals , Animal Disease Models , Humans , Mice , Statistical Models , Duchenne Muscular Dystrophy/genetics , Duchenne Muscular Dystrophy/metabolism
20.
BMC Cancer ; 23(1): 293, 2023 Mar 31.
Article in English | MEDLINE | ID: mdl-37004010

ABSTRACT

BACKGROUND: This cross-sectional cohort study assessed the inequalities in oesophageal carcinoma risk by age, sex and nativity in Kuwait from 1980 to 2019. METHODS: Using oesophageal cancer incidence data from the Kuwait National Cancer Registry, relevant Kuwaiti population data and the World Standard Population as a reference, age-standardized incidence rates (ASIR) (per 100,000 person-years) were computed overall and by subcohort. The count of incident oesophageal cancer cases was overdispersed with excessive structural zeros; therefore, it was analyzed using a multivariable zero-inflated negative binomial (ZINB) model. RESULTS: The overall ASIR of oesophageal cancer was 10.51 (95% CI: 6.62-14.41). The multivariable ZINB model showed that, compared with the youngest age category (< 30 years), individuals in higher age groups showed a significantly (p < 0.001) increasing tendency to develop oesophageal cancer. Furthermore, compared with non-Kuwaiti residents, Kuwaiti nationals were significantly (p < 0.001) more likely to develop oesophageal cancer during the study period. Moreover, compared with the 1980-84 period, ASIRs declined steadily and significantly (p < 0.005) in subsequent periods up to 2015-19. CONCLUSIONS: A high incidence of oesophageal cancer was recorded in Kuwait, which consistently declined from 1980 to 2019. Older adults (aged ≥ 60 years) and Kuwaiti nationals were at high risk of oesophageal cancer. Focused educational intervention may minimize oesophageal cancer incidence in high-risk groups in this and other similar settings. Future studies may consider evaluating such an intervention.
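The age-standardized incidence rate reported in this abstract is typically obtained by direct standardization, which can be sketched as follows. This is a generic illustration, not the study's code: the function name is hypothetical, the confidence interval uses a simple normal approximation with Poisson variance (the abstract does not state which interval method was used), and the numbers in the usage lines are made up rather than taken from the registry.

```python
import math


def age_standardized_rate(cases, person_years, std_weights, per=100_000):
    """Directly standardized rate with an approximate 95% CI.

    cases[i] and person_years[i] describe age group i in the study
    population; std_weights[i] is that age group's size in the reference
    (standard) population.
    """
    total_weight = sum(std_weights)
    rate = 0.0
    variance = 0.0
    for d, n, w in zip(cases, person_years, std_weights):
        share = w / total_weight           # reference population share
        rate += share * d / n              # weighted age-specific rate
        variance += share ** 2 * d / n ** 2  # Poisson variance of that term
    se = math.sqrt(variance)
    return rate * per, (rate - 1.96 * se) * per, (rate + 1.96 * se) * per


# Made-up example with three age groups (not data from the study).
asir, ci_low, ci_high = age_standardized_rate(
    cases=[2, 15, 40],
    person_years=[900_000, 600_000, 200_000],
    std_weights=[50, 35, 15],
)
```

Weighting each age-specific rate by a fixed reference population is what makes rates comparable across periods or subcohorts whose age structures differ.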


Subjects
Carcinoma , Esophageal Neoplasms , Humans , Aged , Cross-Sectional Studies , Incidence , Kuwait/epidemiology , Esophageal Neoplasms/epidemiology