Results 1 - 20 of 425
1.
Stat Biosci ; 16(2): 321-346, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39091460

ABSTRACT

Estimating sample size and statistical power is an essential part of a good epidemiological study design. Closed-form formulas exist for simple hypothesis tests but not for advanced statistical methods designed for exposure mixture studies. Estimating power with Monte Carlo simulations is flexible and applicable to these methods. However, coding a simulation is not straightforward for inexperienced programmers, and it is often hard for a researcher to manually specify multivariate associations among exposure mixtures to set up a simulation. To simplify this process, we present the R package mpower for power analysis of observational studies of environmental exposure mixtures involving recently developed mixtures analysis methods. The components within mpower are also versatile enough to accommodate any mixtures methods that will be developed in the future. The package allows users to simulate realistic exposure data and mixed-type covariates based on public data sets such as the National Health and Nutrition Examination Survey or other existing data sets from prior studies. Users can generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This paper presents tutorials and examples of power analysis using mpower.
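
The Monte Carlo idea described in this abstract can be illustrated with a minimal sketch. It is not the mpower API (mpower is an R package) and it does not model exposure mixtures; it assumes a plain two-group comparison with normally distributed outcomes and a z-type test, and the names `simulate_power` and `power_curve` are hypothetical. Power is estimated as the fraction of simulated studies in which the null hypothesis is rejected.

```python
import random
import statistics
from math import sqrt


def simulate_power(n, effect, sd=1.0, n_sims=2000, z_crit=1.96, seed=0):
    """Estimate power by simulation for a two-group mean comparison.

    Assumption for illustration only: outcomes are normal with a common
    standard deviation, and the test rejects when |z| exceeds z_crit.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Simulate one study with n participants per group.
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        exposed = [rng.gauss(effect, sd) for _ in range(n)]
        diff = statistics.fmean(exposed) - statistics.fmean(control)
        se = sqrt(statistics.variance(control) / n
                  + statistics.variance(exposed) / n)
        if abs(diff / se) > z_crit:
            rejections += 1
    # Power is the proportion of simulated studies that reject the null.
    return rejections / n_sims


# A small power curve: estimated power at several per-group sample sizes.
power_curve = {n: simulate_power(n, effect=0.5) for n in (10, 20, 40, 80)}
for n, power in power_curve.items():
    print(f"n per group = {n:3d}  estimated power = {power:.2f}")
```

Repeating the estimate over a grid of sample sizes and effect sizes gives the kind of power curve the abstract mentions; a realistic mixtures analysis would replace both the data generator and the test.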

2.
Methods Mol Biol ; 2818: 229-238, 2024.
Article in English | MEDLINE | ID: mdl-39126478

ABSTRACT

Immunofluorescent staining is commonly used to generate images to characterize cytological phenotypes. The manual quantification of DNA double-strand breaks and their repair intermediates during meiosis using image data requires a series of subjective steps, from image selection to the counting of particular events per nucleus. Here we describe "synapsis," a Bioconductor package, which includes a set of functions to automate the process of identifying meiotic nuclei and quantifying key double-strand break formation and repair events in a rapid, scalable, and reproducible workflow, and compare it to manual user quantification. The software can be extended for other applications in meiosis research, such as incorporating machine learning approaches to categorize meiotic substages.


Subject(s)
Chromosome Pairing , DNA Breaks, Double-Stranded , DNA Repair , Meiosis , Software , Crossing Over, Genetic , Humans , Image Processing, Computer-Assisted/methods
3.
BMC Med Res Methodol ; 24(1): 169, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39103781

ABSTRACT

BACKGROUND: Although aggregate data (AD) from randomised clinical trials (RCTs) are used in the majority of network meta-analyses (NMAs), other study designs (e.g., cohort studies and other non-randomised studies, NRS) can be informative about relative treatment effects. The individual participant data (IPD) of the study, when available, are preferred to AD for adjusting for important participant characteristics and to better handle heterogeneity and inconsistency in the network. RESULTS: We developed the R package crossnma to perform cross-format (IPD and AD) and cross-design (RCT and NRS) NMA and network meta-regression (NMR). The models are implemented as Bayesian three-level hierarchical models using Just Another Gibbs Sampler (JAGS) software within the R environment. The R package crossnma includes functions to automatically create the JAGS model, reformat the data (based on user input), assess convergence and summarize the results. We demonstrate the workflow within crossnma by using a network of six trials comparing four treatments. CONCLUSIONS: The R package crossnma enables the user to perform NMA and NMR with different data types in a Bayesian framework and facilitates the inclusion of all types of evidence recognising differences in risk of bias.


Subject(s)
Bayes Theorem , Network Meta-Analysis , Software , Humans , Randomized Controlled Trials as Topic/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Research Design , Algorithms , Meta-Analysis as Topic
4.
Comput Struct Biotechnol J ; 23: 2798-2810, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39055398

ABSTRACT

The widespread use of high-throughput sequencing technologies has revolutionized the understanding of biology and cancer heterogeneity. Recently, several machine-learning models based on transcriptional data have been developed to accurately predict patients' outcome and clinical response. However, an open-source R package covering state-of-the-art machine-learning algorithms for user-friendly access has yet to be developed. Thus, we proposed a flexible computational framework to construct a machine learning-based integration model with elegant performance (Mime). Mime streamlines the process of developing predictive models with high accuracy, leveraging complex datasets to identify critical genes associated with prognosis. An in silico combined model based on de novo PIEZO1-associated signatures constructed by Mime demonstrated high accuracy in predicting the outcomes of patients compared with other published models. Furthermore, the PIEZO1-associated signatures could also precisely infer immunotherapy response by applying different algorithms in Mime. Finally, SDC1 selected from the PIEZO1-associated signatures demonstrated high potential as a glioma target. Taken together, our package provides a user-friendly solution for constructing machine learning-based integration models and will be greatly expanded to provide valuable insights into current fields. The Mime package is available on GitHub (https://github.com/l-magnificence/Mime).

5.
Front Med (Lausanne) ; 11: 1356323, 2024.
Article in English | MEDLINE | ID: mdl-39055695

ABSTRACT

Continuous medical and safety monitoring of subject data during a clinical trial is a critical part of evaluating the safety of trial participants and as such is governed by protocol procedures and regulatory guidelines to meet the trial's intended objectives. We present an open-source validated graphical tool (clinDataReview R package) which provides access to the trial data with drill-down to individual patient profiles. The tool incorporates functionalities that facilitate detection of error and data inconsistencies requiring follow-up. It supports regular medical monitoring and oversight as well as safety monitoring committees with interactive tables and listings alongside graphical visualizations of the primary safety data in reports. An implementation example is given where the tool is used to deliver validated outputs following FDA/EMA guidelines. As such, this tool enables a more efficient, interactive, and reproducible review of safety data collected during an ongoing clinical trial.

6.
AoB Plants ; 16(4): plae035, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39040093

ABSTRACT

The analysis of photosynthetic traits has become an integral part of plant (eco-)physiology. Many of these characteristics are not directly measured, but calculated from combinations of several, more direct, measurements. The calculations of such derived variables are based on underlying physical models and may use additional constants or assumed values. Commercially available gas-exchange instruments typically report such derived variables, but the available implementations use different definitions and assumptions. Moreover, no software is currently available to allow a fully scripted and reproducible workflow that includes importing data, pre-processing and recalculating derived quantities. The R package gasanalyzer aims to address these issues by providing methods to import data from different instruments, by translating photosynthetic variables to a standardized nomenclature, and by optionally recalculating derived quantities using standardized equations. In addition, the package facilitates performing sensitivity analyses on variables or assumptions used in the calculations to allow researchers to better assess the robustness of the results. The use of the package and how to perform sensitivity analyses are demonstrated using three different examples.

7.
BMC Ecol Evol ; 24(1): 99, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39026190

ABSTRACT

BACKGROUND: Inbreeding and relationship coefficients are essential for conservation and breeding programs. Whether dealing with a small conserved population or a large commercial population, monitoring the inbreeding rate and designing mating plans that minimize the inbreeding rate and maximize the effective population size is important. Free, open-source, and efficient software may greatly contribute to conservation and breeding programs and help students and researchers. Efficient methods exist for calculating inbreeding coefficients. Therefore, an efficient way of calculating the numerator relationship coefficients is via the inbreeding coefficients, i.e., the relationship coefficient between parents is twice the inbreeding coefficient of their progeny. A dummy progeny is introduced where no progeny exists for a pair of individuals. Calculating inbreeding coefficients is very fast, whereas finding whether a pair of individuals has a progeny and picking one from multiple progenies is computationally more demanding. Therefore, the R package introduces a dummy progeny for any pair of individuals whose relationship coefficient is of interest, whether they have a progeny or not. RESULTS: Runtime and peak memory usage were benchmarked for calculating relationship coefficients between two sets of 250 and 800 animals (200,000 dummy progenies) from a pedigree of 2,721,252 animals. The program performed efficiently (200,000 relationship coefficients, which involved calculating 2,721,252 + 200,000 inbreeding coefficients) within 3:45 (mm:ss). Providing the inbreeding coefficients (for real animals), the runtime was reduced to 1:08. Furthermore, providing the diagonal elements of D in A = TDT' (d), the runtime was reduced to 54 s. All the analyses were performed on a machine with a total memory size of 1 GB. CONCLUSIONS: The R package FnR is free and open-source software with implications in conservation and breeding programs.
It proved to be time and memory efficient for large populations and many dummy progenies. Calculation of inbreeding coefficients can be resumed for new animals in the pedigree. Thus, saving the latest inbreeding coefficient estimates is recommended. Calculation of d coefficients (from scratch) was very fast, and there was limited value in storing those for future use.
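
The dummy-progeny identity stated in this abstract (the relationship coefficient between two individuals is twice the inbreeding coefficient of their progeny) can be checked on a toy pedigree. The sketch below is not the FnR implementation: it swaps in the simple tabular method for building the numerator relationship matrix, which is only practical for small pedigrees, and the function names and the four-animal pedigree are hypothetical.

```python
def relationship_matrix(pedigree):
    """Numerator relationship matrix via the tabular method.

    pedigree: list of (sire, dam) index pairs, parents listed before
    their offspring, with None for an unknown parent.
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for j, (sire, dam) in enumerate(pedigree):
        for i in range(j):
            # Relationship with an earlier animal: average over j's parents.
            value = 0.0
            if sire is not None:
                value += 0.5 * a[i][sire]
            if dam is not None:
                value += 0.5 * a[i][dam]
            a[i][j] = a[j][i] = value
        # Diagonal: 1 + F_j, where F_j is half the parents' relationship.
        both_known = sire is not None and dam is not None
        a[j][j] = 1.0 + (0.5 * a[sire][dam] if both_known else 0.0)
    return a


def inbreeding(pedigree, k):
    """Inbreeding coefficient of animal k (diagonal element minus 1)."""
    return relationship_matrix(pedigree)[k][k] - 1.0


def relationship_via_dummy(pedigree, x, y):
    """Relationship between x and y as 2 * F of a dummy progeny of x and y."""
    extended = pedigree + [(x, y)]
    return 2.0 * inbreeding(extended, len(extended) - 1)


# Toy pedigree: animals 0 and 1 are unrelated founders; 2 and 3 are full sibs.
pedigree = [(None, None), (None, None), (0, 1), (0, 1)]
full_sib_relationship = relationship_via_dummy(pedigree, 2, 3)
print(full_sib_relationship)
```

For the full sibs in this toy pedigree, the dummy-progeny route and the off-diagonal element of the matrix agree, which is the identity the package relies on.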


Subject(s)
Inbreeding , Software , Inbreeding/methods , Animals , Pedigree , Male , Female
8.
BMC Med Res Methodol ; 24(1): 147, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003440

ABSTRACT

BACKGROUND: Decision analytic models and meta-analyses often rely on survival probabilities that are digitized from published Kaplan-Meier (KM) curves. However, manually extracting these probabilities from KM curves is time-consuming, expensive, and error-prone. We developed an efficient and accurate algorithm that automates extraction of survival probabilities from KM curves. METHODS: The automated digitization algorithm processes images in JPG or PNG format, converts them to the hue, saturation, and lightness scale, and uses optical character recognition to detect axis location and labels. It also uses a k-medoids clustering algorithm to separate multiple overlapping curves on the same figure. To validate performance, we generated survival plots from random time-to-event data with sample sizes of 25, 50, 150, 250, and 1000 individuals split into 1, 2, or 3 treatment arms. We assumed an exponential distribution and applied random censoring. We compared automated digitization and manual digitization performed by well-trained researchers. We calculated the root mean squared error (RMSE) at 100 time points for both methods. The algorithm's performance was also evaluated by Bland-Altman analysis for the agreement between automated and manual digitization on a real-world set of published KM curves. RESULTS: The automated digitizer accurately identified survival probabilities over time in the simulated KM curves. The average RMSE for automated digitization was 0.012, while manual digitization had an average RMSE of 0.014. Its performance was negatively correlated with the number of curves in a figure and the presence of censoring markers. In real-world scenarios, automated digitization and manual digitization showed very close agreement. CONCLUSIONS: The algorithm streamlines the digitization process and requires minimal user input.
It effectively digitized KM curves in simulated and real-world scenarios, demonstrating accuracy comparable to conventional manual digitization. The algorithm has been developed as an open-source R package and as a Shiny application and is available on GitHub: https://github.com/Pechli-Lab/SurvdigitizeR and https://pechlilab.shinyapps.io/SurvdigitizeR/ .
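
The RMSE comparison at 100 time points can be illustrated with a minimal sketch. It is not the SurvdigitizeR code: it assumes the reference is a known exponential survival function S(t) = exp(-rate * t), and it fakes a "digitized" curve by adding a constant 0.01 read-off error; the rate, the time grid, and all names are hypothetical.

```python
from math import exp, sqrt


def exponential_survival(t, rate):
    """Survival probability at time t under an exponential distribution."""
    return exp(-rate * t)


def rmse(estimated, reference):
    """Root mean squared error between two equally long sequences."""
    if len(estimated) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return sqrt(sum(squared) / len(squared))


# 100 evaluation time points on a hypothetical follow-up of 10 time units.
times = [i * 0.1 for i in range(1, 101)]
truth = [exponential_survival(t, rate=0.3) for t in times]

# Stand-in for a digitized curve: the truth plus a constant read-off error.
digitized = [min(1.0, s + 0.01) for s in truth]

error = rmse(digitized, truth)
print(f"RMSE over {len(times)} time points: {error:.3f}")
```

With a constant offset the RMSE equals the offset, which makes the reported averages of 0.012 and 0.014 easy to read as typical deviations in survival probability.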


Subject(s)
Algorithms , Humans , Kaplan-Meier Estimate , Survival Analysis , Probability
9.
Article in English | MEDLINE | ID: mdl-39043402

ABSTRACT

OBJECTIVES: Despite easy-to-use tools like the Cohort Builder, using All of Us Research Program data for complex research questions requires a relatively high level of technical expertise. We aimed to increase research and training capacity and reduce barriers to entry for the All of Us community through an R package, allofus. In this article, we describe functions that address common challenges we encountered while working with All of Us Research Program data, and we demonstrate this functionality with an example of creating a cohort of All of Us participants by synthesizing electronic health record and survey data with time dependencies. TARGET AUDIENCE: All of Us Research Program data are widely available to health researchers. The allofus R package is aimed at a wide range of researchers who wish to conduct complex analyses using best practices for reproducibility and transparency, and who have a range of experience using R. Because the All of Us data are transformed into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), researchers familiar with existing OMOP CDM tools or who wish to conduct network studies in conjunction with other OMOP CDM data will also find value in the package. SCOPE: We developed an initial set of functions that solve problems we experienced across survey and electronic health record data in our own research and in mentoring student projects. The package will continue to grow and develop with the All of Us Research Program. The allofus R package can help build community research capacity by increasing access to the All of Us Research Program data, the efficiency of its use, and the rigor and reproducibility of the resulting research.

10.
Methods Mol Biol ; 2811: 123-135, 2024.
Article in English | MEDLINE | ID: mdl-39037654

ABSTRACT

High-throughput transcriptome RNA sequencing is a powerful tool for understanding dynamic biological processes. Here, we present a computational framework, implemented in an R package QDSWorkflow, to characterize heterogeneous cellular dormancy depth using RNA-sequencing data from bulk samples and single cells.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA , Software , Sequence Analysis, RNA/methods , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Transcriptome , Gene Expression Profiling/methods , Humans , Single-Cell Analysis/methods
11.
Front Neurol ; 15: 1393022, 2024.
Article in English | MEDLINE | ID: mdl-38846044

ABSTRACT

Purpose: The prevalence of comorbid pain and bipolar disorder in clinical practice continues to be high, with an increasing number of related publications. However, no study has used bibliometric methods to analyze the research progress and knowledge structure in this field. Our research is dedicated to systematically exploring the global trends and focal points in scientific research on pain comorbidity with bipolar disorder from 2003 to 2023, with the goal of contributing to the field. Methods: Relevant publications in this field were retrieved from the Web of Science Core Collection database (WoSCC). We used VOSviewer, CiteSpace, and the R package "Bibliometrix" for bibliometric analysis. Results: A total of 485 publications (including 360 articles and 125 reviews) from 66 countries and 1,019 institutions were included in this study. Univ Toronto and Kings Coll London are the leading research institutions in this field. J Affect Disorders contributed the largest number of articles and is the most co-cited journal. Of the 2,537 scholars who participated in the study, Stubbs B, Vancampfort D, and Abdin E had the largest number of articles. Stubbs B is the most co-cited author. "Chronic pain," "neuropathic pain," and "psychological pain" are the keywords in the research. Conclusion: This is the first bibliometric analysis of pain-related bipolar disorder. There is growing interest in the area of pain and comorbid bipolar disorder. Focusing on different types of pain in bipolar disorder and emphasizing pain management in bipolar disorder are research hotspots and future trends. The study of pain-related bipolar disorder still has significant potential for development, and we look forward to more high-quality research in the future.

12.
Front Med (Lausanne) ; 11: 1409534, 2024.
Article in English | MEDLINE | ID: mdl-38841589

ABSTRACT

Purpose: Osteoporosis represents a profound challenge to public health, underscoring the critical need to dissect its complex etiology and identify viable targets for intervention. Within this context, the gut microbiota has emerged as a focal point of research due to its profound influence on bone metabolism. Despite this growing interest, the literature has yet to see a bibliometric study addressing the gut microbiota's contribution to both the development and management of osteoporosis. This study aims to fill this gap through an exhaustive bibliometric analysis. Our objective is to uncover current research hotspots, delineate key themes, and identify future research trends. In doing so, we hope to provide direction for future studies and the development of innovative treatment methods. Methods: Relevant publications in this field were retrieved from the Web of Science Core Collection database. We used VOSviewer, CiteSpace, an online analysis platform, and the R package "Bibliometrix" for bibliometric analysis. Results: A total of 529 publications (including 351 articles and 178 reviews) from 61 countries and 881 institutions were included in this study. China leads in publication volume and boasts the highest cumulative citation count. Shanghai Jiao Tong University and Southern Medical University are the leading research institutions in this field. Nutrients contributed the largest number of articles, and J Bone Miner Res is the most co-cited journal. Of the 3,166 scholars who participated in the study, Ohlsson C had the largest number of articles. Li YJ is the most co-cited author. "Probiotics" and "inflammation" are the keywords in the research. Conclusion: This is the first bibliometric analysis of gut microbiota in osteoporosis. We explored the current research status in recent years and identified frontiers and hot spots in this research field.
We investigate the impact of gut microbiome dysregulation and its associated inflammation on OP progression, a topic that has garnered international research interest in recent years. Additionally, our study delves into the potential of fecal microbiota transplantation or specific dietary interventions as promising avenues for future research, which can provide a reference for researchers who focus on this research field.

13.
Genome Biol ; 25(1): 162, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902825

ABSTRACT

BACKGROUND: The functional coupling between alternative pre-mRNA splicing (AS) and the mRNA quality control mechanism called nonsense-mediated decay (NMD) can modulate transcript abundance. Previous studies have identified several examples of such a regulation in developing neurons. However, the systems-level effects of AS-NMD in this context are poorly understood. RESULTS: We developed an R package, factR2, which offers a comprehensive suite of AS-NMD analysis functions. Using this tool, we conducted a longitudinal analysis of gene expression in pluripotent stem cells undergoing induced neuronal differentiation. Our analysis uncovers hundreds of AS-NMD events with significant potential to regulate gene expression. Notably, this regulation is significantly overrepresented in specific functional groups of developmentally downregulated genes. Particularly strong association with gene downregulation is detected for alternative cassette exons stimulating NMD upon their inclusion into mature mRNA. By combining bioinformatic analyses with CRISPR/Cas9 genome editing and other experimental approaches we show that NMD-stimulating cassette exons regulated by the RNA-binding protein PTBP1 dampen the expression of their genes in developing neurons. We also provided evidence that the inclusion of NMD-stimulating cassette exons into mature mRNAs is temporally coordinated with NMD-independent gene repression mechanisms. CONCLUSIONS: Our study provides an accessible workflow for the discovery and prioritization of AS-NMD targets. It further argues that the AS-NMD pathway plays a widespread role in developing neurons by facilitating the downregulation of functionally related non-neuronal genes.


Subject(s)
Alternative Splicing , Down-Regulation , Neurons , Nonsense Mediated mRNA Decay , Polypyrimidine Tract-Binding Protein , Animals , Mice , Neurons/metabolism , Polypyrimidine Tract-Binding Protein/metabolism , Polypyrimidine Tract-Binding Protein/genetics , Exons , Heterogeneous-Nuclear Ribonucleoproteins/metabolism , Heterogeneous-Nuclear Ribonucleoproteins/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Gene Expression Regulation, Developmental , Cell Differentiation/genetics , Neurogenesis/genetics
14.
Int J Mol Sci ; 25(12), 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38928396

ABSTRACT

Proteomics offers a robust method for quantifying proteins and elucidating their roles in cellular functions, surpassing the insights provided by transcriptomics. The Clinical Proteomic Tumor Analysis Consortium database, enriched with comprehensive cancer proteomics data including phosphorylation and ubiquitination profiles, alongside transcriptomics data from the Genomic Data Commons, allows for integrative molecular studies of cancer. The ProteoCancer Analysis Suite (PCAS), our newly developed R package and Shiny app, leverages these resources to facilitate in-depth analyses of proteomics, phosphoproteomics, and transcriptomics, enhancing our understanding of the tumor microenvironment through features like immune infiltration and drug sensitivity analysis. This tool aids in identifying critical signaling pathways and therapeutic targets, particularly through its detailed phosphoproteomic analysis. To demonstrate the functionality of the PCAS, we conducted an analysis of GAPDH across multiple cancer types, revealing a significant upregulation of protein levels, which is consistent with its important biological and clinical significance in tumors, as indicated in our prior research. Further experiments were performed to validate the findings obtained using the tool. In conclusion, the PCAS is a powerful and valuable tool for conducting comprehensive proteomic analyses, significantly enhancing our ability to uncover oncogenic mechanisms and identify potential therapeutic targets in cancer research.


Subject(s)
Neoplasms , Proteomics , Humans , Proteomics/methods , Neoplasms/metabolism , Neoplasms/genetics , Tumor Microenvironment/genetics , Software , Computational Biology/methods , Proteome/metabolism
15.
Ecol Evol ; 14(5): e11292, 2024 May.
Article in English | MEDLINE | ID: mdl-38725827

ABSTRACT

Plant trait data are used to quantify how plants respond to environmental factors and can act as indicators of ecosystem function. Measured trait values are influenced by genetics, trade-offs, competition, environmental conditions, and phenology. These interacting effects on traits are poorly characterized across taxa, and for many traits, measurement protocols are not standardized. As a result, ancillary information about growth and measurement conditions can be highly variable, requiring a flexible data structure. In 2007, the TRY initiative was founded as an integrated database of plant trait data, including ancillary attributes relevant to understanding and interpreting the trait values. The TRY database now integrates around 700 original and collective datasets and has become a central resource of plant trait data. These data are provided in a generic long-table format, where a unique identifier links different trait records and ancillary data measured on the same entity. Due to the high number of trait records, plant taxa, and types of traits and ancillary data released from the TRY database, data preprocessing is necessary but not straightforward. Here, we present the 'rtry' R package, specifically designed to support plant trait data exploration and filtering. By integrating a subset of existing R functions essential for preprocessing, 'rtry' avoids the need for users to navigate the extensive R ecosystem and provides the functions under a consistent syntax. 'rtry' is therefore easy to use even for beginners in R. Notably, 'rtry' does not support data retrieval or analysis; rather, it focuses on the preprocessing tasks to optimize data quality. While 'rtry' primarily targets TRY data, its utility extends to data from other sources, such as the National Ecological Observatory Network (NEON). 
The 'rtry' package is available on the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/package=rtry) and the GitHub Wiki (https://github.com/MPI-BGC-Functional-Biogeography/rtry/wiki) along with comprehensive documentation and vignettes describing detailed data preprocessing workflows.

16.
Comput Methods Programs Biomed ; 251: 108212, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38754327

ABSTRACT

BACKGROUND AND OBJECTIVE: There is a rising interest in exploiting aggregate information from external medical studies to enhance the statistical analysis of a modestly sized internal dataset. Currently available software packages for analyzing survival data with a cure fraction ignore the potentially available auxiliary information. This paper aims to fill this gap by developing a new R package, CureAuxSP, that can incorporate externally extracted subgroup survival probabilities into the analysis of an internal survival dataset of interest. METHODS: The newly developed R package CureAuxSP provides an efficient approach for information synthesis under mixture cure models, including the Cox proportional hazards mixture cure model and the accelerated failure time mixture cure model as special cases. It focuses on synthesizing subgroup survival probabilities at multiple time points, and the underlying method development is based on the control variate technique. Evaluation of the homogeneity assumption based on a test statistic can be carried out automatically by our package, and if heterogeneity does exist, the original outputs can be further refined adaptively. RESULTS: The R package CureAuxSP provides a main function SMC.AuxSP() that helps us adaptively incorporate external subgroup survival probabilities into the analysis of an internal survival dataset. We also provide another function Print.SMC.AuxSP() for printing the results with a better presentation. Detailed usages are described, and implementations are illustrated with numerical examples, including a simulated dataset with a well-designed data generating process and a real breast cancer dataset. Substantial efficiency gain can be observed in our results. CONCLUSIONS: Our R package CureAuxSP can make wide application of auxiliary information possible. It is anticipated that the performance of mixture cure models can be improved for survival data with a cure fraction, especially for those with small sample sizes.


Subject(s)
Probability , Proportional Hazards Models , Software , Humans , Survival Analysis , Models, Statistical , Computer Simulation , Algorithms , Breast Neoplasms/mortality , Breast Neoplasms/therapy
17.
Veg Hist Archaeobot ; 33(4): 475-487, 2024.
Article in English | MEDLINE | ID: mdl-38803354

ABSTRACT

The functional ecology of arable weeds provides a way of comparing present-day and past farming regimes. This paper presents the R package WeedEco, an open-source resource which allows users to compare their archaeobotanical dataset against three previously published arable weed models to understand fertility, disturbance or a combination of both. The package provides functions for data organisation, classification and visualisation, allowing users to enter raw archaeobotanical data, obtain trait values from the functional trait dataset, conduct discriminant analysis and plot the results against the relevant present-day model. Using data from the early medieval site of Stafford in the UK, the paper provides a detailed example of the use of the package, demonstrating its different functions, as well as how the results can be interpreted. Supplementary Information: The online version contains supplementary material available at 10.1007/s00334-023-00964-8.

18.
BMC Bioinformatics ; 25(1): 151, 2024 Apr 16.
Artículo en Inglés | MEDLINE | ID: mdl-38627634

RESUMEN

BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). These models identify the number of comparably homogeneous regions within autocorrelated observed sequences. The regions are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, and adjacent positions may only carry states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred. RESULTS: We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of Poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. With regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based on isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of variation of chromatin accessibility (ATAC-seq) and epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of Poisson-gamma distributions) along human chromosome 1 and their correlations. CONCLUSIONS: Our algorithms provide a biologically assumption-free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Despite this, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.
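
The ordering constraint described in this abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the oHMMed implementation or its API: the function names, the self-transition probability `stay`, the emission means, and the standard deviation are all hypothetical values chosen for illustration. It builds a transition matrix in which each hidden state can only move to its neighbours in the ordering, then draws normally distributed observations with state-specific means.

```python
import random


def tridiagonal_transitions(k, stay=0.9):
    """Row-stochastic k x k matrix that allows moves only to adjacent states.

    `stay` is a hypothetical self-transition probability, not a value
    taken from the abstract.
    """
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < k]
        if not neighbours:  # single-state edge case
            matrix[i][i] = 1.0
            continue
        matrix[i][i] = stay
        for j in neighbours:
            matrix[i][j] = (1.0 - stay) / len(neighbours)
    return matrix


def simulate(matrix, means, sd, n, seed=0):
    """Draw a hidden state path and normal emissions with ordered means."""
    rng = random.Random(seed)
    k = len(means)
    state = rng.randrange(k)
    states, observations = [], []
    for _ in range(n):
        states.append(state)
        observations.append(rng.gauss(means[state], sd))
        # Zero-weight entries are never chosen, so jumps stay within +/- 1.
        state = rng.choices(range(k), weights=matrix[state])[0]
    return states, observations


# Illustrative GC-proportion-like means, sorted so that the state index
# reflects the ordering of the emission distributions.
means = [0.35, 0.42, 0.50]
T = tridiagonal_transitions(len(means))
states, obs = simulate(T, means, sd=0.02, n=500)
```

Because all entries off the three central diagonals are zero, a simulated path can never jump from the lowest-mean state directly to the highest-mean state; it has to pass through the intermediate one.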


Subject(s)
Genome; Genomics; Animals; Humans; Mice; Markov Chains; Base Composition; Probability; Algorithms
19.
Cell Rep Methods ; 4(5): 100763, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38670101

ABSTRACT

Cellular barcoding is a lineage-tracing methodology that couples heritable synthetic barcodes to high-throughput sequencing, enabling the accurate tracing of cell lineages across a range of biological contexts. Recent studies have extended these methods by incorporating lineage information into single-cell or spatial transcriptomics readouts. Leveraging the rich biological information within these datasets requires dedicated computational tools for dataset pre-processing and analysis. Here, we present BARtab, a portable and scalable Nextflow pipeline, and bartools, an open-source R package, designed to provide an integrated end-to-end cellular barcoding analysis toolkit. BARtab and bartools contain methods to simplify the extraction, quality control, analysis, and visualization of lineage barcodes from population-level, single-cell, and spatial transcriptomics experiments. We showcase the utility of our integrated BARtab and bartools workflow via the analysis of exemplar bulk, single-cell, and spatial transcriptomics experiments containing cellular barcoding information.


Subject(s)
High-Throughput Nucleotide Sequencing; Single-Cell Analysis; Transcriptome; Single-Cell Analysis/methods; Humans; Software; DNA Barcoding, Taxonomic/methods; Genome/genetics; Cell Lineage/genetics; Gene Expression Profiling/methods; Computational Biology/methods; Animals
20.
medRxiv ; 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38496500

ABSTRACT

IMPORTANCE: On December 10, 2021, the FDA published a Determination Letter, along with a Statistical Review and Evaluation Report, and concluded that, under the non-informative prior, the local Bayesian optimal interval (BOIN) design, in its revised form, can be designated fit-for-purpose for identifying the maximum tolerated dose (MTD) of a new drug, assuming that the dose-toxicity relationship is monotonically increasing. Although setting the BOIN design parameter p.tox = 1.4 * target.DLT.rate is recommended in almost all BOIN methodology articles and is the default value in the R package BOIN, it is unclear whether the choice of p.tox should depend only on the target DLT rate and whether a certain range of p.tox values could produce the same BOIN boundary table. DESIGN: In this simulation study, the following parameters were varied one at a time, using the R package BOIN, to explore each parameter's effect on the equivalence intervals of p.saf and p.tox: 1) target DLT rate, 2) n.earlystop, 3) cutoff.eli, 4) cohortsize, and 5) ncohort. A simple 3+3 design was also used as an example to explore equivalent sets of BOIN design parameters that can generate the same boundary table. RESULTS: When the early stopping parameter n.earlystop is relatively small or the cohortsize value is not optimized via simulation, it might be better to use p.tox < 1.4 * target.DLT.rate, or try out different cohort sizes, or increase n.earlystop, whichever is both feasible and provides better operating characteristics. This is because, if the cohortsize was not optimized via simulation, even when n.earlystop = 12 and cohortsize > 3, the BOIN escalation/de-escalation rules generated using p.tox = 1.4 * target.DLT.rate could be exactly the same as those calculated using p.tox > 3 * target.DLT.rate, which might not be acceptable for some pediatric trials targeting a 10% DLT rate. The traditional 3+3 design stops the dose-finding process when 3 patients have been treated at the current dose level, 0 DLTs have been observed, and the next higher dose has already been eliminated. If 3 additional patients were required to be treated at the current dose in the situation described above, the decision rules of this commonly used 3+3 design could be generated using the BOIN design with target DLT rates ranging from 18% to 29%, p.saf ranging from 8% to 26%, and different p.tox values ranging from 39% to 99%. To generate this commonly used 3+3 design table, BOIN parameters also need to satisfy a set of conditions.
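
The observation that different p.tox values can yield the same boundary table can be illustrated with a short calculation. The sketch below does not call the R package BOIN and is not taken from the abstract. It assumes the standard escalation and de-escalation cutoff formulas from the published BOIN methodology, assumes p.saf = 0.6 * target as the default, and ignores the elimination rule (cutoff.eli) and early stopping. The function names and the alternative value p.tox = 0.45 are hypothetical choices for illustration.

```python
import math


def boin_cutoffs(target, p_saf, p_tox):
    """Escalation and de-escalation cutoffs on the observed DLT rate.

    Assumed formulas (standard BOIN methodology, not stated in the abstract):
      lambda_e = log((1-p_saf)/(1-target)) / log(target(1-p_saf) / (p_saf(1-target)))
      lambda_d = log((1-target)/(1-p_tox)) / log(p_tox(1-target) / (target(1-p_tox)))
    """
    lambda_e = math.log((1 - p_saf) / (1 - target)) / math.log(
        target * (1 - p_saf) / (p_saf * (1 - target))
    )
    lambda_d = math.log((1 - target) / (1 - p_tox)) / math.log(
        p_tox * (1 - target) / (target * (1 - p_tox))
    )
    return lambda_e, lambda_d


def boundary_table(target, p_saf, p_tox, sizes):
    """Map patients treated -> (escalate if DLTs <= a, de-escalate if DLTs >= b)."""
    lambda_e, lambda_d = boin_cutoffs(target, p_saf, p_tox)
    return {
        n: (math.floor(lambda_e * n), math.ceil(lambda_d * n)) for n in sizes
    }


target = 0.30
sizes = (3, 6, 9, 12)  # cumulative patients at a dose, cohorts of 3
default_table = boundary_table(target, 0.6 * target, 1.4 * target, sizes)
alt_table = boundary_table(target, 0.6 * target, 0.45, sizes)

print(default_table)
print("same table:", default_table == alt_table)
```

Because the decision boundaries are integer DLT counts, the continuous cutoff for de-escalation can move within a range without changing any entry of the table. This is why a whole interval of p.tox values may be equivalent for a given set of cohort sizes, and why the width of that interval depends on the number of patients per dose.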
