Results 1 - 20 of 51
1.
BMC Public Health ; 23(1): 184, 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36707789

ABSTRACT

BACKGROUND: Local governments and other public health entities often need population health measures at the county or subcounty level for activities such as resource allocation and targeting public health interventions, among others. Information collected via national surveys alone cannot fill these needs. We propose a novel, two-step method for rescaling health survey data and creating small area estimates (SAEs) of smoking rates using a Behavioral Risk Factor Surveillance System survey administered in 2015 to participants living in Allegheny County, Pennsylvania, USA. METHODS: The first step consisted of a spatial microsimulation to rescale location of survey respondents from zip codes to tracts based on census population distributions by age, sex, race, and education. The rescaling allowed us, in the second step, to utilize available census tract-specific ancillary data on social vulnerability for small area estimation of local health risk using an area-level version of a logistic linear mixed model. To demonstrate this new two-step algorithm, we estimated the ever-smoking rate for the census tracts of Allegheny County. RESULTS: The ever-smoking rate was above 70% for two census tracts to the southeast of the city of Pittsburgh. Several tracts in the southern and eastern sections of Pittsburgh also had relatively high (> 65%) ever-smoking rates. CONCLUSIONS: These SAEs may be used in local public health efforts to target interventions and educational resources aimed at reducing cigarette smoking. Further, our new two-step methodology may be extended to small area estimation for other locations and health outcomes.


Subjects
Public Health , Social Vulnerability , Humans , Surveys and Questionnaires , Pennsylvania/epidemiology
2.
Entropy (Basel) ; 23(6), 2021 May 27.
Article in English | MEDLINE | ID: mdl-34072055

ABSTRACT

In disease modeling, a key statistical problem is the estimation of lower and upper tail probabilities of health events from given data sets of small size and limited range. Assuming such constraints, we describe a computational framework for the systematic fusion of observations from multiple sources to compute tail probabilities that could not be obtained otherwise due to a lack of lower or upper tail data. The estimation of multivariate lower and upper tail probabilities from a given small reference data set that lacks complete information about such tail data is addressed in terms of pertussis case count data. Fusion of data from multiple sources in conjunction with the density ratio model is used to give probability estimates that are non-obtainable from the empirical distribution. Based on a density ratio model with variable tilts, we first present a univariate fit and, subsequently, improve it with a multivariate extension. In the multivariate analysis, we selected the best model in terms of the Akaike Information Criterion (AIC). Regional prediction, in Washington state, of the number of pertussis cases is approached by providing joint probabilities using fused data from several relatively small samples following the selected density ratio model. The model is validated by a graphical goodness-of-fit plot comparing the estimated reference distribution obtained from the fused data with that of the empirical distribution obtained from the reference sample only.

3.
Am J Physiol Lung Cell Mol Physiol ; 312(1): L79-L88, 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-27836901

ABSTRACT

In many mammals, including humans, removal of one lung (pneumonectomy) results in the compensatory growth of the remaining lung. Compensatory growth involves not only an increase in lung size, but also an increase in the number of alveoli in the peripheral lung; however, the process of compensatory neoalveolarization remains poorly understood. Here, we show that the expression of α-smooth muscle actin (SMA), a cytoplasmic protein characteristic of myofibroblasts, is induced in the pleura following pneumonectomy. SMA induction appears to be dependent on pleural deformation (stretch) as induction is prevented by plombage or phrenic nerve transection (P < 0.001). Within 3 days of pneumonectomy, the frequency of SMA+ cells in subpleural alveolar ducts was significantly increased (P < 0.01). To determine the functional activity of these SMA+ cells, we isolated regenerating alveolar ducts by laser microdissection and analyzed individual cells using microfluidic single-cell quantitative PCR. Single cells expressing the SMA (Acta2) gene demonstrated significantly greater transcriptional activity than endothelial cells or other discrete cell populations in the alveolar duct (P < 0.05). The transcriptional activity of the Acta2+ cells, including expression of TGF signaling as well as repair-related genes, suggests that these myofibroblast-like cells contribute to compensatory lung growth.


Subjects
Lung/growth & development , Myofibroblasts/metabolism , Myofibroblasts/pathology , Stress, Mechanical , Actins/metabolism , Animals , Cell Separation , Gene Expression Regulation, Developmental , Image Cytometry , Lung/metabolism , Lung/surgery , Male , Mice, Inbred C57BL , Pneumonectomy , Polymerase Chain Reaction , Single-Cell Analysis , Transcription, Genetic
4.
Cytometry A ; 89(1): 30-43, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26492316

ABSTRACT

We present an algorithm for modeling flow cytometry data in the presence of large inter-sample variation. Large-scale cytometry datasets often exhibit some within-class variation due to technical effects such as instrumental differences and variations in data acquisition, as well as subtle biological heterogeneity within the class of samples. Failure to account for such variations in the model may lead to inaccurate matching of populations across a batch of samples and poor performance in classification of unlabeled samples. In this paper, we describe the Joint Clustering and Matching (JCM) procedure for simultaneous segmentation and alignment of cell populations across multiple samples. Under the JCM framework, a multivariate mixture distribution is used to model the distribution of the expressions of a fixed set of markers for each cell in a sample such that the components in the mixture model may correspond to the various populations of cells, which have similar expressions of markers (that is, clusters), in the composition of the sample. For each class of samples, an overall class template is formed by the adoption of random-effects terms to model the inter-sample variation within a class. The construction of a parametric template for each class allows for direct quantification of the differences between the template and each sample, and also between each pair of samples, both within and between classes. The classification of a new unclassified sample is then undertaken by assigning the unclassified sample to the class that minimizes the distance between its fitted mixture density and each class density as provided by the class templates. For illustration, we use a symmetric form of the Kullback-Leibler divergence as a distance measure between two densities, but other distance measures can also be applied. We demonstrate on four real datasets how the JCM procedure can be used to carry out the tasks of automated clustering and alignment of cell populations, and supervised classification of samples.
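
The classification rule described in this abstract (assign a sample to the class whose template density is closest under a symmetric Kullback-Leibler divergence) can be sketched in a few lines. This is only an illustration under simplifying assumptions: densities are represented as discrete probability vectors over the same bins rather than fitted multivariate mixtures, all probabilities are strictly positive, and the names `symmetric_kl`, `classify`, and the example templates are hypothetical, not taken from the JCM software.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same bins.

    Assumes q[i] > 0 wherever p[i] > 0 (illustrative simplification).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrised divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def classify(sample, templates):
    """Return the name of the class template closest to the sample."""
    return min(templates, key=lambda name: symmetric_kl(sample, templates[name]))


# Hypothetical class templates and an unlabeled sample (made-up numbers).
templates = {"A": [0.7, 0.2, 0.1], "B": [0.2, 0.3, 0.5]}
sample = [0.6, 0.3, 0.1]
label = classify(sample, templates)
```

With continuous mixture densities the divergence generally has no closed form, so in practice it would have to be approximated numerically; the discrete version above only shows the shape of the decision rule.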


Subjects
Biomarkers/blood , Computational Biology/methods , Electronic Data Processing/methods , Flow Cytometry/methods , Membrane Proteins/analysis , Pattern Recognition, Automated/methods , Algorithms , Cluster Analysis , Data Interpretation, Statistical , Humans , Leukemia, Myeloid, Acute/diagnosis , Lymphoma, Follicular/diagnosis , Models, Theoretical , West Nile Fever/diagnosis
5.
Comput Stat Data Anal ; 104: 79-90, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28496285

ABSTRACT

The statistical matching problem involves the integration of multiple datasets where some variables are not observed jointly. This missing data pattern leaves most statistical models unidentifiable. Statistical inference is still possible when operating under the framework of partially identified models, where the goal is to bound the parameters rather than to estimate them precisely. In many matching problems, developing feasible bounds on the parameters is equivalent to finding the set of positive-definite completions of a partially specified covariance matrix. Existing methods for characterising the set of possible completions do not extend to high-dimensional problems. A Gibbs sampler to draw from the set of possible completions is proposed. The variation in the observed samples gives an estimate of the feasible region of the parameters. The Gibbs sampler extends easily to high-dimensional statistical matching problems.
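
To make the idea of a feasible set of positive-definite completions concrete, the smallest case can be worked in closed form. The sketch below is not the Gibbs sampler proposed in the abstract; it assumes three standardized variables where X and Y are never observed together but each is observed with Z, so only the correlation between X and Y is unknown. The function names and the numeric values are hypothetical. Requiring the determinant of the 3x3 correlation matrix to be positive gives an interval for the missing correlation.

```python
import math


def completion_bounds(r_xz, r_yz):
    """Interval of r_xy values that keep the 3x3 correlation matrix positive definite.

    Derived from det > 0: 1 + 2*r_xy*r_xz*r_yz - r_xy^2 - r_xz^2 - r_yz^2 > 0,
    assuming |r_xz| < 1 and |r_yz| < 1.
    """
    centre = r_xz * r_yz
    half_width = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return centre - half_width, centre + half_width


def det3(r_xy, r_xz, r_yz):
    """Determinant of the correlation matrix with the given off-diagonal entries."""
    return 1 + 2 * r_xy * r_xz * r_yz - r_xy ** 2 - r_xz ** 2 - r_yz ** 2


# Made-up observed correlations with the common variable Z.
lower, upper = completion_bounds(0.6, 0.8)
```

In higher dimensions the feasible region has no such simple description, which is the situation the sampling approach in the abstract is meant to address.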

6.
BMC Bioinformatics ; 15: 260, 2014 Aug 03.
Article in English | MEDLINE | ID: mdl-25086605

ABSTRACT

BACKGROUND: Gene set analysis (GSA) methods test the association of sets of genes with phenotypes in gene expression microarray studies. While GSA methods for a single binary or categorical phenotype abound, little attention has been paid to the case of a continuous phenotype, and there is no method to accommodate correlated multiple continuous phenotypes. RESULT: We propose here an extension of the linear combination test (LCT) to a new version for multiple continuous phenotypes, incorporating correlations among gene expressions of functionally related gene sets, as well as correlations among multiple phenotypes. Further, we extend our new method to a nonlinear version, referred to as the nonlinear combination test (NLCT), to test potential nonlinear association of gene sets with multiple phenotypes. A simulation study and a real microarray example demonstrate the practical aspects of the proposed methods. CONCLUSION: The proposed approaches are effective in controlling type I errors and powerful in testing associations between gene-sets and multiple continuous phenotypes. They are both computationally effective. Naively (univariately) analyzing a group of multiple correlated phenotypes could be dangerous. R code to perform LCT and NLCT for multiple continuous phenotypes is available at http://www.ualberta.ca/~yyasui/homepage.html.


Subjects
Computational Biology/methods , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Phenotype , Data Interpretation, Statistical , Humans
7.
Bioinformatics ; 29(2): 182-8, 2013 Jan 15.
Article in English | MEDLINE | ID: mdl-23172863

ABSTRACT

MOTIVATION: There is a substantial body of work in the biology literature that seeks to characterize the cyclic behavior of genes during cell division. Gene expression microarrays made it possible to measure the expression profiles of thousands of genes simultaneously in time-course experiments to assess changes in the expression levels of genes over time. In this context, the commonly used procedures for testing include the permutation test by de Lichtenberg et al. and Fisher's G-test, both of which are designed to evaluate periodicity against noise. However, it is possible that a gene of interest may have expression that is neither cyclic nor just noise. Thus, there is a need for a new test for periodicity that can identify cyclic patterns against not only noise but also other non-cyclic patterns such as linear, quadratic or higher order polynomial patterns. RESULTS: To address this weakness, we introduce an empirical Bayes approach to test for periodicity and compare its performance in terms of sensitivity and specificity with that of the permutation test and Fisher's G-test through extensive simulations and by application to a set of time-course experiments on the Schizosaccharomyces pombe cell-cycle gene expression. We use 'conserved' and 'cycling' genes by Lu et al. to assess the sensitivity and CESR genes by Chen et al. to assess the specificity of our new empirical Bayes method. AVAILABILITY AND IMPLEMENTATION: The SAS Macro for our empirical Bayes test for periodicity is included in the supplementary materials along with a sample run of the MACRO program. CONTACT: mkocak1@uthsc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
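
For readers unfamiliar with Fisher's G-test, which this abstract uses as a baseline, a minimal sketch follows. It is not the empirical Bayes test the authors propose and not their SAS macro. It assumes an evenly sampled series, computes the periodogram at the Fourier frequencies with a plain discrete Fourier transform, takes g as the largest ordinate divided by the sum of all ordinates, and evaluates the exact p-value under a Gaussian white-noise null. The function names and the example series are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Periodogram ordinates at Fourier frequencies k = 1 .. floor((n - 1) / 2)."""
    n = len(x)
    mean = sum(x) / n
    m = (n - 1) // 2
    ordinates = []
    for k in range(1, m + 1):
        coef = sum((xt - mean) * cmath.exp(-2j * math.pi * k * t / n)
                   for t, xt in enumerate(x))
        ordinates.append(abs(coef) ** 2 / n)
    return ordinates


def fisher_g_test(x):
    """Return Fisher's g statistic and its exact p-value under white noise."""
    spec = periodogram(x)
    m = len(spec)
    g = max(spec) / sum(spec)
    # P(G > g) = sum_{j=1}^{floor(1/g)} (-1)^(j-1) * C(m, j) * (1 - j*g)^(m-1)
    p = sum((-1) ** (j - 1) * math.comb(m, j) * (1 - j * g) ** (m - 1)
            for j in range(1, int(1 / g) + 1))
    return g, min(1.0, max(0.0, p))


# A noiseless sinusoid with period 8 sampled at 48 time points (made-up data).
series = [math.sin(2 * math.pi * t / 8) for t in range(48)]
g_stat, p_value = fisher_g_test(series)
```

Because g only measures how concentrated the spectrum is at one frequency, a smooth non-cyclic pattern such as a trend can also produce a large g, which is consistent with the limitation the abstract raises.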


Subjects
Gene Expression Profiling , Periodicity , Algorithms , Bayes Theorem , Cell Cycle/genetics , Gene Expression , Oligonucleotide Array Sequence Analysis , Schizosaccharomyces/genetics , Sensitivity and Specificity
8.
Article in English | MEDLINE | ID: mdl-38929017

ABSTRACT

BACKGROUND: Social and Environmental Determinants of Health (SEDH) provide us with a conceptual framework to gain insights into possible associations among different human behaviors and the corresponding health outcomes that take place often in and around complex built environments. Developing better built environments requires an understanding of those aspects of a community that are most likely to have a measurable impact on the target SEDH. Yet data on local characteristics at suitable spatial scales are often unavailable. We aim to address this issue by application of different data disaggregation methods. METHODS: We applied different approaches to data disaggregation to obtain small area estimates of key behavioral risk factors, as well as geospatial measures of green space access and walkability for each zip code of Allegheny County in southwestern Pennsylvania. RESULTS: Tables and maps of local characteristics revealed their overall spatial distribution along with disparities therein across the county. While the top ranked zip codes by behavioral estimates generally have higher than the county's median individual income, this does not lead them to have higher than its median green space access or walkability. CONCLUSION: We demonstrated the utility of data disaggregation for addressing complex questions involving community-specific behavioral attributes and built environments with precision and rigor, which is especially useful for a diverse population. Thus, different types of data, when comparable at a common local scale, can provide key integrative insights for researchers and policymakers.


Subjects
Residence Characteristics , Walking , Humans , Walking/statistics & numerical data , Pennsylvania , Risk Factors , Built Environment/statistics & numerical data , Environment Design , Parks, Recreational/statistics & numerical data , Health Behavior
9.
BMC Bioinformatics ; 14: 212, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23815123

ABSTRACT

BACKGROUND: Gene set analysis (GSA) methods test the association of sets of genes with a phenotype in gene expression microarray studies. Many GSA methods have been proposed, especially methods for use with a binary phenotype. Equally important, if not more so, is the ability to test the enrichment of a gene signature or pathway against the continuous phenotypes which are routinely and commonly observed in, for example, clinicopathological measurements. It is not always easy or meaningful to dichotomize continuous phenotypes into two classes, and attempting to do this may lead to the inaccurate classification of samples, which would affect the downstream enrichment analysis. In the present study, we have built on recent efforts to incorporate correlation structure within gene sets and pathways into the GSA test statistic. To address the issue of continuous phenotypes directly without the need for artificial discrete classification and thus increase the power of the test while ensuring computational efficiency and rigor, new GSA methods that can incorporate a covariance matrix estimator for a continuous phenotype may present an effective approach. RESULTS: We have designed a new method by extending the GSA approach called Linear Combination Test (LCT) from a binary to a continuous phenotype. Simulation studies and a real microarray dataset were used to compare the proposed LCT for a continuous phenotype, a modification of LCT (referred to as LCT2), and two publicly available GSA methods for continuous phenotypes. CONCLUSIONS: We found that the LCT methods performed better than the other two GSA methods; however, this finding should be understood in the context of our specific simulation studies and the real microarray dataset that were used to compare the methods. Free R code to perform LCT for binary and continuous phenotypes is available at http://www.ualberta.ca/~yyasui/homepage.html. The R code to perform LCT for a continuous phenotype is available as Additional file 1.


Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Phenotype , Humans , Leptin/genetics , Leptin/metabolism
10.
Proc Natl Acad Sci U S A ; 107(24): 11002-7, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20534477

ABSTRACT

Recent work has shown that ablation of p110beta, but not p110alpha, markedly impairs tumorigenesis driven by loss of phosphatase and tensin homolog (PTEN) in the mouse prostate. Other laboratories have reported complementary data in human prostate tumor lines, suggesting that p110beta activation is necessary for tumorigenesis driven by PTEN loss. Given the multiple functions of PTEN, we wondered if p110beta activation also is sufficient for tumorigenesis. Here, we report that transgenic expression of a constitutively activated p110beta allele in the prostate drives prostate intraepithelial neoplasia formation. The resulting lesions are similar to, but are clearly distinct from, the ones arising from PTEN loss or Akt activation. Array analyses of transcription in multiple murine prostate tumor models featuring PI3K/AKT pathway activation allowed construction of a pathway signature that may be useful in predicting the prognosis of human prostate tumors.


Subjects
Phosphatidylinositol 3-Kinases/metabolism , Prostatic Intraepithelial Neoplasia/enzymology , Prostatic Neoplasms/enzymology , Age Factors , Animals , Class I Phosphatidylinositol 3-Kinases , Disease Models, Animal , Enzyme Activation , Gene Expression Profiling , Genes, myc , Humans , Male , Metaplasia , Mice , Mice, Knockout , Mice, Transgenic , NF-kappa B/genetics , PTEN Phosphohydrolase/deficiency , PTEN Phosphohydrolase/genetics , Phosphatidylinositol 3-Kinases/genetics , Prostate/enzymology , Prostatic Intraepithelial Neoplasia/etiology , Prostatic Intraepithelial Neoplasia/genetics , Prostatic Intraepithelial Neoplasia/pathology , Prostatic Neoplasms/etiology , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Proto-Oncogene Proteins c-akt/genetics , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , Signal Transduction , Species Specificity
11.
Proc Natl Acad Sci U S A ; 107(18): 8352-6, 2010 May 04.
Article in English | MEDLINE | ID: mdl-20404174

ABSTRACT

Predicting drug response in cancer patients remains a major challenge in the clinic. We have perfected an ex vivo, reproducible, rapid and personalized culture method to investigate antitumoral pharmacological properties that preserves the original cancer microenvironment. Response to signal transduction inhibitors in cancer is determined not only by properties of the drug target but also by mutations in other signaling molecules and the tumor microenvironment. As a proof of concept, we, therefore, focused on the PI3K/Akt signaling pathway, because it plays a prominent role in cancer and its activity is affected by epithelial-stromal interactions. Our results show that this culture model preserves tissue 3D architecture, cell viability, pathway activity, and global gene-expression profiles up to 5 days ex vivo. In addition, we show pathway modulation in tumor cells resulting from pharmacologic intervention in ex vivo culture. This technology may have a significant impact on patient selection for clinical trials and in predicting response to small-molecule inhibitor therapy.


Subjects
Gene Expression Regulation, Neoplastic/drug effects , Neoplasms/genetics , Phosphoinositide-3 Kinase Inhibitors , Protein Kinase Inhibitors/pharmacology , Signal Transduction/drug effects , Biopsy , Cell Shape , Cell Survival , Gene Expression Profiling , Humans , Neoplasms/metabolism , Neoplasms/pathology , Phosphatidylinositol 3-Kinases/metabolism , Tissue Culture Techniques
12.
Front Cell Dev Biol ; 11: 1065586, 2023.
Article in English | MEDLINE | ID: mdl-36998245

ABSTRACT

Background: The impact of gene-sets on a spatial phenotype is not necessarily uniform across different locations of cancer tissue. This study introduces a computational platform, GWLCT, for combining gene set analysis with spatial data modeling to provide a new statistical test for location-specific association of phenotypes and molecular pathways in spatial single-cell RNA-seq data collected from an input tumor sample. Methods: The main advantage of GWLCT consists of an analysis beyond global significance, allowing the association between the gene-set and the phenotype to vary across the tumor space. At each location, the most significant linear combination is found using a geographically weighted shrunken covariance matrix and kernel function. Whether a fixed or adaptive bandwidth is used is determined by a cross-validation procedure. Our proposed method is compared to the global version of linear combination test (LCT), bulk and random-forest based gene-set enrichment analyses using data created by the Visium Spatial Gene Expression technique on an invasive breast cancer tissue sample, as well as 144 different simulation scenarios. Results: In an illustrative example, the new geographically weighted linear combination test, GWLCT, identifies the cancer hallmark gene-sets that are significantly associated at each location with the five spatially continuous phenotypic contexts in the tumors defined by different well-known markers of cancer-associated fibroblasts. Scan statistics revealed clustering in the number of significant gene-sets. A spatial heatmap of combined significance over all selected gene-sets is also produced. Extensive simulation studies demonstrate that our proposed approach outperforms other methods in the considered scenarios, especially when the spatial association increases. Conclusion: Our proposed approach considers the spatial covariance of gene expression to detect the most significant gene-sets affecting a continuous phenotype. It reveals spatially detailed information in tissue space and can thus play a key role in understanding the contextual heterogeneity of cancer cells.

13.
PLoS One ; 18(1): e0279414, 2023.
Article in English | MEDLINE | ID: mdl-36602961

ABSTRACT

OBJECTIVE: Food security is an important policy issue in India. As India recently ranked 107th out of 121 countries in the 2022 Global Hunger Index, there is an urgent need to dissect, and gain insights into, such a major decline at the national level. However, the existing surveys, due to small sample sizes, cannot be used directly to produce reliable estimates at local administrative levels such as districts. DESIGN: The latest round of available data from the Household Consumer Expenditure Survey (HCES 2011-12) done by the National Sample Survey Office of India used stratified multi-stage random sampling with districts as strata, villages as first stage and households as second stage units. SETTING: Our Small Area Estimation approach estimated food insecurity prevalence, gap, and severity of each rural district of the Eastern Indo-Gangetic Plain (EIGP) region by modeling the HCES data, guided by local covariates from the 2011 Indian Population Census. PARTICIPANTS: In HCES, 5915 (34429), 3310 (17534) and 3566 (15223) households (persons) were surveyed from the 71, 38 and 18 districts of the EIGP states of Uttar Pradesh, Bihar and West Bengal respectively. RESULTS: We estimated the district-specific food insecurity indicators, and mapped their local disparities over the EIGP region. By comparing food insecurity with indicators of climate vulnerability, poverty and crop diversity, we shortlisted the vulnerable districts in EIGP. CONCLUSIONS: Our district-level estimates and maps can be effective for informed policy-making to build local resiliency and address systemic vulnerabilities where they matter most in the post-pandemic era. ADVANCES: Our study computed, for the Indian states in the EIGP region, the first area-level small area estimates of food insecurity as well as poverty over the past decade, and generated a ranked list of districts upon combining these data with measures of crop diversity and climatic vulnerability.


Subjects
Food Insecurity , Food Supply , Humans , Poverty , Family Characteristics , Surveys and Questionnaires
14.
PLoS One ; 18(10): e0292915, 2023.
Article in English | MEDLINE | ID: mdl-37851657

ABSTRACT

We generated Optical Coherence Tomography (OCT) data of much higher resolution than usual on retinal nerve fiber layer (RNFL) thickness of a given eye. These consist of measurements made at hundreds of angular-points defined on a circular coordinate system. Traditional analysis of OCT RNFL data does not utilize insightful characteristics such as its circularity and granularity for common downstream applications. To address this, we present a new circular statistical framework that defines an Angular Decay function and thereby provides a directionally precise representation of an eye with attention to patterns of focused RNFL loss. We applied the framework to a clinical cohort of Asian Indian eyes and modeled the generated circular data with a finite mixture of von Mises distributions, which led to an unsupervised identification in different age-groups of recurrent clusters of glaucomatous eyes with distinct directional signatures of RNFL decay. New indices of global and local RNFL loss were computed for comparing the structural differences between these glaucoma clusters across the age-groups and improving classification. Further, we built a catalog of directionally precise statistical distributions of RNFL thickness for the said population of normal eyes as stratified by their age and optic disc size.


Subjects
Glaucoma , Tomography, Optical Coherence , Humans , Tomography, Optical Coherence/methods , Glaucoma/diagnostic imaging , Retina , Nerve Fibers , Intraocular Pressure
15.
BMC Bioinformatics ; 13 Suppl 2: S10, 2012 Mar 13.
Article in English | MEDLINE | ID: mdl-22536861

ABSTRACT

BACKGROUND: When flow cytometric data on mixtures of cell populations are collected from samples under different experimental conditions, computational methods are needed (a) to classify the samples into similar groups, and (b) to characterize the changes within the corresponding populations due to the different conditions. Manual inspection has been used in the past to study such changes, but high-dimensional experiments necessitate developing new computational approaches to this problem. A robust solution to this problem is to construct distinct templates to summarize all samples from a class, and then to compare these templates to study the changes across classes or conditions. RESULTS: We designed a hierarchical algorithm, flowMatch, to first match the corresponding clusters across samples for producing robust meta-clusters, and to then construct a high-dimensional template as a collection of meta-clusters for each class of samples. We applied the algorithm on flow cytometry data obtained from human blood cells before and after stimulation with anti-CD3 monoclonal antibody, which is reported to change phosphorylation responses of memory and naive T cells. The flowMatch algorithm is able to construct representative templates from the samples before and after stimulation, and to match corresponding meta-clusters across templates. The templates of the pre-stimulation and post-stimulation data corresponding to memory and naive T cell populations clearly show, at the level of the meta-clusters, the overall phosphorylation shift due to the stimulation. CONCLUSIONS: We concisely represent each class of samples by a template consisting of a collection of meta-clusters (representative abstract populations). Using flowMatch, the meta-clusters across samples can be matched to assess overall differences among the samples of various phenotypes or time-points.


Subjects
Algorithms , Flow Cytometry , Receptors, Antigen, T-Cell/metabolism , T-Lymphocytes/immunology , Humans , Phosphorylation
16.
BMC Bioinformatics ; 13 Suppl 5: S5, 2012 Apr 12.
Article in English | MEDLINE | ID: mdl-22537009

ABSTRACT

BACKGROUND: Gradual or sudden transitions among different states as exhibited by cell populations in a biological sample under particular conditions or stimuli can be detected and profiled by flow cytometric time course data. Often such temporal profiles contain features due to transient states that present unique modeling challenges. These could range from asymmetric non-Gaussian distributions to outliers and tail subpopulations, which need to be modeled with precision and rigor. RESULTS: To ensure precision and rigor, we propose a parametric modeling framework StateProfiler based on finite mixtures of skew t-Normal distributions that are robust against non-Gaussian features caused by asymmetry and outliers in data. Further, we present in StateProfiler a new greedy EM algorithm for fast and optimal model selection. The parsimonious approach of our greedy algorithm allows us to detect the genuine dynamic variation in the key features as and when they appear in time course data. We also present a procedure to construct a well-fitted profile by merging any redundant model components in a way that minimizes change in entropy of the resulting model. This allows precise profiling of unusually shaped distributions and less well-separated features that may appear due to cellular heterogeneity even within clonal populations. CONCLUSIONS: By modeling flow cytometric data measured over time course and marker space with StateProfiler, specific parametric characteristics of cellular states can be identified. The parameters are then tested statistically for learning global and local patterns of spatio-temporal change. We applied StateProfiler to identify the temporal features of yeast cell cycle progression based on knockout of S-phase triggering cyclins Clb5 and Clb6, and then compared the S-phase delay phenotypes due to differential regulation of the two cyclins. 
We also used StateProfiler to construct the temporal profile of clonal divergence underlying lineage selection in mammalian hematopoietic progenitor cells.


Subjects
Cell Cycle , Flow Cytometry , Saccharomyces cerevisiae/cytology , Algorithms , Cyclin B/metabolism , Normal Distribution , S Phase , Saccharomyces cerevisiae Proteins/metabolism
17.
Bioinformatics ; 27(19): 2746-53, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21846734

ABSTRACT

MOTIVATION: Monoclonal antibodies (mAbs) are among the most powerful and important tools in biology and medicine. MAb development is of great significance to many research and clinical applications. Therefore, objective mAb classification is essential for categorizing and comparing mAb panels based on their reactivity patterns in different cellular species. However, typical flow cytometric mAb profiles present unique modeling challenges with their non-Gaussian features and intersample variations. It makes accurate mAb classification difficult to do with the currently used kernel-based or hierarchical clustering techniques. RESULTS: To address these challenges, in the present study we developed a formal two-step framework called mAbprofiler for systematic, parametric characterization of mAb profiles. Further, we measured the reactivity of hundreds of new antibodies in diverse tissues using flow cytometry, which we successfully classified using mAbprofiler. First, mAbprofiler fits a mAb's flow cytometric histogram with a finite mixture model of skew t distributions that is robust against non-Gaussian features, and constructs a precise, smooth and mathematically rigorous profile. Then it performs novel curve clustering of the fitted mAb profiles using a skew t mixture of non-linear regression model that can handle intersample variation. Thus, mAbprofiler provides a new framework for identifying robust mAb classes, all well defined by distinct parametric templates, which can be used for classifying new mAb samples. We validated our classification results both computationally and empirically using mAb profiles of known classification. AVAILABILITY AND IMPLEMENTATION: A demonstration code in R is available at the journal website. 
The R code implementing the full framework is available from the author website - http://amath.nchu.edu.tw/www/teacher/tilin/software CONTACT: saumyadipta_pyne@dfci.harvard.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Monoclonal Antibodies/classification , Monoclonal Antibodies/immunology , Animals , Antigen-Antibody Reactions/immunology , Cells/immunology , Cluster Analysis , Flow Cytometry , Mice , Biological Models , Sheep , Software
18.
Proc Natl Acad Sci U S A ; 106(21): 8519-24, 2009 May 26.
Article in English | MEDLINE | ID: mdl-19443687

ABSTRACT

Flow cytometric analysis allows rapid single cell interrogation of surface and intracellular determinants by measuring fluorescence intensity of fluorophore-conjugated reagents. The availability of new platforms, allowing detection of increasing numbers of cell surface markers, has challenged the traditional technique of identifying cell populations by manual gating and resulted in a growing need for the development of automated, high-dimensional analytical methods. We present a direct multivariate finite mixture modeling approach, using skew and heavy-tailed distributions, to address the complexities of flow cytometric analysis and to deal with high-dimensional cytometric data without the need for projection or transformation. We demonstrate its ability to detect rare populations, to model robustly in the presence of outliers and skew, and to perform the critical task of matching cell populations across samples that enables downstream analysis. This advance will facilitate the application of flow cytometry to new, complex biological and clinical problems.
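For orientation, the generic form of a finite mixture density is sketched below. The symbols G (number of components), \pi_g (mixing proportions), f_g (component densities) and \theta_g (component parameters) are notation assumed here and are not taken from the paper; the specific skew and heavy-tailed component families used by the authors are not reproduced.

```latex
f(x) = \sum_{g=1}^{G} \pi_g \, f_g(x; \theta_g),
\qquad \pi_g \ge 0, \quad \sum_{g=1}^{G} \pi_g = 1

% Posterior probability that observation x belongs to component g,
% the usual basis for assigning cells to populations:
\tau_g(x) = \frac{\pi_g \, f_g(x; \theta_g)}{\sum_{h=1}^{G} \pi_h \, f_h(x; \theta_h)}
```

In such a model each cell population corresponds to one component, and choosing skewed or heavy-tailed f_g lets a single component absorb asymmetry and outliers that a Gaussian component would need several terms to approximate.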


Subjects
Flow Cytometry/methods , Biomarkers , Cell Line , Cell Membrane/metabolism , Innate Immunity/immunology , Immunologic Memory/immunology , Biological Models , Phenotype , Phosphorylation , Statistics as Topic , T-Lymphocytes/cytology , T-Lymphocytes/immunology
19.
Comput Biol Med ; 151(Pt A): 106175, 2022 12.
Article in English | MEDLINE | ID: mdl-36306577

ABSTRACT

OBJECTIVES: To identify patterns of association and transition in polysubstance use based on the National Survey on Drug Use and Health (NSDUH) in the United States. METHODS: We developed a new computational platform for PolySubstance Use data Mining for Associations and Transitions (PSUMAnT). It is based on the computation of weighted support, a measure of popularity, for the use of every combination of one or more substances, termed a drugset, over a period of 5 decades (1965-2014) based on NSDUH data. It uses an efficient bitstring representation with exact and approximate string matching capabilities to search for patterns of association between drugsets and demographics of user groups at different time intervals. Moreover, it introduces a quantitative definition of a rule of transition between pairs of substances used within a given time interval, and provides a function for mining them. RESULTS: We identified the frequent drugsets from the individual substance use database, and determined their representation among different demographic groups at different intervals. An interesting pattern of use of pain relievers and tranquilizers was detected for the age group of 26-34 years. In addition, transition rules for heroin use in the last decade (2004-2015) of the given data were mined. CONCLUSIONS: Computation of weighted supports over time for every possible combination of substances in the survey, and their association with specific user groups, allows PSUMAnT to generate and test novel, interesting hypotheses in polysubstance use. PSUMAnT can be used for mining combinations of substances used among diverse demographic groups, including those that have received less attention in this problem area.
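The idea of weighted support over a bitstring representation can be illustrated with a minimal Python sketch. This is not the PSUMAnT implementation: the substance list, the toy respondents, the survey weights, and the function names are all hypothetical, and weighted support is assumed here to be the weight-normalized share of respondents whose reported use contains every substance in the drugset.

```python
from itertools import combinations

# Hypothetical substance list; bit i of a mask marks use of SUBSTANCES[i].
SUBSTANCES = ["alcohol", "cannabis", "pain_relievers", "tranquilizers"]
BIT = {name: 1 << i for i, name in enumerate(SUBSTANCES)}


def to_mask(names):
    """Encode a collection of substance names as an integer bitstring."""
    mask = 0
    for name in names:
        mask |= BIT[name]
    return mask


def weighted_support(drugset_mask, respondents):
    """Weighted share of respondents whose use contains the whole drugset.

    respondents is a list of (use_mask, survey_weight) pairs.
    """
    total = sum(weight for _, weight in respondents)
    if total == 0:
        return 0.0
    hits = sum(
        weight
        for use_mask, weight in respondents
        if (use_mask & drugset_mask) == drugset_mask  # exact containment test
    )
    return hits / total


def frequent_drugsets(respondents, min_support):
    """Enumerate every non-empty drugset whose support meets the threshold."""
    result = {}
    for size in range(1, len(SUBSTANCES) + 1):
        for combo in combinations(SUBSTANCES, size):
            support = weighted_support(to_mask(combo), respondents)
            if support >= min_support:
                result[combo] = support
    return result


# Toy data (made up for illustration): (use mask, survey weight).
RESPONDENTS = [
    (to_mask(["alcohol"]), 2.0),
    (to_mask(["alcohol", "cannabis"]), 1.0),
    (to_mask(["pain_relievers", "tranquilizers"]), 1.0),
]

if __name__ == "__main__":
    for drugset, support in frequent_drugsets(RESPONDENTS, 0.25).items():
        print(drugset, support)
```

Encoding each respondent as an integer keeps the containment test to a single bitwise AND, which matters when every combination of substances is scanned. Exhaustive enumeration as written grows exponentially with the number of substances, so it is only suitable for short substance lists.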


Subjects
Data Mining , United States/epidemiology , Factual Databases
20.
Cancers (Basel) ; 14(21)2022 Oct 25.
Article in English | MEDLINE | ID: mdl-36358654

ABSTRACT

Intratumor heterogeneity (ITH) is associated with therapeutic resistance and poor prognosis in cancer patients, and is attributed to genetic, epigenetic, and microenvironmental factors. We developed a new computational platform, GATHER, for geostatistical modeling of single-cell RNA-seq data to synthesize high-resolution and continuous gene expression landscapes of a given tumor sample. Such landscapes allow GATHER to map the enriched regions of pathways of interest in the tumor space and identify genes that are spatially differentially expressed at locations representing specific phenotypic contexts, using measures based on optimal transport. GATHER provides new applications of spatial entropy measures for quantification and objective characterization of ITH. It includes new tools for insightful visualization of spatial transcriptomic phenomena. We illustrate the capabilities of GATHER using real data from a breast cancer tumor to study hallmarks of cancer in the phenotypic contexts defined by cancer-associated fibroblasts.
