1.
Nat Commun ; 14(1): 4059, 2023 07 10.
Article in English | MEDLINE | ID: mdl-37429865

ABSTRACT

Feature selection to identify spatially variable genes or other biologically informative genes is a key step during analyses of spatially-resolved transcriptomics data. Here, we propose nnSVG, a scalable approach to identify spatially variable genes based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations. We demonstrate the performance of our method using experimental data from several technological platforms and simulations. A software implementation is available at https://bioconductor.org/packages/nnSVG.


Subject(s)
Gene Expression Profiling , Software , Cluster Analysis , Normal Distribution
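
The nearest-neighbor Gaussian process idea behind the abstract above can be sketched as follows. This is a minimal illustration, not the nnSVG implementation: it assumes a zero-mean process, an exponential covariance, and the given ordering of locations, and the names `exp_cov`, `solve`, and `nngp_loglik` are hypothetical. Each observation is conditioned on at most `m` earlier neighbors, so only small `m × m` systems are solved per location.

```python
import math

def exp_cov(a, b, sigma2=1.0, length_scale=1.0):
    """Exponential covariance between two locations (an assumed kernel)."""
    return sigma2 * math.exp(-math.dist(a, b) / length_scale)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def nngp_loglik(coords, y, m=3, sigma2=1.0, length_scale=1.0, nugget=1e-8):
    """Zero-mean NNGP log-likelihood: a product of small conditional Gaussians."""
    total = 0.0
    for i, (s_i, y_i) in enumerate(zip(coords, y)):
        # Up to m nearest neighbors among locations earlier in the ordering.
        # (Brute-force search for brevity; a spatial index would be used at scale.)
        nbrs = sorted(range(i), key=lambda j: math.dist(s_i, coords[j]))[:m]
        mean, var = 0.0, sigma2 + nugget
        if nbrs:
            C = [[exp_cov(coords[a], coords[b], sigma2, length_scale)
                  + (nugget if a == b else 0.0) for b in nbrs] for a in nbrs]
            c = [exp_cov(s_i, coords[j], sigma2, length_scale) for j in nbrs]
            w = solve(C, c)  # kriging weights on the neighbor set
            mean = sum(wk * y[j] for wk, j in zip(w, nbrs))
            var -= sum(wk * ck for wk, ck in zip(w, c))
        total += -0.5 * (math.log(2 * math.pi * var) + (y_i - mean) ** 2 / var)
    return total
```

With `m` fixed, the number of small solves grows linearly with the number of locations, which is the scaling property the abstract refers to. Setting `m` to at least `n - 1` recovers the full Gaussian process likelihood for this ordering.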
2.
Article in English | MEDLINE | ID: mdl-37077317

ABSTRACT

With modern advances in geographical information systems, remote sensing technologies, and low-cost sensors, we are increasingly encountering datasets where we need to account for spatial or serial dependence. Dependent observations $(y_1, y_2, \ldots, y_n)$ with covariates $(x_1, \ldots, x_n)$ can be modeled non-parametrically as $y_i = m(x_i) + \epsilon_i$, where $m(x_i)$ is the mean component and $\epsilon_i$ accounts for the dependency in the data. We assume that dependence is captured through a covariance function of the correlated stochastic process $\epsilon_i$ (second-order dependence). The correlation is typically a function of "spatial distance" or "time-lag" between two observations. Unlike linear regression, non-linear Machine Learning (ML) methods for estimating the regression function $m$ can capture complex interactions among the variables. However, they often fail to account for the dependence structure, resulting in sub-optimal estimation. On the other hand, specialized software for spatial/temporal data properly models data correlation but lacks flexibility in modeling the mean function $m$ because it focuses only on linear models. RandomForestsGLS bridges the gap through a novel rendition of Random Forests (RF) - namely, RF-GLS - by explicitly modeling the spatial/serial data correlation in the RF fitting procedure to substantially improve the estimation of the mean function. Additionally, RandomForestsGLS leverages kriging to perform predictions at new locations for geo-spatial data.
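
The kriging step mentioned at the end of this abstract can be illustrated with a short sketch. It is not the RandomForestsGLS API: it assumes an already-fitted mean function (`mean_fn`, a stand-in for any estimate of m), an exponential covariance with known parameters, and no nugget. The names `exp_cov`, `solve`, and `krige` are hypothetical. The prediction at a new location is the fitted mean plus a covariance-weighted combination of the training residuals.

```python
import math

def exp_cov(s, t, sigma2=1.0, length_scale=1.0):
    """Exponential covariance as a function of spatial distance (assumed form)."""
    return sigma2 * math.exp(-math.dist(s, t) / length_scale)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krige(coords, x, y, mean_fn, new_coord, new_x, sigma2=1.0, length_scale=1.0):
    """Predict y at new_coord as mean_fn(new_x) plus the kriged residual."""
    resid = [yi - mean_fn(xi) for yi, xi in zip(y, x)]
    Sigma = [[exp_cov(a, b, sigma2, length_scale) for b in coords] for a in coords]
    c = [exp_cov(new_coord, s, sigma2, length_scale) for s in coords]
    w = solve(Sigma, c)  # weights Sigma^{-1} c
    return mean_fn(new_x) + sum(wi * ri for wi, ri in zip(w, resid))
```

Two sanity properties follow from the formula: at an observed location the prediction reproduces the observed value (no nugget is assumed), and far from all observations it falls back to the mean function alone.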

3.
J Data Sci ; 20(4): 533-544, 2022.
Article in English | MEDLINE | ID: mdl-37786782

ABSTRACT

Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian Process prior, are widely used for analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov Chain Monte Carlo sampling. Alternate approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cumulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the cdf characterizing the marginal cdf of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that only involves sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.
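
The idea of writing a probit marginal likelihood as a normal cdf can be checked in the simplest, single-location case. The sketch below is an illustration under stated assumptions, not the paper's algorithm: with a linear predictor `eta` and a Gaussian random effect w ~ N(0, sigma2), the marginal success probability E[Φ(eta + w)] equals Φ(eta / sqrt(1 + sigma2)). The function names are hypothetical, and the quadrature routine exists only to verify the closed form numerically.

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def marginal_prob_closed_form(eta, sigma2):
    """P(Y = 1) after integrating out w ~ N(0, sigma2) in a probit model."""
    return norm_cdf(eta / math.sqrt(1.0 + sigma2))

def marginal_prob_quadrature(eta, sigma2, n_grid=4001, width=8.0):
    """Trapezoid-rule integral of norm_cdf(eta + w) against the N(0, sigma2) density."""
    sd = math.sqrt(sigma2)
    lo = -width * sd
    h = 2.0 * width * sd / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        w = lo + i * h
        dens = math.exp(-0.5 * (w / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end weights
        total += weight * norm_cdf(eta + w) * dens
    return total * h
```

With several correlated locations the same integration yields a multivariate normal cdf, which is where the covariance approximation discussed in the abstract enters.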

4.
Atmos Environ (1994) ; 242, 2020 Dec 01.
Article in English | MEDLINE | ID: mdl-32922146

ABSTRACT

Low-cost air pollution monitors are increasingly being deployed to enrich knowledge about ambient air pollution at high spatial and temporal resolutions. However, unlike regulatory-grade (FEM or FRM) instruments, universal quality standards for low-cost sensors are yet to be established and their data quality varies widely. This mandates thorough evaluation and calibration before any responsible use of such data. This study presents evaluation and field-calibration of the PM2.5 data from a network of low-cost monitors currently operating in Baltimore, MD, which has only one regulatory PM2.5 monitoring site within city limits. Co-location analysis at this regulatory site in Oldtown, Baltimore revealed high variability and significant overestimation of PM2.5 levels by the raw data from these monitors. Universal laboratory corrections reduced the bias in the data, but only partially mitigated the high variability. Eight months of field co-location data at Oldtown were used to develop a gain-offset calibration model, recast as a multiple linear regression. The statistical model offered substantial improvement in prediction quality over the raw or lab-corrected data. The results were robust to the choice of the low-cost monitor used for field-calibration, as well as to different seasonal choices of training period. The raw, lab-corrected and statistically-calibrated data were evaluated for a period of two months following the training period. The statistical model had the highest agreement with the reference data, producing a 24-hour average root-mean-square-error (RMSE) of around 2 µg m⁻³. To assess transferability of the calibration equations to other monitors in the network, a cross-site evaluation was conducted at a second co-location site in suburban Essex, MD. The statistically calibrated data once again produced the lowest RMSE. The calibrated PM2.5 readings from the monitors in the low-cost network provided insights into the intra-urban spatiotemporal variations of PM2.5 in Baltimore.
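
A gain-offset calibration and its RMSE evaluation can be sketched in a few lines. This is a deliberately simplified, hypothetical version: it regresses the reference reading on the raw low-cost reading alone, whereas the study describes a multiple linear regression whose covariates are not listed in the abstract. The function names and the direction of the regression are assumptions.

```python
import math

def fit_gain_offset(raw, reference):
    """Ordinary least squares for reference ≈ offset + gain * raw."""
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset

def calibrate(raw, gain, offset):
    """Apply the fitted calibration to raw readings."""
    return [offset + gain * x for x in raw]

def rmse(pred, reference):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, reference)) / len(pred))
```

In practice the coefficients would be fitted on the co-location training period and `rmse` computed on a later hold-out period, mirroring the train-then-evaluate design described in the abstract.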

5.
IEEE Trans Cybern ; 49(12): 4229-4242, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30137019

ABSTRACT

We present a novel alternative convergence theory of the fuzzy C-means (FCM) clustering algorithm with a super-class of the so-called "distance-like functions", which emerged from earlier attempts at unifying the theories of center-based clustering methods. This super-class does not assume the existence of the second derivative of the distance measure with respect to the coordinate of the cluster representative (the first coordinate in this formulation). The convergence result does not require the separability of the distance measures. Moreover, it provides a stronger convergence property comparable to (in fact the same as, but in terms of the generalized distance measure) that of the classical FCM with squared Euclidean distance. The crux of the convergence analysis lies in the development of a fundamentally novel mathematical proof of the continuity of the clustering operator even in the absence of a closed-form update rule, without requiring the separability and double differentiability of the distance function, while still providing a convergence result comparable to that of the classical FCM. The implications of our novel proof technique go well beyond the realm of FCM and provide a general setup for the convergence analysis of similar iterative algorithms.
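
For reference, the classical FCM baseline that the abstract compares against (squared Euclidean distance, closed-form updates) can be sketched as below. This is the textbook alternating scheme on 1-D data, not the generalized distance setting analyzed in the paper. The function name, the default fuzzifier of 2, and the stopping rule are illustrative assumptions.

```python
def fuzzy_c_means(points, centers, fuzzifier=2.0, iters=100, tol=1e-9):
    """Classical FCM on 1-D data: alternate membership and center updates."""
    centers = list(centers)
    U = []
    for _ in range(iters):
        # Membership update: u_ik proportional to D_ik^(-1/(fuzzifier-1)),
        # where D_ik is the squared distance from point i to center k.
        U = []
        for x in points:
            d = [max((x - c) ** 2, 1e-12) for c in centers]  # guard against zero
            inv = [dk ** (-1.0 / (fuzzifier - 1.0)) for dk in d]
            total = sum(inv)
            U.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ik^fuzzifier.
        new_centers = []
        for k in range(len(centers)):
            w = [u[k] ** fuzzifier for u in U]
            new_centers.append(sum(wi * x for wi, x in zip(w, points)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # U holds the memberships from the last membership update.
    return centers, U
```

Both updates here are available in closed form because the distance is squared Euclidean; the abstract's contribution concerns the case where no such closed-form center update exists.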
