1.
Article in English | MEDLINE | ID: mdl-39190515

ABSTRACT

DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-square approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures (e.g., manifolds) that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal components analysis (PCA). Furthermore, in contrast to the SVD and PCA, whose performance deteriorates when the tensor's entries deviate from additive white Gaussian noise, we demonstrate that the performance of DeepTensor is robust to a wide range of distributions. We validate that DeepTensor is a robust and computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification. In particular, DeepTensor offers a 6 dB signal-to-noise ratio improvement over standard denoising methods for signals corrupted by Poisson noise and learns to decompose 3D tensors 60 times faster than a single DN equipped with 3D convolutions.
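
As a rough illustration of the idea (not the authors' released code), the following PyTorch sketch decomposes a matrix into two rank-r factors, each generated by a small network trained self-supervised on the mean-square approximation error; the network widths, latent codes, and rank are assumptions for this toy.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(100, 80)          # toy data standing in for a noisy low-rank matrix
r = 5                             # assumed target rank

def factor_net(out_dim, code_dim=32):
    # small MLP that maps a fixed random code to one low-rank factor
    return nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(), nn.Linear(64, out_dim * r))

net_u, net_v = factor_net(X.shape[0]), factor_net(X.shape[1])
z_u, z_v = torch.randn(1, 32), torch.randn(1, 32)       # fixed latent codes

opt = torch.optim.Adam(list(net_u.parameters()) + list(net_v.parameters()), lr=1e-3)
for _ in range(2000):
    U = net_u(z_u).reshape(X.shape[0], r)               # factor generated by a DN
    V = net_v(z_v).reshape(X.shape[1], r)
    loss = ((X - U @ V.T) ** 2).mean()                  # self-supervised MSE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```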

2.
IEEE Trans Neural Netw Learn Syst ; 35(4): 5014-5026, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37104113

ABSTRACT

The first step toward investigating the effectiveness of a treatment via a randomized trial is to split the population into control and treatment groups and then compare the average response of the treatment group, which receives the treatment, with that of the control group, which receives the placebo. To ensure that the difference between the two groups is caused only by the treatment, it is crucial that the control and treatment groups have similar statistics. Indeed, the validity and reliability of a trial are determined by the similarity of the two groups' statistics. Covariate balancing methods increase the similarity between the distributions of the two groups' covariates. However, often in practice, there are not enough samples to accurately estimate the groups' covariate distributions. In this article, we empirically show that covariate balancing with the standardized mean difference (SMD) covariate balancing measure, as well as Pocock and Simon's sequential treatment assignment method, is susceptible to worst-case treatment assignments. Worst-case treatment assignments are those admitted by the covariate balance measure but that result in the highest possible average treatment effect (ATE) estimation errors. We develop an adversarial attack to find the adversarial treatment assignment for any given trial and provide an index that measures how close a given trial is to the worst case. To this end, we provide an optimization-based algorithm, namely adversarial treatment assignment in treatment effect trials (ATASTREET), to find the adversarial treatment assignments.
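
A minimal sketch of the SMD covariate balance measure that such assignments can still satisfy (toy data and sizes are assumed; this is not the ATASTREET attack itself):

```python
import numpy as np

def smd(x_treat, x_ctrl):
    """Standardized mean difference per covariate (columns are covariates)."""
    mean_diff = x_treat.mean(axis=0) - x_ctrl.mean(axis=0)
    pooled_sd = np.sqrt((x_treat.var(axis=0, ddof=1) + x_ctrl.var(axis=0, ddof=1)) / 2)
    return np.abs(mean_diff) / pooled_sd

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # covariates for 100 subjects
assign = rng.permutation(100) < 50       # a candidate treatment assignment
print(smd(X[assign], X[~assign]))        # small values indicate good balance
```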


Subject(s)
Neural Networks, Computer , Research Design , Reproducibility of Results , Randomized Controlled Trials as Topic , Computer Simulation
3.
Anal Chem ; 95(48): 17458-17466, 2023 12 05.
Article in English | MEDLINE | ID: mdl-37971927

ABSTRACT

Microfluidics can split samples into thousands or millions of partitions, such as droplets or nanowells. Partitions capture analytes according to a Poisson distribution, and in diagnostics, the analyte concentration is commonly inferred with a closed-form solution via maximum likelihood estimation (MLE). Here, we present a new scalable approach to multiplexing analytes. We generalize MLE with microfluidic partitioning and extend our previously developed Sparse Poisson Recovery (SPoRe) inference algorithm. We also present the first in vitro demonstration of SPoRe with droplet digital PCR (ddPCR) toward infection diagnostics. Digital PCR is intrinsically highly sensitive, and SPoRe helps expand its multiplexing capacity by circumventing its channel limitations. We broadly amplify bacteria with 16S ddPCR and assign barcodes to nine pathogen genera by using five nonspecific probes. Given our two-channel ddPCR system, we measured two probes at a time in multiple groups of droplets. Although individual droplets are ambiguous in their bacterial contents, we recover the concentrations of bacteria in the sample from the pooled data. We achieve stable quantification down to approximately 200 total copies of the 16S gene per sample, enabling a suite of clinical applications given a robust upstream microbial DNA extraction procedure. We develop a new theory that generalizes the application of this framework to many realistic sensing modalities, and we prove scaling rules for system design to achieve further expanded multiplexing. The core principles demonstrated here could impact many biosensing applications with microfluidic partitioning.
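
For a single analyte, the closed-form MLE referred to above is the standard digital PCR estimator; a minimal sketch follows (the droplet counts are assumed for illustration):

```python
import numpy as np

def poisson_mle_lambda(n_positive, n_total):
    """Closed-form MLE of the mean copies per partition from digital counts."""
    frac_negative = 1.0 - n_positive / n_total
    return -np.log(frac_negative)        # since P(partition empty) = exp(-lambda)

lam_hat = poisson_mle_lambda(n_positive=4200, n_total=20000)
total_copies = lam_hat * 20000           # estimated total copies loaded into droplets
print(lam_hat, total_copies)
```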


Subject(s)
Bacteria , Microfluidics , Polymerase Chain Reaction/methods , Bacteria/genetics
4.
IEEE Trans Signal Process ; 70: 2388-2401, 2022.
Article in English | MEDLINE | ID: mdl-36082267

ABSTRACT

Compressed sensing (CS) is a signal processing technique that enables the efficient recovery of a sparse high-dimensional signal from low-dimensional measurements. In the multiple measurement vector (MMV) framework, a set of signals with the same support must be recovered from their corresponding measurements. Here, we present the first exploration of the MMV problem where signals are independently drawn from a sparse, multivariate Poisson distribution. We are primarily motivated by a suite of biosensing applications of microfluidics where analytes (such as whole cells or biomarkers) are captured in small volume partitions according to a Poisson distribution. We recover the sparse parameter vector of Poisson rates through maximum likelihood estimation with our novel Sparse Poisson Recovery (SPoRe) algorithm. SPoRe uses batch stochastic gradient ascent enabled by Monte Carlo approximations of otherwise intractable gradients. By uniquely leveraging the Poisson structure, SPoRe substantially outperforms a comprehensive set of existing and custom baseline CS algorithms. Notably, SPoRe can exhibit high performance even with one-dimensional measurements and high noise levels. This resource efficiency is not only unprecedented in the field of CS but is also particularly potent for applications in microfluidics in which the number of resolvable measurements per partition is often severely limited. We prove the identifiability property of the Poisson model under such lax conditions, analytically develop insights into system performance, and confirm these insights in simulated experiments. Our findings encourage a new approach to biosensing and are generalizable to other applications featuring spatial and temporal Poisson signals.
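
A toy illustration of the Monte Carlo gradient idea (self-normalized importance weighting of Poisson samples to approximate the marginal-likelihood gradient), not the authors' SPoRe implementation; the sensing matrix, noise level, batch size, and step size are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, S = 200, 1, 6                     # partitions, measurement dim, candidate analytes
lam_true = np.array([0.8, 0.0, 0.0, 0.3, 0.0, 0.0])
A = rng.uniform(0.5, 1.5, size=(M, S))              # hypothetical sensor response matrix
X = rng.poisson(lam_true, size=(N, S))              # latent analyte counts per partition
Y = X @ A.T + 0.05 * rng.normal(size=(N, M))        # noisy one-dimensional measurements

lam, sigma, K, lr = np.full(S, 0.5), 0.05, 500, 0.05
for _ in range(300):
    batch = Y[rng.choice(N, size=32, replace=False)]
    grad = np.zeros(S)
    for y in batch:
        xs = rng.poisson(lam, size=(K, S))                          # Monte Carlo samples
        logw = -np.sum((y - xs @ A.T) ** 2, axis=1) / (2 * sigma**2)
        w = np.exp(logw - logw.max())
        w /= w.sum()                                                # importance weights
        grad += w @ (xs / np.maximum(lam, 1e-6) - 1.0)              # score-function term
    lam = np.maximum(lam + lr * grad / len(batch), 1e-6)            # gradient ascent step
print(np.round(lam, 2))                  # rough estimates of the sparse true rates
```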

5.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 1098-1107, 2022 02.
Article in English | MEDLINE | ID: mdl-33026983

ABSTRACT

Inferring useful information from large datasets has become increasingly important. In particular, identifying relationships among variables in these datasets has far-reaching impacts. In this article, we introduce the uniform information coefficient (UIC), which measures the amount of dependence between two multidimensional variables and is able to detect both linear and non-linear associations. Our proposed UIC is inspired by the maximal information coefficient (MIC) [1]; however, the MIC was originally designed to measure dependence between two one-dimensional variables. Unlike the MIC calculation, whose cost depends on the type of association between the two variables, the UIC calculation is less computationally expensive and more robust to the type of association. The UIC achieves this by replacing the dynamic programming step in the MIC calculation with a simpler technique based on the uniform partitioning of the data grid. This computational efficiency comes at the cost of not maximizing the information coefficient as is done by the MIC algorithm. We present theoretical guarantees for the performance of the UIC and a variety of experiments to demonstrate its quality in detecting associations.
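
A one-dimensional toy of the uniform-partition idea (not the multidimensional UIC itself): bin the data on a uniform grid, compute the mutual information of the resulting histogram, and normalize.

```python
import numpy as np

def grid_mi(x, y, bins=8):
    """Mutual information of two 1-D variables over a uniform grid partition."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
print(grid_mi(x, x**2) / np.log(8))                       # clearly positive: nonlinear dependence
print(grid_mi(x, rng.uniform(-1, 1, 2000)) / np.log(8))   # near zero: independence
```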


Subject(s)
Algorithms
6.
Article in English | MEDLINE | ID: mdl-34746376

ABSTRACT

Ridge-like regularization often improves the generalization performance of machine learning models by mitigating overfitting. While ridge-regularized machine learning methods are widely used in many important applications, direct training via optimization can become challenging in huge-data scenarios with millions of examples and features. We tackle such challenges by proposing a general approach that achieves ridge-like regularization implicitly, named Minipatch Ridge (MPRidge). Our approach takes an ensemble of coefficients of unregularized learners trained on many tiny, random subsamples of both the examples and the features of the training data, which we call minipatches. We empirically demonstrate that MPRidge induces an implicit ridge-like regularizing effect and performs nearly the same as explicit ridge regularization for a general class of predictors, including logistic regression, SVM, and robust regression. Embarrassingly parallelizable, MPRidge offers a computationally appealing way to induce ridge-like regularization for improving generalization performance in challenging big-data settings.
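
A minimal sketch of the minipatch ensemble for unregularized least squares (patch sizes and counts are assumptions; the paper covers a broader class of predictors):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 200
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:10] = 1.0
y = X @ beta_true + rng.normal(size=n)

n_patches, m, q = 200, 100, 20              # minipatch sizes: m examples x q features
beta_sum, counts = np.zeros(p), np.zeros(p)
for _ in range(n_patches):
    rows = rng.choice(n, size=m, replace=False)
    cols = rng.choice(p, size=q, replace=False)
    coef, *_ = np.linalg.lstsq(X[np.ix_(rows, cols)], y[rows], rcond=None)  # unregularized fit
    beta_sum[cols] += coef
    counts[cols] += 1
beta_mp = beta_sum / np.maximum(counts, 1)  # averaged coefficients: ridge-like shrinkage
```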

7.
IEEE Trans Pattern Anal Mach Intell ; 43(7): 2233-2244, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33891546

ABSTRACT

We introduce a novel video-rate hyperspectral imager with high spatial, temporal, and spectral resolutions. Our key hypothesis is that the spectral profiles of pixels within each superpixel tend to be similar. Hence, a scene-adaptive spatial sampling of a hyperspectral scene, guided by its superpixel-segmented image, is capable of obtaining high-quality reconstructions. To achieve this, we acquire an RGB image of the scene and compute its superpixels, from which we generate a spatial mask of locations where we measure the high-resolution spectrum. The hyperspectral image is subsequently estimated by fusing the RGB image and the spectral measurements using a learnable guided filtering approach. Due to the low computational complexity of the superpixel estimation step, our setup can capture hyperspectral images of scenes with little overhead over traditional snapshot hyperspectral cameras, but with significantly higher spatial and spectral resolutions. We validate the proposed technique with extensive simulations as well as a lab prototype that measures hyperspectral video at a spatial resolution of 600 × 900 pixels, a spectral resolution of 10 nm over the visible wavebands, and a frame rate of 18 fps.
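
A rough sketch of the superpixel-guided sampling step, using SLIC from scikit-image as a stand-in segmenter (the learnable guided-filter fusion is not shown, and the sample-site choice here is a simplification):

```python
import numpy as np
from skimage.segmentation import slic

rng = np.random.default_rng(0)
rgb = rng.random((240, 320, 3))                      # placeholder RGB guide image
labels = slic(rgb, n_segments=500, compactness=10)   # superpixel segmentation

mask = np.zeros(labels.shape, dtype=bool)
for s in np.unique(labels):
    ys, xs = np.nonzero(labels == s)
    i = len(ys) // 2
    mask[ys[i], xs[i]] = True        # one spectral sample site per superpixel
# `mask` marks where high-resolution spectra would be measured; the full
# hyperspectral cube is then estimated by fusing `rgb` with those samples.
```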

8.
Nucleic Acids Res ; 48(10): 5217-5234, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32338745

ABSTRACT

As computational biologists continue to be inundated by ever-increasing amounts of metagenomic data, developing data analysis approaches that keep up with the pace of growing sequence archives remains a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to metagenomics. For instance, sketching algorithms such as MinHash have seen rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
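
As one concrete example of the sketching techniques surveyed, here is a bottom-k MinHash-style Jaccard estimate over k-mers (parameters are assumed; production tools such as Mash use far more optimized hashing):

```python
import hashlib
import random

def kmers(seq, k=21):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def bottom_k_sketch(items, k=200):
    """Keep the k smallest hash values (a bottom-k MinHash-style sketch)."""
    hashes = sorted(int(hashlib.md5(x.encode()).hexdigest(), 16) for x in items)
    return set(hashes[:k])

def jaccard_estimate(sk_a, sk_b, k=200):
    union_sketch = set(sorted(sk_a | sk_b)[:k])        # bottom-k sketch of the union
    return len(union_sketch & sk_a & sk_b) / len(union_sketch)

random.seed(0)
s1 = "".join(random.choice("ACGT") for _ in range(5000))
s2 = s1[:3000] + "".join(random.choice("ACGT") for _ in range(2000))
print(jaccard_estimate(bottom_k_sketch(kmers(s1)), bottom_k_sketch(kmers(s2))))
```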


Subject(s)
Algorithms , Metagenomics/methods , Probability , Signal Processing, Computer-Assisted , Humans , Metagenome/genetics
9.
PLoS One ; 14(3): e0212508, 2019.
Article in English | MEDLINE | ID: mdl-30840653

ABSTRACT

Open Educational Resources (OER) have been lauded for their ability to reduce student costs and improve equity in higher education. Research examining whether OER provide learning benefits has produced mixed results, with most studies showing null effects. We argue that the common methods used to examine OER efficacy are unlikely to detect positive effects based on the predictions of the access hypothesis. The access hypothesis states that OER benefit learning by providing access to critical course materials, and therefore predicts that OER should only benefit students who would not otherwise have access to those materials. Through simulation analysis, we demonstrate that even if there is a learning benefit of OER, standard research methods are unlikely to detect it.
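
An illustrative power simulation in the spirit of the argument (the effect size, access rate, and sample size are assumptions, not the paper's values): only the minority of students who lack access are harmed in the comparison group, so the whole-class average difference is diluted and rarely reaches significance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, lacks_rate, effect, reps = 500, 0.15, 0.3, 2000
detections = 0
for _ in range(reps):
    lacks_access = rng.random(n) < lacks_rate                # students without textbook access
    control = rng.normal(0, 1, n) - effect * lacks_access    # only they are harmed
    oer = rng.normal(0, 1, n)                                # OER restores access for everyone
    detections += stats.ttest_ind(oer, control).pvalue < 0.05
print(detections / reps)       # power of the standard whole-class comparison is low
```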


Subject(s)
Education, Distance , Learning , Students , Adolescent , Adult , Female , Humans , Male
10.
Sci Adv ; 3(12): e1701548, 2017 12.
Article in English | MEDLINE | ID: mdl-29226243

ABSTRACT

Modern biology increasingly relies on fluorescence microscopy, which is driving demand for smaller, lighter, and cheaper microscopes. However, traditional microscope architectures suffer from a fundamental trade-off: As lenses become smaller, they must either collect less light or image a smaller field of view. To break this fundamental trade-off between device size and performance, we present a new concept for three-dimensional (3D) fluorescence imaging that replaces lenses with an optimized amplitude mask placed a few hundred micrometers above the sensor and an efficient algorithm that can convert a single frame of captured sensor data into high-resolution 3D images. The result is FlatScope: perhaps the world's tiniest and lightest microscope. FlatScope is a lensless microscope that is scarcely larger than an image sensor (roughly 0.2 g in weight and less than 1 mm thick) and yet able to produce micrometer-resolution, high-frame rate, 3D fluorescence movies covering a total volume of several cubic millimeters. The ability of FlatScope to reconstruct full 3D images from a single frame of captured sensor data allows us to image 3D volumes roughly 40,000 times faster than a laser scanning confocal microscope while providing comparable resolution. We envision that this new flat fluorescence microscopy paradigm will lead to implantable endoscopes that minimize tissue damage, arrays of imagers that cover large areas, and bendable, flexible microscopes that conform to complex topographies.

11.
Biometrics ; 73(1): 10-19, 2017 03.
Article in English | MEDLINE | ID: mdl-27163413

ABSTRACT

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees, features that are arguably lacking in current alternative algorithms. We demonstrate the advantages of our approach, which include stably and reproducibly identifying biclusterings, on simulated and real microarray data.
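
For reference, the convex biclustering objective typically takes the form below (a sketch with assumed notation: X is the data matrix, U the estimated mean matrix, w and v are row and column fusion weights, and gamma is the single tuning parameter swept to produce the solution path):

```latex
\min_{U}\ \tfrac{1}{2}\lVert X - U\rVert_F^2
  + \gamma \Big( \sum_{i<j} w_{ij}\,\lVert U_{i\cdot} - U_{j\cdot}\rVert_2
  + \sum_{k<l} v_{kl}\,\lVert U_{\cdot k} - U_{\cdot l}\rVert_2 \Big)
```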


Subject(s)
Cluster Analysis , Data Interpretation, Statistical , Gene Regulatory Networks , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis
12.
Sci Adv ; 2(9): e1600025, 2016 09.
Article in English | MEDLINE | ID: mdl-27704040

ABSTRACT

Early identification of pathogens is essential for limiting the development of therapy-resistant pathogens and mitigating infectious disease outbreaks. Most bacterial detection schemes use target-specific probes to differentiate pathogen species, creating time and cost inefficiencies in identifying newly discovered organisms. We present a novel universal microbial diagnostics (UMD) platform to screen for microbial organisms in an infectious sample, using a small number of random DNA probes that are agnostic to the target DNA sequences. Our platform leverages the theory of sparse signal recovery (compressive sensing) to identify the composition of a microbial sample that potentially contains novel or mutant species. We validated the UMD platform in vitro using five random probes to recover 11 pathogenic bacteria. We further demonstrated in silico that UMD can be generalized to screen for common human pathogens at different taxonomic levels. UMD's unorthodox sensing approach opens the door to more efficient and universal molecular diagnostics.
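
A minimal sketch of the sparse-recovery step with random probes, using plain iterative soft-thresholding as a stand-in solver (the probe response matrix, noise level, and sparsity are all assumptions for this toy):

```python
import numpy as np

def ista(Phi, y, lam=0.05, iters=1000):
    """Iterative soft-thresholding for min 0.5 * ||y - Phi x||^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        g = x - Phi.T @ (Phi @ x - y) / L
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return x

rng = np.random.default_rng(0)
Phi = rng.normal(size=(5, 11))               # 5 random probes x 11 candidate bacteria
x_true = np.zeros(11)
x_true[[2, 7]] = [1.0, 0.6]                  # two organisms present in the sample
y = Phi @ x_true + 0.01 * rng.normal(size=5)
print(np.round(ista(Phi, y), 2))             # large entries flag the likely organisms
```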


Subject(s)
Bacteria/genetics , DNA Probes/genetics , DNA, Bacterial/genetics , Infections/diagnosis , Bacteria/isolation & purification , Bacteria/pathogenicity , DNA, Bacterial/classification , Humans , Infections/genetics , Infections/microbiology , Polymerase Chain Reaction
13.
J Stat Plan Inference ; 166: 52-66, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26500388

ABSTRACT

We develop a modeling framework for joint factor and cluster analysis of datasets where multiple categorical response items are collected on a heterogeneous population of individuals. We introduce a latent factor multinomial probit model and employ prior constructions that allow inference on the number of factors as well as clustering of the subjects into homogeneous groups according to their relevant factors. Clustering, in particular, allows us to borrow strength across subjects, thereby helping in the estimation of the model parameters, particularly when the number of observations is small. We employ Markov chain Monte Carlo techniques and obtain tractable posterior inference for our objectives, including sampling of missing data. We demonstrate the effectiveness of our method on simulated data. We also analyze two real-world educational datasets and show that our method outperforms state-of-the-art methods. In the analysis of the real-world data, we uncover hidden relationships between the questions and the underlying educational concepts, while simultaneously partitioning the students into groups of similar educational mastery.

14.
IEEE Trans Image Process ; 21(2): 494-504, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21859622

ABSTRACT

Compressive sensing (CS) is an emerging approach for the acquisition of signals having a sparse or compressible representation in some basis. While the CS literature has mostly focused on problems involving 1-D signals and 2-D images, many important applications involve multidimensional signals; the construction of sparsifying bases and measurement systems for such signals is complicated by their higher dimensionality. In this paper, we propose the use of Kronecker product matrices in CS for two purposes. First, such matrices can act as sparsifying bases that jointly model the structure present in all of the signal dimensions. Second, such matrices can represent the measurement protocols used in distributed settings. Our formulation enables the derivation of analytical bounds for the sparse approximation of multidimensional signals and CS recovery performance, as well as a means of evaluating novel distributed measurement schemes.
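
A small numpy check of the identity behind using Kronecker product matrices as measurement operators: applying the Kronecker product of per-dimension matrices to the vectorized 2-D signal is the same as measuring each dimension separately (the sizes here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m1, m2 = 16, 16, 8, 8
Phi1 = rng.normal(size=(m1, n1)) / np.sqrt(m1)      # per-dimension measurement matrices
Phi2 = rng.normal(size=(m2, n2)) / np.sqrt(m2)

X = rng.normal(size=(n1, n2))                       # a 2-D signal (e.g., an image patch)
y_kron = np.kron(Phi1, Phi2) @ X.reshape(-1)        # global Kronecker measurement
y_sep = (Phi1 @ X @ Phi2.T).reshape(-1)             # equivalent separable measurement
print(np.allclose(y_kron, y_sep))                   # True: the two protocols coincide
```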

15.
Philos Trans A Math Phys Eng Sci ; 370(1958): 118-35, 2012 Jan 13.
Article in English | MEDLINE | ID: mdl-22124085

ABSTRACT

Signal compression is an important tool for reducing communication costs and increasing the lifetime of wireless sensor network deployments. In this paper, we survey and classify an array of proposed compression methods, with an emphasis on illustrating the differences between the various approaches.

16.
Science ; 331(6018): 717-9, 2011 Feb 11.
Article in English | MEDLINE | ID: mdl-21311012

ABSTRACT

The data deluge is changing the operating environment of many sensing systems from data-poor to data-rich, so data-rich that we are in jeopardy of being overwhelmed. Managing and exploiting the data deluge require a reinvention of sensor system design and signal processing theory. The potential pay-offs are huge, as the resulting sensor systems will enable radically new information technologies and powerful new tools for scientific discovery.


Subject(s)
Electronic Data Processing , Informatics , Information Management , Information Storage and Retrieval , Signal Processing, Computer-Assisted
18.
IEEE Trans Pattern Anal Mach Intell ; 32(10): 1888-98, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20724764

ABSTRACT

This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2nu-SVM. We then exploit a characterization of the 2nu-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
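
A hedged sketch of the cost-sensitive idea: sweeping per-class weights in a standard SVM and reading off cross-validated error rates traces out minimax / Neyman-Pearson-style operating points. This is not the paper's 2nu-SVM smoothing procedure; the data and weight grid are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(1.5, 1.0, (200, 2))])
y = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]

# Sweep the relative cost of errors on class 0 and record cross-validated
# false-positive / false-negative rates for each candidate operating point.
for w in [0.2, 0.5, 1.0, 2.0, 5.0]:
    clf = SVC(C=1.0, kernel="rbf", gamma="scale", class_weight={0: w, 1: 1.0})
    pred = cross_val_predict(clf, X, y, cv=5)
    fpr = np.mean(pred[y == 0] == 1)
    fnr = np.mean(pred[y == 1] == 0)
    print(f"weight={w}: FPR={fpr:.3f}  FNR={fnr:.3f}")
```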

19.
IEEE Trans Image Process ; 19(10): 2580-94, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20550996

ABSTRACT

The emergence of low-cost sensing architectures for diverse modalities has made it possible to deploy sensor networks that capture a single event from a large number of vantage points and with multiple modalities. In many scenarios, these networks acquire large amounts of very high-dimensional data. For example, even a relatively small network of cameras can generate massive amounts of high-dimensional image and video data. One way to cope with this data deluge is to exploit low-dimensional data models. Manifold models provide a particularly powerful theoretical and algorithmic framework for capturing the structure of data governed by a small number of parameters, as is often the case in a sensor network. However, these models do not typically take into account dependencies among multiple sensors. We thus propose a new joint manifold framework for data ensembles that exploits such dependencies. We show that joint manifold structure can lead to improved performance for a variety of signal processing algorithms for applications including classification and manifold learning. Additionally, recent results concerning random projections of manifolds enable us to formulate a scalable and universal dimensionality reduction scheme that efficiently fuses the data from all sensors.
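
A toy sketch of fusing multiple sensors' data with a single random projection of the concatenated (joint) observations; the signal model and dimensions are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
J, N, M = 10, 1024, 50                          # sensors, samples per sensor, fused dimension
theta = np.linspace(0, 2 * np.pi, 400)          # one shared articulation parameter
# each sensor observes its own smooth function of the same parameter (a joint manifold)
sensors = [np.sin(np.outer(theta, rng.uniform(1, 5, N)) + j) for j in range(J)]
joint = np.hstack(sensors)                      # concatenated data, shape (400, J * N)

Phi = rng.normal(size=(M, J * N)) / np.sqrt(M)  # a single random Gaussian projection
fused = joint @ Phi.T                           # low-dimensional fused representation
```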

20.
Article in English | MEDLINE | ID: mdl-19158952

ABSTRACT

Compressive sensing microarrays (CSMs) are DNA-based sensors that operate using group testing and compressive sensing (CS) principles. In contrast to conventional DNA microarrays, in which each genetic sensor is designed to respond to a single target, in a CSM, each sensor responds to a set of targets. We study the problem of designing CSMs that simultaneously account for both the constraints from CS theory and the biochemistry of probe-target DNA hybridization. An appropriate cross-hybridization model is proposed for CSMs, and several methods are developed for probe design and CS signal recovery based on the new model. Lab experiments suggest that in order to achieve accurate hybridization profiling, consensus probe sequences are required to have sequence homology of at least 80% with all targets to be detected. Furthermore, out-of-equilibrium datasets are usually as accurate as those obtained from equilibrium conditions. Consequently, one can use CSMs in applications in which only short hybridization times are allowed.
