Results 1 - 20 of 37
1.
IEEE Trans Neural Netw Learn Syst ; 35(4): 5014-5026, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37104113

ABSTRACT

The first step toward investigating the effectiveness of a treatment via a randomized trial is to split the population into control and treatment groups and then compare the average response of the treatment group, which receives the treatment, to that of the control group, which receives the placebo. To ensure that the difference between the two groups is caused only by the treatment, it is crucial that the control and treatment groups have similar statistics; indeed, the validity and reliability of a trial are determined by the similarity of the two groups' statistics. Covariate balancing methods increase the similarity between the distributions of the two groups' covariates. However, in practice there are often not enough samples to accurately estimate the groups' covariate distributions. In this article, we empirically show that covariate balancing with the standardized means difference (SMD) covariate balance measure, as well as Pocock and Simon's sequential treatment assignment method, is susceptible to worst-case treatment assignments: assignments admitted by the covariate balance measure that nevertheless result in the highest possible error in estimating the average treatment effect (ATE). We develop an adversarial attack to find an adversarial treatment assignment for any given trial, and we provide an index to measure how close a given trial is to the worst case. To this end, we present an optimization-based algorithm, adversarial treatment assignment in treatment effect trials (ATASTREET), to find adversarial treatment assignments.
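As a concrete illustration of the balance measure discussed above, the per-covariate SMD can be sketched in a few lines. This is a minimal, generic version (the function name and the 0.1-0.25 admissibility thresholds mentioned in the comment are common conventions, not the paper's):

```python
import numpy as np

def standardized_mean_difference(x_treat, x_ctrl):
    """Per-covariate SMD: |mean difference| over the pooled standard deviation."""
    mean_diff = np.abs(x_treat.mean(axis=0) - x_ctrl.mean(axis=0))
    pooled_sd = np.sqrt((x_treat.var(axis=0, ddof=1) + x_ctrl.var(axis=0, ddof=1)) / 2)
    return mean_diff / pooled_sd

rng = np.random.default_rng(0)
covariates = rng.normal(size=(100, 3))
assignment = rng.permutation(np.repeat([0, 1], 50))  # balanced random split
smd = standardized_mean_difference(covariates[assignment == 1],
                                   covariates[assignment == 0])
# A common rule of thumb admits an assignment when every SMD is below 0.1-0.25;
# the paper's point is that many admitted assignments can still be adversarial.
```
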


Subject(s)
Neural Networks, Computer; Research Design; Reproducibility of Results; Randomized Controlled Trials as Topic; Computer Simulation
2.
Anal Chem ; 95(48): 17458-17466, 2023 12 05.
Article in English | MEDLINE | ID: mdl-37971927

ABSTRACT

Microfluidics can split samples into thousands or millions of partitions, such as droplets or nanowells. Partitions capture analytes according to a Poisson distribution, and in diagnostics, the analyte concentration is commonly inferred with a closed-form solution via maximum likelihood estimation (MLE). Here, we present a new scalable approach to multiplexing analytes. We generalize MLE with microfluidic partitioning and extend our previously developed Sparse Poisson Recovery (SPoRe) inference algorithm. We also present the first in vitro demonstration of SPoRe with droplet digital PCR (ddPCR) toward infection diagnostics. Digital PCR is intrinsically highly sensitive, and SPoRe helps expand its multiplexing capacity by circumventing its channel limitations. We broadly amplify bacteria with 16S ddPCR and assign barcodes to nine pathogen genera by using five nonspecific probes. Given our two-channel ddPCR system, we measured two probes at a time in multiple groups of droplets. Although individual droplets are ambiguous in their bacterial contents, we recover the concentrations of bacteria in the sample from the pooled data. We achieve stable quantification down to approximately 200 total copies of the 16S gene per sample, enabling a suite of clinical applications given a robust upstream microbial DNA extraction procedure. We develop a new theory that generalizes the application of this framework to many realistic sensing modalities, and we prove scaling rules for system design to achieve further expanded multiplexing. The core principles demonstrated here could impact many biosensing applications with microfluidic partitioning.
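The closed-form MLE mentioned above follows directly from the Poisson partitioning model: the fraction of negative (empty) partitions estimates e^(-lambda), so lambda = -ln(f_neg). A minimal single-analyte sketch (illustrative numbers, not the generalized SPoRe inference):

```python
import math

def poisson_mle_lambda(n_partitions, n_negative):
    """Closed-form MLE of mean copies per partition from the fraction of
    negative partitions: lambda = -ln(n_negative / n_partitions)."""
    return -math.log(n_negative / n_partitions)

# e.g. 20,000 droplets, 13,000 showing no amplification:
lam = poisson_mle_lambda(20000, 13000)
total_copies = lam * 20000  # estimated total template copies loaded
```

The same Poisson assumption is what lets SPoRe pool ambiguous per-droplet measurements across groups and still recover per-genus concentrations.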


Subject(s)
Bacteria; Microfluidics; Polymerase Chain Reaction/methods; Bacteria/genetics
3.
IEEE Trans Signal Process ; 70: 2388-2401, 2022.
Article in English | MEDLINE | ID: mdl-36082267

ABSTRACT

Compressed sensing (CS) is a signal processing technique that enables the efficient recovery of a sparse high-dimensional signal from low-dimensional measurements. In the multiple measurement vector (MMV) framework, a set of signals with the same support must be recovered from their corresponding measurements. Here, we present the first exploration of the MMV problem where signals are independently drawn from a sparse, multivariate Poisson distribution. We are primarily motivated by a suite of biosensing applications of microfluidics where analytes (such as whole cells or biomarkers) are captured in small volume partitions according to a Poisson distribution. We recover the sparse parameter vector of Poisson rates through maximum likelihood estimation with our novel Sparse Poisson Recovery (SPoRe) algorithm. SPoRe uses batch stochastic gradient ascent enabled by Monte Carlo approximations of otherwise intractable gradients. By uniquely leveraging the Poisson structure, SPoRe substantially outperforms a comprehensive set of existing and custom baseline CS algorithms. Notably, SPoRe can exhibit high performance even with one-dimensional measurements and high noise levels. This resource efficiency is not only unprecedented in the field of CS but is also particularly potent for applications in microfluidics in which the number of resolvable measurements per partition is often severely limited. We prove the identifiability property of the Poisson model under such lax conditions, analytically develop insights into system performance, and confirm these insights in simulated experiments. Our findings encourage a new approach to biosensing and are generalizable to other applications featuring spatial and temporal Poisson signals.

4.
Nat Commun ; 13(1): 1728, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35365602

ABSTRACT

Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges of computational biology: the half-century-old problem of protein structure prediction. In this paper, we discuss recent advances, limitations, and future perspectives of DL in five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.


Subject(s)
Deep Learning; Computational Biology; Phylogeny; Proteins; Systems Biology
5.
Comput Med Imaging Graph ; 97: 102052, 2022 04.
Article in English | MEDLINE | ID: mdl-35299096

ABSTRACT

Cervical cancer is a public health emergency in low- and middle-income countries, where resource limitations hamper standard-of-care prevention strategies. The high-resolution endomicroscope (HRME) is a low-cost, point-of-care device with which care providers can image the nuclear morphology of cervical lesions. Here, we propose a deep learning framework to diagnose cervical intraepithelial neoplasia grade 2 or more severe from HRME images. The proposed multi-task convolutional neural network uses nuclear segmentation to learn a diagnostically relevant representation. Nuclear segmentation was trained via proxy labels to circumvent the need for expensive, manually annotated nuclear masks. A dataset of images from over 1600 patients was used to train, validate, and test our algorithm; data from 20% of patients were reserved for testing. An external evaluation set with images from 508 patients was used to further validate our findings. The proposed method consistently outperformed other state-of-the-art architectures, achieving a per-patient area under the receiver operating characteristic curve (AUC-ROC) of 0.87 on the test set. Performance was comparable to expert colposcopy, with a test sensitivity and specificity of 0.94 (p = 0.3) and 0.58 (p = 1.0), respectively. Patients with recurrent human papillomavirus (HPV) infections are at a higher risk of developing cervical cancer. Thus, we sought to incorporate HPV DNA test results as a feature to inform prediction. We found that incorporating patient HPV status improved test specificity to 0.71 at a sensitivity of 0.94.


Subject(s)
Papillomavirus Infections; Uterine Cervical Dysplasia; Uterine Cervical Neoplasms; Colposcopy/methods; Early Detection of Cancer/methods; Female; Humans; Neural Networks, Computer; Papillomavirus Infections/diagnostic imaging; Pregnancy; Sensitivity and Specificity; Uterine Cervical Neoplasms/diagnostic imaging; Uterine Cervical Neoplasms/pathology; Uterine Cervical Dysplasia/diagnostic imaging; Uterine Cervical Dysplasia/pathology
6.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 1098-1107, 2022 02.
Article in English | MEDLINE | ID: mdl-33026983

ABSTRACT

Extracting meaningful information from large datasets has become increasingly important; in particular, identifying relationships among the variables in these datasets has far-reaching impacts. In this article, we introduce the uniform information coefficient (UIC), which measures the amount of dependence between two multidimensional variables and is able to detect both linear and non-linear associations. Our proposed UIC is inspired by the maximal information coefficient (MIC) [1]; however, the MIC was originally designed to measure dependence between two one-dimensional variables. Unlike the MIC calculation, whose cost depends on the type of association between the two variables, the UIC calculation is less computationally expensive and more robust to the association type. The UIC achieves this by replacing the dynamic programming step in the MIC calculation with a simpler technique based on uniform partitioning of the data grid. This computational efficiency comes at the cost of not maximizing the information coefficient as the MIC algorithm does. We present theoretical guarantees for the performance of the UIC and a variety of experiments demonstrating its quality in detecting associations.
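The uniform-partitioning idea can be illustrated with a toy stand-in: a plug-in mutual-information estimate over a fixed uniform grid (this is not the actual UIC statistic, which adds normalization and guarantees, but it shows why a fixed grid detects non-linear dependence without any grid search):

```python
import numpy as np

def uniform_grid_mi(x, y, bins=8):
    """Plug-in mutual information over a uniform partition of the data grid
    (a toy stand-in for the dynamic-programming grid search used by MIC)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x over the grid
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y over the grid
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 5000)
dependent = uniform_grid_mi(x, x**2)                      # non-linear association
independent = uniform_grid_mi(x, rng.uniform(-1, 1, 5000))  # no association
```

The fixed grid costs one pass over the data, while MIC's search over grid shapes is what makes its cost depend on the association type.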


Subject(s)
Algorithms
7.
Opt Express ; 29(23): 38540-38556, 2021 Nov 08.
Article in English | MEDLINE | ID: mdl-34808905

ABSTRACT

Conventional continuous-wave amplitude-modulated time-of-flight (CWAM ToF) cameras suffer from a fundamental trade-off between light throughput and depth of field (DoF): a larger lens aperture allows more light collection but suffers from significantly lower DoF. However, both high light throughput, which increases signal-to-noise ratio, and a wide DoF, which enlarges the system's applicable depth range, are valuable for CWAM ToF applications. In this work, we propose EDoF-ToF, an algorithmic method to extend the DoF of large-aperture CWAM ToF cameras by using a neural network to deblur objects outside of the lens's narrow focal region and thus produce an all-in-focus measurement. A key component of our work is the proposed large-aperture ToF training data simulator, which models the depth-dependent blurs and partial occlusions caused by such apertures. Contrary to conventional image deblurring where the blur model is typically linear, ToF depth maps are nonlinear functions of scene intensities, resulting in a nonlinear blur model that we also derive for our simulator. Unlike extended DoF for conventional photography where depth information needs to be encoded (or made depth-invariant) using additional hardware (phase masks, focal sweeping, etc.), ToF sensor measurements naturally encode depth information, allowing a completely software solution to extended DoF. We experimentally demonstrate EDoF-ToF increasing the DoF of a conventional ToF system by 3.6×, effectively achieving the DoF of a smaller lens aperture that allows 22.1× less light. Ultimately, EDoF-ToF enables CWAM ToF cameras to enjoy the benefits of both high light throughput and a wide DoF.

8.
Article in English | MEDLINE | ID: mdl-34746376

ABSTRACT

Ridge-like regularization often improves the generalization performance of machine learning models by mitigating overfitting. While ridge-regularized machine learning methods are widely used in many important applications, direct training via optimization can become challenging in huge-data scenarios with millions of examples and features. We tackle such challenges by proposing a general approach that achieves ridge-like regularization implicitly, which we name Minipatch Ridge (MPRidge). Our approach ensembles the coefficients of unregularized learners trained on many tiny, random subsamples of both the examples and the features of the training data, which we call minipatches. We empirically demonstrate that MPRidge induces an implicit ridge-like regularizing effect and performs nearly the same as explicit ridge regularization for a general class of predictors, including logistic regression, SVM, and robust regression. Embarrassingly parallelizable, MPRidge offers a computationally appealing alternative for inducing ridge-like regularization and improving generalization performance in challenging big-data settings.
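The minipatch ensembling idea can be sketched as follows. This is a simplified version under stated assumptions: ordinary least squares as the unregularized base learner, and a plain average over all patches, so a feature absent from a patch contributes zero and the ensemble average is shrunk toward zero by roughly n_cols / p (the function name and defaults are illustrative):

```python
import numpy as np

def minipatch_coefficients(X, y, n_patches=200, n_rows=20, n_cols=5, seed=0):
    """Average OLS coefficients fit on tiny random subsamples of rows and
    columns ("minipatches"); averaging over all patches shrinks each
    coefficient by roughly the feature-inclusion probability n_cols / p."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_sum = np.zeros(p)
    for _ in range(n_patches):
        rows = rng.choice(n, n_rows, replace=False)
        cols = rng.choice(p, n_cols, replace=False)
        b, *_ = np.linalg.lstsq(X[np.ix_(rows, cols)], y[rows], rcond=None)
        beta_sum[cols] += b
    return beta_sum / n_patches

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
true_beta = np.zeros(10)
true_beta[:3] = [2.0, -1.0, 0.5]
y = X @ true_beta + 0.1 * rng.normal(size=200)
beta_hat = minipatch_coefficients(X, y)  # entries shrunk toward ~0.5 * true_beta
```

Each patch fit is independent, which is what makes the scheme embarrassingly parallelizable.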

9.
IEEE Trans Pattern Anal Mach Intell ; 43(7): 2233-2244, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33891546

ABSTRACT

We introduce a novel video-rate hyperspectral imager with high spatial, temporal, and spectral resolutions. Our key hypothesis is that the spectral profiles of pixels within each super-pixel tend to be similar. Hence, a scene-adaptive spatial sampling of a hyperspectral scene, guided by its super-pixel-segmented image, is capable of obtaining high-quality reconstructions. To achieve this, we acquire an RGB image of the scene, compute its super-pixels, and from these generate a spatial mask of locations where we measure the high-resolution spectrum. The hyperspectral image is subsequently estimated by fusing the RGB image and the spectral measurements using a learnable guided filtering approach. Due to the low computational complexity of the super-pixel estimation step, our setup can capture hyperspectral images of scenes with little overhead over traditional snapshot hyperspectral cameras, but with significantly higher spatial and spectral resolutions. We validate the proposed technique with extensive simulations as well as a lab prototype that measures hyperspectral video at a spatial resolution of 600×900 pixels and a spectral resolution of 10 nm over the visible wavebands, at a frame rate of 18 fps.

10.
IEEE Trans Haptics ; 14(1): 188-199, 2021.
Article in English | MEDLINE | ID: mdl-32746381

ABSTRACT

Communication is an important part of our daily interactions; however, it can be hindered by visual or auditory impairment, or because the usual communication channels are overloaded. When standard communication channels are not available, our sense of touch offers an alternative sensory modality for transmitting messages. Multi-sensory haptic cues that combine multiple types of haptic sensations have shown promise for applications, such as haptic communication, that require large discrete cue sets while maintaining a small, wearable form factor. This article presents language transmission using a multi-sensory haptic device that occupies a small footprint on the upper arm. In our approach, phonemes are encoded as multi-sensory haptic cues consisting of vibration, radial squeeze, and lateral skin stretch components. Participants learned to identify haptically transmitted phonemes and words over a four-day training period. A subset of our participants continued training to extend word recognition to free response. Participants were able to identify words after four days using multiple choice with an accuracy of 89%, and after eight days using free response with an accuracy of 70%. These results show promise for the use of multi-sensory haptics in haptic communication, demonstrating high word recognition performance with a small, wearable device.


Subject(s)
Touch Perception; Wearable Electronic Devices; Cues; Humans; Language; Touch
11.
Proc Natl Acad Sci U S A ; 117(48): 30029-30032, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33229565
12.
Nat Commun ; 11(1): 3972, 2020 08 07.
Article in English | MEDLINE | ID: mdl-32769972

ABSTRACT

The continuously growing amount of seismic data collected worldwide is outpacing our ability to analyze it, since to date such datasets have been analyzed in a human-expert-intensive, supervised fashion. Moreover, the analyses that are conducted can be strongly biased by the standard models employed by seismologists. In response to both of these challenges, we develop a new unsupervised machine learning framework for detecting and clustering seismic signals in continuous seismic records. Our approach combines a deep scattering network and a Gaussian mixture model to cluster seismic signal segments and detect novel structures. To illustrate the power of the framework, we analyze seismic data acquired during the June 2017 Nuugaatsiaq, Greenland landslide. We demonstrate the blind detection and recovery of the repeating precursory seismicity that was recorded before the main landslide rupture, which suggests that our approach could lead to more informative forecasting of seismic activity in seismogenic areas.

13.
Nucleic Acids Res ; 48(10): 5217-5234, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32338745

ABSTRACT

As computational biologists continue to be inundated by ever-increasing amounts of metagenomic data, developing data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to metagenomics. For instance, sketching algorithms such as MinHash have seen rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifice in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
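MinHash, mentioned above, can be sketched in a few lines of standard-library Python: hash each k-mer under several salted hash functions, keep each function's minimum, and the fraction of matching minima estimates the Jaccard similarity of the k-mer sets. The salted-SHA-1 construction is an illustrative choice, not what production sketching tools use:

```python
import hashlib

def kmers(seq, k=4):
    """Set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def minhash_signature(items, num_hashes=64):
    """For each of num_hashes salted hash functions, keep the minimum hash
    value over the set; matching minima estimate Jaccard similarity."""
    return [min(hashlib.sha1(f"{salt}:{it}".encode()).hexdigest() for it in items)
            for salt in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = kmers("ACGTACGTGGTTACGT")
b = kmers("ACGTACGTGGTTACGA")
est = estimated_jaccard(minhash_signature(a), minhash_signature(b))
true = len(a & b) / len(a | b)  # exact Jaccard, for comparison
```

The signature has fixed size regardless of input length, which is what makes sketches scale to sequence archives.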


Subject(s)
Algorithms; Metagenomics/methods; Probability; Signal Processing, Computer-Assisted; Humans; Metagenome/genetics
14.
PLoS One ; 14(3): e0212508, 2019.
Article in English | MEDLINE | ID: mdl-30840653

ABSTRACT

Open Educational Resources (OER) have been lauded for their ability to reduce student costs and improve equity in higher education. Research examining whether OER provides learning benefits has produced mixed results, with most studies showing null effects. We argue that the common methods used to examine OER efficacy are unlikely to detect positive effects based on predictions of the access hypothesis. The access hypothesis states that OER benefits learning by providing access to critical course materials, and therefore predicts that OER should only benefit students who would not otherwise have access to those materials. Through simulation analysis, we demonstrate that even if there is a learning benefit of OER, standard research methods are unlikely to detect it.
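The dilution argument behind the access hypothesis can be checked with a small simulation: if OER raises scores only for the fraction of students who otherwise lack materials, the whole-class effect shrinks to roughly that fraction times the benefit, which standard whole-class comparisons are underpowered to detect. The effect size, access fraction, and function name below are illustrative, not the paper's parameters:

```python
import numpy as np

def observed_oer_effect(p_without_access, benefit, n=100_000, sd=1.0, seed=3):
    """Simulate the access hypothesis: OER raises scores by `benefit` only
    for students who otherwise lack the materials, so the whole-class mean
    difference is diluted to roughly p_without_access * benefit."""
    rng = np.random.default_rng(seed)
    control = rng.normal(0, sd, n)
    lacks_access = rng.random(n) < p_without_access
    treated = rng.normal(0, sd, n) + benefit * lacks_access
    return treated.mean() - control.mean()

# A large 0.5-SD benefit for the 10% without access looks like a ~0.05-SD effect:
diluted = observed_oer_effect(p_without_access=0.10, benefit=0.5)
```
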


Subject(s)
Education, Distance; Learning; Students; Adolescent; Adult; Female; Humans; Male
15.
J Chem Theory Comput ; 14(5): 2771-2783, 2018 May 08.
Article in English | MEDLINE | ID: mdl-29660273

ABSTRACT

Recent methods for the analysis of molecular kinetics from massive molecular dynamics (MD) data rely on the solution of very large eigenvalue problems. Here we build upon recent results from the field of compressed sensing and develop the spectral oASIS method, a highly efficient approach to approximate the leading eigenvalues and eigenvectors of large generalized eigenvalue problems without ever having to evaluate the full matrices. The approach is demonstrated to reduce the dimensionality of the problem by 1 or 2 orders of magnitude, directly leading to corresponding savings in the computation and storage of the necessary matrices and a speedup of 2 to 4 orders of magnitude in solving the eigenvalue problem. We demonstrate the method on extensive data sets of protein conformational changes and protein-ligand binding using the variational approach to conformation dynamics (VAC) and time-lagged independent component analysis (TICA). Our approach can also be applied to kernel formulations of VAC, TICA, and extended dynamic mode decomposition (EDMD).
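oASIS builds on Nyström-style column sampling; the generic (non-adaptive) Nyström approximation it accelerates can be sketched as follows, where a low-rank positive semidefinite matrix is recovered from a small subset of its columns without ever forming the full inverse problem (the adaptive column selection that gives oASIS its efficiency is not shown):

```python
import numpy as np

def nystrom(K, idx):
    """Nystrom approximation K ~= C @ pinv(W) @ C.T, built from the sampled
    columns C = K[:, idx] and the intersection block W = K[idx][:, idx]."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(4)
A = rng.normal(size=(100, 5))
K = A @ A.T                          # rank-5 PSD matrix, e.g. a kernel matrix
K_hat = nystrom(K, list(range(10)))  # 10 sampled columns suffice for rank 5
err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

When the sampled columns span the range of K, the approximation is exact, which is why only on the order of the numerical rank's worth of columns is needed.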

16.
IEEE Trans Neural Netw Learn Syst ; 29(7): 2717-2730, 2018 07.
Article in English | MEDLINE | ID: mdl-28534788

ABSTRACT

This paper introduces RankMap, a platform-aware, end-to-end framework for the efficient execution of a broad class of iterative learning algorithms on massive, dense data sets. Our framework exploits the structure of the data to scalably factorize it into an ensemble of lower-rank subspaces. The factorization creates sparse, low-dimensional representations of the data, a property we leverage to devise effective mapping and scheduling of iterative learning algorithms on distributed computing machines. We provide two APIs, one matrix-based and one graph-based, which facilitate automated adoption of the framework for performing several contemporary learning applications. To demonstrate the utility of RankMap, we solve sparse recovery and power iteration problems on various real-world data sets with up to 1.8 billion nonzeros. Our evaluations are performed on Amazon EC2 and IBM iDataPlex servers using up to 244 cores. The results demonstrate up to two orders of magnitude improvement in memory usage, execution speed, and bandwidth compared with the best reported prior work, while achieving the same level of learning accuracy.

17.
Sci Adv ; 3(12): e1701548, 2017 12.
Article in English | MEDLINE | ID: mdl-29226243

ABSTRACT

Modern biology increasingly relies on fluorescence microscopy, which is driving demand for smaller, lighter, and cheaper microscopes. However, traditional microscope architectures suffer from a fundamental trade-off: As lenses become smaller, they must either collect less light or image a smaller field of view. To break this fundamental trade-off between device size and performance, we present a new concept for three-dimensional (3D) fluorescence imaging that replaces lenses with an optimized amplitude mask placed a few hundred micrometers above the sensor and an efficient algorithm that can convert a single frame of captured sensor data into high-resolution 3D images. The result is FlatScope: perhaps the world's tiniest and lightest microscope. FlatScope is a lensless microscope that is scarcely larger than an image sensor (roughly 0.2 g in weight and less than 1 mm thick) and yet able to produce micrometer-resolution, high-frame rate, 3D fluorescence movies covering a total volume of several cubic millimeters. The ability of FlatScope to reconstruct full 3D images from a single frame of captured sensor data allows us to image 3D volumes roughly 40,000 times faster than a laser scanning confocal microscope while providing comparable resolution. We envision that this new flat fluorescence microscopy paradigm will lead to implantable endoscopes that minimize tissue damage, arrays of imagers that cover large areas, and bendable, flexible microscopes that conform to complex topographies.
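The single-shot reconstruction idea can be illustrated with a toy lensless model: treat the measurement as a convolution of the scene with the mask's point spread function, and invert it with a regularized inverse filter. This is a stand-in under simplifying assumptions (circular convolution, no noise, 2D rather than 3D), not FlatScope's calibrated forward model or solver:

```python
import numpy as np

def forward(scene, psf):
    """Toy lensless measurement: circular convolution of the scene with the
    mask's point spread function (real systems add noise and sensor cropping)."""
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(psf)))

def reconstruct(meas, psf, eps=1e-3):
    """Tikhonov-regularized inverse filter: conj(H) / (|H|^2 + eps) per frequency."""
    H = np.fft.fft2(psf)
    return np.real(np.fft.ifft2(np.fft.fft2(meas) * np.conj(H) / (np.abs(H) ** 2 + eps)))

rng = np.random.default_rng(5)
psf = rng.random((64, 64))              # pseudorandom amplitude-mask PSF
scene = np.zeros((64, 64))
scene[20, 30] = 1.0                     # two point sources (e.g. fluorophores)
scene[40, 10] = 0.5
recon = reconstruct(forward(scene, psf), psf)
```

Because the pseudorandom mask spreads every point over the whole sensor, a single captured frame constrains the entire scene, which is the property FlatScope exploits for single-frame 3D recovery.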

18.
Biometrics ; 73(1): 10-19, 2017 03.
Article in English | MEDLINE | ID: mdl-27163413

ABSTRACT

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, this work is motivated by the problem of identifying structure in high-dimensional genomic data. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer, along with an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees, features that arguably are lacking in current alternative algorithms. We demonstrate the advantages of our approach, including the stable and reproducible identification of biclusters, on simulated and real microarray data.


Subject(s)
Cluster Analysis; Data Interpretation, Statistical; Gene Regulatory Networks; Algorithms; Computational Biology/methods; Databases, Genetic; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis
19.
Sci Adv ; 2(9): e1600025, 2016 09.
Article in English | MEDLINE | ID: mdl-27704040

ABSTRACT

Early identification of pathogens is essential for limiting development of therapy-resistant pathogens and mitigating infectious disease outbreaks. Most bacterial detection schemes use target-specific probes to differentiate pathogen species, creating time and cost inefficiencies in identifying newly discovered organisms. We present a novel universal microbial diagnostics (UMD) platform to screen for microbial organisms in an infectious sample, using a small number of random DNA probes that are agnostic to the target DNA sequences. Our platform leverages the theory of sparse signal recovery (compressive sensing) to identify the composition of a microbial sample that potentially contains novel or mutant species. We validated the UMD platform in vitro using five random probes to recover 11 pathogenic bacteria. We further demonstrated in silico that UMD can be generalized to screen for common human pathogens in different taxonomy levels. UMD's unorthodox sensing approach opens the door to more efficient and universal molecular diagnostics.
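The sparse-recovery step at the heart of UMD can be illustrated with orthogonal matching pursuit, a standard compressive sensing solver: hybridization responses to random probes form a sensing matrix, and the few organisms present form a sparse vector to recover. The dimensions below are illustrative (larger than the paper's five-probe demonstration, so that greedy recovery is reliable in this toy setting):

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: greedily add the column most correlated
    with the residual, then re-fit the coefficients on the chosen support."""
    residual, support = y.copy(), []
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(6)
A = rng.normal(size=(24, 32)) / np.sqrt(24)  # rows = random-probe measurements
x_true = np.zeros(32)
x_true[[3, 17]] = [1.0, 0.7]                 # two organisms present in the sample
x_hat = omp(A, A @ x_true, sparsity=2)
```

Because the probes are agnostic to the target sequences, the same sensing matrix framework extends to newly characterized organisms by simply appending columns.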


Subject(s)
Bacteria/genetics; DNA Probes/genetics; DNA, Bacterial/genetics; Infections/diagnosis; Bacteria/isolation & purification; Bacteria/pathogenicity; DNA, Bacterial/classification; Humans; Infections/genetics; Infections/microbiology; Polymerase Chain Reaction
20.
J Stat Plan Inference ; 166: 52-66, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26500388

ABSTRACT

We develop a modeling framework for joint factor and cluster analysis of datasets where multiple categorical response items are collected on a heterogeneous population of individuals. We introduce a latent factor multinomial probit model and employ prior constructions that allow inference on the number of factors as well as clustering of the subjects into homogenous groups according to their relevant factors. Clustering, in particular, allows us to borrow strength across subjects, therefore helping in the estimation of the model parameters, particularly when the number of observations is small. We employ Markov chain Monte Carlo techniques and obtain tractable posterior inference for our objectives, including sampling of missing data. We demonstrate the effectiveness of our method on simulated data. We also analyze two real-world educational datasets and show that our method outperforms state-of-the-art methods. In the analysis of the real-world data, we uncover hidden relationships between the questions and the underlying educational concepts, while simultaneously partitioning the students into groups of similar educational mastery.
