Results 1 - 20 of 105
1.
IEEE Trans Image Process ; 33: 1683-1698, 2024.
Article in English | MEDLINE | ID: mdl-38416621

ABSTRACT

Image restoration under adverse weather conditions (e.g., rain, snow, and haze) is a fundamental computer vision problem with important implications for various downstream applications. Unlike early methods designed for specific types of weather, recent works tend to remove various adverse weather effects simultaneously, based on either spatial feature representation learning or semantic information embedding. Inspired by the success of large-scale pre-trained models (e.g., CLIP) in various applications, in this paper we explore their potential benefits for this task from both aspects: 1) for spatial feature representation learning, we design a Spatially Adaptive Residual (SAR) encoder to adaptively extract degraded areas. To facilitate its training, we propose a Soft Residual Distillation (CLIP-SRD) strategy to transfer spatial knowledge from CLIP between clean and adverse weather images; 2) for semantic information embedding, we propose a CLIP Weather Prior (CWP) embedding module that enables the network to adaptively respond to different weather conditions. This module integrates the sample-specific weather priors extracted by the CLIP image encoder with distribution-specific information (learned by a set of parameters) and embeds both using a cross-attention mechanism. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance under a variety of severe adverse weather conditions. The code will be made available.
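The cross-attention step of a CWP-style embedding module can be sketched generically. The single head, the dimensions, and the token layout below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head cross-attention: image features (queries) attend to
    weather-prior tokens (keys/values) and return their weighted mix.
    A simplification of the paper's CWP embedding module."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)            # scaled dot-product
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ values                           # convex mix of values
```

Because each output row is a convex combination of the value rows, identical values pass through unchanged regardless of the attention pattern.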

2.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15260-15274, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37725727

ABSTRACT

In reinforcement learning, a promising direction for avoiding the cost of online trial and error is learning from an offline dataset. Current offline reinforcement learning methods commonly learn in a policy space constrained to in-support regions of the offline dataset in order to ensure the robustness of the resulting policies. Such constraints, however, also limit the potential of those policies. In this paper, to release the potential of offline policy learning, we investigate decision-making in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). With this approach, instead of learning only in in-support regions, we learn an adaptable policy that can adjust its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE via meta-learning and ensemble model learning techniques. We conduct experiments on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieve better performance than state-of-the-art algorithms.

3.
Proc Natl Acad Sci U S A ; 119(51): e2206580119, 2022 Dec 20.
Article in English | MEDLINE | ID: mdl-36525536

ABSTRACT

While the gig economy provides flexible jobs for millions of workers globally, a lack of organizational identity and coworker bonds contributes to their low engagement and high attrition rates. To test the impact of virtual teams on worker productivity and retention, we conduct a field experiment with 27,790 drivers on a ride-sharing platform. We organize drivers into teams that are randomly assigned to receive their team ranking, their individual ranking within the team, or their individual performance information (control). We find that treated drivers work longer hours and generate significantly higher revenue. Furthermore, drivers in the team-ranking treatment continue to be more engaged three months after the end of the experiment. A machine-learning analysis of 149 team contests in 86 cities suggests that social comparison, driver experience, and within-team similarity are key predictors of virtual team efficacy.

4.
IEEE Trans Neural Netw Learn Syst ; 33(6): 2301-2312, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34086581

ABSTRACT

Video anomaly detection is widely used in applications such as security surveillance and remains very challenging. Most recent video anomaly detection approaches utilize deep reconstruction models, but their performance is often suboptimal because, in practice, the reconstruction errors of normal and abnormal video frames do not differ sufficiently. Meanwhile, frame prediction-based anomaly detection methods have shown promising performance. In this article, we propose a novel and robust unsupervised video anomaly detection method based on frame prediction, with a design better suited to the characteristics of surveillance videos. The proposed method is equipped with a multipath ConvGRU-based frame prediction network that can better handle semantically informative objects and areas of different scales and capture spatial-temporal dependencies in normal videos. A noise tolerance loss is introduced during training to mitigate interference caused by background noise. Extensive experiments on the CUHK Avenue, ShanghaiTech Campus, and UCSD Pedestrian datasets show that our method outperforms existing state-of-the-art approaches. Remarkably, it obtains a frame-level AUROC of 88.3% on the CUHK Avenue dataset.
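Prediction-based detectors typically score a frame by how poorly it was predicted. The abstract does not give the paper's exact scoring rule, so the PSNR-based normalization below is a common generic sketch, not the paper's implementation:

```python
import numpy as np

def psnr(pred, actual, max_val=1.0):
    """Peak signal-to-noise ratio between predicted and observed frames;
    poorly predicted (likely anomalous) frames get low PSNR."""
    mse = np.mean((pred - actual) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(max_val ** 2 / mse)

def anomaly_scores(psnrs):
    """Min-max normalize PSNR over a video clip into [0, 1];
    1 marks the worst-predicted frame."""
    p = np.asarray(psnrs, dtype=float)
    return 1.0 - (p - p.min()) / (p.max() - p.min())
```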

5.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 770-782, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33621166

ABSTRACT

Graph node embedding aims to learn a vector representation for every node of a given graph. It is a central problem in many machine learning tasks (e.g., node classification, recommendation, community detection). The key issue in graph node embedding is how to define a node's dependence on its neighbors. Existing approaches specify (either explicitly or implicitly) certain dependencies on neighbors, which may lose subtle but important structural information within the graph as well as other dependencies among neighbors. This motivates the question: can we design a model that adapts the dependency structure to each node's neighborhood? In this paper, we propose a novel graph node embedding method (named PINE), based on a novel notion of partial permutation invariant set functions, to capture any possible dependence. Our method 1) can learn an arbitrary form of the representation function from the neighborhood, without losing any potential dependence structures, and 2) is applicable to both homogeneous and heterogeneous graph embedding, the latter of which is challenged by the diversity of node types. Furthermore, we provide theoretical guarantees for the representation capability of our method on general homogeneous and heterogeneous graphs. Empirical evaluations on benchmark datasets show that PINE outperforms state-of-the-art approaches in producing node vectors for various learning tasks on both homogeneous and heterogeneous graphs.
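PINE's partial permutation invariant set function generalizes fully order-invariant aggregators. The sketch below shows only the fully invariant special case (a DeepSets-style phi/sum/rho pipeline with made-up dimensions) to illustrate the invariance property the paper builds on:

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.standard_normal((8, 16))   # per-neighbor transform (hypothetical size)
W_rho = rng.standard_normal((16, 4))   # post-aggregation transform (hypothetical size)

def embed_node(neighbor_feats):
    """Permutation-invariant neighborhood embedding: phi -> sum -> rho.
    Summation makes the output independent of neighbor order, the property
    PINE relaxes with *partial* permutation invariance."""
    h = np.maximum(neighbor_feats @ W_phi, 0.0)  # phi, shared across neighbors
    pooled = h.sum(axis=0)                       # order-invariant aggregation
    return np.maximum(pooled @ W_rho, 0.0)       # rho
```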

6.
IEEE Trans Image Process ; 31: 472-484, 2022.
Article in English | MEDLINE | ID: mdl-34874853

ABSTRACT

Hashing has been widely applied to the large-scale approximate nearest neighbor search problem owing to its high efficiency and low storage requirements. Most investigations concentrate on learning hashing methods in a centralized setting. However, in existing big data systems, data is often stored across different nodes, and in some situations it is even collected in a distributed manner. A straightforward solution is to aggregate all the data at a fusion center to obtain the search result (the aggregating method). However, this strategy is infeasible because of the prohibitive communication cost. Although a few distributed hashing methods have been proposed to reduce this cost, they focus only on designing a distributed algorithm for a specific global optimization objective without considering scalability. Moreover, existing distributed hashing methods aim at finding a distributed solution to hashing while avoiding accuracy loss, rather than improving accuracy. To address these challenges, we propose a Scalable Distributed Hashing (SDisH) model in which most existing hashing methods can be extended to process distributed data with no changes. Furthermore, to improve accuracy, we use the search radius as a global variable across nodes to achieve a globally optimal search result in every iteration. In addition, a voting algorithm based on the results of multiple iterations is presented to further reduce search errors. Theoretical analyses of communication, computation, and accuracy demonstrate the superiority of the proposed model. Numerical simulations on three large-scale and two relatively small benchmark datasets also show that SDisH achieves accuracy gains of up to 44.75% and 10.23% over the aggregating method and state-of-the-art distributed hashing methods, respectively.

7.
Front Neurosci ; 15: 762458, 2021.
Article in English | MEDLINE | ID: mdl-34899166

ABSTRACT

Amyloid-β (Aβ) plaques and tau protein tangles in the brain are now widely recognized as the defining hallmarks of Alzheimer's disease (AD), followed by structural atrophy detectable on brain magnetic resonance imaging (MRI) scans. The hippocampus is one of the regions particularly affected by neurodegeneration, and the influence of Aβ/tau on it has been a focus of research on AD pathophysiological progression. This work proposes a novel framework, the Federated Morphometry Feature Selection (FMFS) model, to examine subtle aspects of hippocampal morphometry that are associated with Aβ/tau burden in the brain, measured using positron emission tomography (PET). FMFS comprises hippocampal surface-based feature calculation, patch-based feature selection, federated group LASSO regression, federated screening-rule-based stability selection, and region of interest (ROI) identification. FMFS was tested on two Alzheimer's Disease Neuroimaging Initiative (ADNI) cohorts to understand hippocampal alterations that relate to Aβ/tau depositions. Each cohort included pairs of MRI and PET scans for AD, mild cognitive impairment (MCI), and cognitively unimpaired (CU) subjects. Experimental results demonstrated that FMFS achieves an 89× speedup compared to other published state-of-the-art methods under five independent hypothetical institutions. In addition, the subiculum and cornu ammonis 1 (CA1 subfield) were identified as hippocampal subregions where atrophy is strongly associated with abnormal Aβ/tau. As potential biomarkers for Aβ/tau pathology, the features from the identified ROIs had greater power for predicting cognitive assessment and for survival analysis than five other imaging biomarkers. All the results indicate that FMFS is an efficient and effective tool to reveal associations between Aβ/tau burden and hippocampal morphometry.

8.
Front Neurosci ; 15: 669595, 2021.
Article in English | MEDLINE | ID: mdl-34421510

ABSTRACT

Biomarker-assisted preclinical/early detection of and intervention in Alzheimer's disease (AD) may be the key to therapeutic breakthroughs. One of the presymptomatic hallmarks of AD is the accumulation of beta-amyloid (Aβ) plaques in the human brain. However, current methods to detect Aβ pathology are either invasive (lumbar puncture) or quite costly and not widely available (amyloid PET). Our prior studies show that magnetic resonance imaging (MRI)-based hippocampal multivariate morphometry statistics (MMS) are an effective neurodegenerative biomarker for preclinical AD. Here we attempt to use MRI-MMS to make inferences about brain Aβ burden at the individual-subject level. As the dimension of the MMS data is larger than the sample size, we propose a sparse coding algorithm, Patch Analysis-based Surface Correntropy-induced Sparse-coding and Max-Pooling (PASCS-MP), to generate a low-dimensional representation of hippocampal morphometry for each subject. We then apply these individual representations and a binary random forest classifier to predict brain Aβ positivity for each person. We test our method on two independent cohorts: 841 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and 260 subjects from the Open Access Series of Imaging Studies (OASIS). Experimental results suggest that our proposed PASCS-MP method and MMS can discriminate Aβ positivity in people with mild cognitive impairment (MCI) [Accuracy (ACC) = 0.89 (ADNI)] and in cognitively unimpaired (CU) individuals [ACC = 0.79 (ADNI) and ACC = 0.81 (OASIS)]. These results compare favorably with measures derived from traditional algorithms, including hippocampal volume and surface area, shape measures based on spherical harmonics (SPHARM), and our prior Patch Analysis-based Surface Sparse-coding and Max-Pooling (PASS-MP) method.
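The max-pooling stage of a PASCS-MP-style pipeline collapses the per-patch sparse codes of one subject into a single fixed-length vector. The element-wise maximum below is a minimal sketch of that step only; the correntropy-induced sparse coding that produces the codes is not shown:

```python
import numpy as np

def max_pool_codes(patch_codes):
    """Pool per-patch sparse codes (n_patches x code_dim) into one
    subject-level vector by taking the element-wise maximum, so each
    dictionary atom keeps its strongest activation across patches."""
    return np.max(np.asarray(patch_codes, dtype=float), axis=0)
```

The pooled vector is what would then feed the binary random forest classifier.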

9.
IEEE Trans Med Imaging ; 40(8): 2030-2041, 2021 08.
Article in English | MEDLINE | ID: mdl-33798076

ABSTRACT

An effective presymptomatic diagnosis and treatment of Alzheimer's disease (AD) would have enormous public health benefits. Sparse coding (SC) has shown strong potential for longitudinal brain image analysis in preclinical AD research. However, traditional SC computation is time-consuming and does not explore the feature correlations that are consistent over time. In addition, longitudinal brain image cohorts usually contain incomplete image data and clinical labels. To address these challenges, we propose a novel two-stage Multi-Resemblance Multi-Target Low-Rank Coding (MMLC) method, which encourages the sparse codes of neighboring longitudinal time points to resemble each other, favors low-rankness of the sparse codes to reduce computational cost, and is resilient to incompleteness in both source and target data. In stage one, we propose an online multi-resemblant low-rank SC method that uses common and task-specific dictionaries at different time points to tolerate incomplete source data and capture longitudinal correlation. In stage two, supported by a rigorous theoretical analysis, we develop a multi-target learning method to address the missing clinical label issue. To solve the resulting multi-task low-rank sparse optimization problem, we propose multi-task stochastic coordinate coding with a sequence of closed-form update steps, which reduces computational costs, with convergence guaranteed by a theoretical proof. We apply MMLC to a publicly available neuroimaging cohort to predict two clinical measures and compare it with six other methods. Our experimental results show that the proposed method achieves superior computational efficiency and predictive accuracy and has great potential to assist AD prevention.


Subject(s)
Alzheimer Disease, Cognitive Dysfunction, Alzheimer Disease/diagnostic imaging, Brain/diagnostic imaging, Cognitive Dysfunction/diagnostic imaging, Humans, Computer-Assisted Image Interpretation, Neuroimaging
10.
Med Image Anal ; 70: 102009, 2021 05.
Article in English | MEDLINE | ID: mdl-33711742

ABSTRACT

Hyperbolic geometry has been successfully applied to model brain cortical and subcortical surfaces with general topological structures. However, such approaches, like other surface-based brain morphology analysis methods, usually generate high-dimensional features. This limits their statistical power in cognitive decline prediction research, especially in datasets with limited numbers of subjects. To address this limitation, we propose a novel framework termed hyperbolic stochastic coding (HSC). We first compute diffeomorphic maps between general topological surfaces by mapping them to a canonical hyperbolic parameter space with consistent boundary conditions and extract critical shape features. Second, in the hyperbolic parameter space, we introduce a farthest point sampling with breadth-first search method to obtain ring-shaped patches. Third, stochastic coordinate coding and max-pooling algorithms are adopted for feature dimension reduction. We validate the proposed system by comparing its classification accuracy with other methods on two brain imaging datasets for Alzheimer's disease (AD) progression studies. Our preliminary experimental results show that our algorithm achieves superior results on various classification tasks. Our work may enrich surface-based brain imaging research tools and could potentially yield a diagnostic and prognostic indicator useful in individualized treatment strategies.
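Farthest point sampling, named in the second step, admits a compact greedy implementation. The version below works with Euclidean distance in the plane rather than the paper's hyperbolic parameter space, and omits the breadth-first-search patch growing:

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedy farthest-point sampling: repeatedly pick the point farthest
    from the set already chosen, keeping samples well spread out."""
    chosen = [start]
    # dist[i] = distance from point i to the nearest chosen point so far
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return chosen
```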


Subject(s)
Alzheimer Disease, Cognitive Dysfunction, Algorithms, Alzheimer Disease/diagnostic imaging, Brain/diagnostic imaging, Cognitive Dysfunction/diagnostic imaging, Humans, Magnetic Resonance Imaging
11.
IEEE Trans Image Process ; 30: 3985-3994, 2021.
Article in English | MEDLINE | ID: mdl-33780338

ABSTRACT

Hashing methods have been widely used in Approximate Nearest Neighbor (ANN) search for big data due to their low storage requirements and high search efficiency. These methods usually map the ANN search problem to the k-Nearest Neighbor (kNN) search problem in Hamming space. However, Hamming distance calculation ignores bit-level distinctions, leading to confusing rankings. To further increase search accuracy, various bit-level weights have been proposed for ranking hash codes in weighted Hamming space. Nevertheless, existing ranking methods in weighted Hamming space are almost all based on exhaustive linear scan, which is time-consuming and not suitable for large datasets. Although Multi-Index hashing, a sub-linear search method, has been proposed, it relies on Hamming distance rather than weighted Hamming distance. To address this issue, we propose an exact kNN search approach with Multiple Tables in Weighted Hamming space, named WHMT, in which the distribution of bit-level weights is incorporated into the multi-index construction. With WHMT, we can obtain the optimal candidate set for exact kNN search in weighted Hamming space without an exhaustive linear scan. Experimental results show that WHMT achieves speedups of up to 69.8× over the linear-scan baseline without losing accuracy in weighted Hamming space.
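The bit-level distinction exploited here is easy to illustrate: two codes at equal Hamming distance from a query can differ in weighted Hamming distance once per-bit weights are applied. The weights below are made-up values, not learned ones:

```python
def hamming(a, b):
    """Plain Hamming distance between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")

def weighted_hamming(a, b, weights):
    """Weighted Hamming distance: sum the per-bit weights over the
    positions where the two codes differ (weights[i] is for bit i)."""
    diff = a ^ b
    return sum(w for i, w in enumerate(weights) if (diff >> i) & 1)
```

Ranking candidates by `weighted_hamming` breaks the ties that plain Hamming distance leaves unresolved.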

12.
IEEE Trans Image Process ; 30: 2513-2525, 2021.
Article in English | MEDLINE | ID: mdl-33502979

ABSTRACT

Inverse problems are an important class of mathematical problems that aim to estimate source data x and operation parameters z from inadequate observations y. In image processing, most recent deep learning-based methods simply treat such problems as pixel-wise regression (from y to x) while ignoring the underlying physics. In this paper, we re-examine these problems from a different viewpoint and propose a novel framework for solving certain types of inverse problems in image processing. Instead of predicting x directly from y, we train a deep neural network to estimate the degradation parameters z under an adversarial training paradigm. We show that if the underlying degradation satisfies certain assumptions, the solution can be improved by introducing additional adversarial constraints on the parameter space, and training may not even require pair-wise supervision. In our experiments, we apply the method to a variety of real-world problems, including image denoising, image deraining, image shadow removal, non-uniform illumination correction, and underdetermined blind source separation of images or speech signals. The results on multiple tasks demonstrate the effectiveness of our method.

13.
Interspeech ; 2021: 3830-3834, 2021.
Article in English | MEDLINE | ID: mdl-35493062

ABSTRACT

Alzheimer's disease (AD) is a neurodegenerative syndrome that affects tens of millions of older adults worldwide. Although no treatment is currently available, early recognition can improve the lives of people with AD and their caretakers and families. To find a cost-effective and easy-to-use method for dementia detection, and to address the dementia classification task of the InterSpeech 2021 ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech only) challenge, we conducted a systematic comparison of approaches to detecting cognitive impairment from spontaneous speech. We investigated acoustic and linguistic features derived directly from audio recordings of narrative speech and explored a variety of modality fusion strategies. With an ensemble over the top-10 classifiers on the training set, we achieved an accuracy of 81.69% on the test set, compared to the baseline of 78.87%. The results suggest that although automatic speech recognition introduces transcription errors, integrating textual information generally improves classification performance. In addition, ensemble methods can boost both the accuracy and the robustness of models.
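The abstract does not specify how the top-10 classifiers are aggregated; a plain majority vote, sketched below with hypothetical AD/CN labels, is one common choice:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the labels that several classifiers assign to one sample
    by taking the most frequent label."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(classifier_outputs):
    """classifier_outputs[i][j] is the label from classifier i on sample j;
    returns one majority-vote label per sample."""
    n_samples = len(classifier_outputs[0])
    return [majority_vote([clf[j] for clf in classifier_outputs])
            for j in range(n_samples)]
```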

14.
IEEE Trans Cybern ; 51(9): 4602-4610, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32628608

ABSTRACT

Accurate prediction of online taxi-hailing demand is challenging but of significant value in the development of intelligent transportation systems. This article focuses on large-scale online taxi-hailing demand prediction and proposes a personalized demand prediction model. A model with two attention blocks is proposed to capture both spatial and temporal perspectives. We also explore the impact of network architecture on taxi-hailing demand prediction accuracy. The proposed method is universal in the sense that it is applicable to problems associated with large-scale spatiotemporal prediction. Experimental results on a city-wide online taxi-hailing demand dataset demonstrate that the proposed personalized demand prediction model achieves superior prediction accuracy.

15.
Article in English | MEDLINE | ID: mdl-33250550

ABSTRACT

Collectively, vast quantities of brain imaging data exist across hospitals and research institutions, providing valuable resources for studying brain disorders such as Alzheimer's disease (AD). In practice, however, pooling all these distributed datasets into a centralized platform is infeasible due to patient privacy concerns, data restrictions, and legal regulations. In this study, we propose a novel federated feature selection framework that can analyze the data at each institution without data sharing or access to private patient information. In this framework, we first propose a federated group lasso optimization method based on block coordinate descent. We employ stability selection to determine statistically significant features by solving the group lasso problem over a sequence of regularization parameters. To accelerate stability selection, we further propose a federated screening rule, which can identify and exclude irrelevant features before solving the group lasso. Here, we use this framework for patch-based feature selection on hippocampal morphometry. Shape is characterized through two kinds of local measures: the radial distance and the surface area determined via tensor-based morphometry (TBM). The method is tested on 1,127 T1-weighted brain magnetic resonance images (MRI) of AD, mild cognitive impairment (MCI), and elderly control subjects, randomly assigned to five independent hypothetical institutions for testing purposes. We examine the association of MRI-based anatomical measures with general cognitive assessment and amyloid burden to identify the morphometry changes related to AD deterioration and plaque accumulation. Finally, we visualize the significance of the association on the hippocampal surfaces. Our experimental results successfully demonstrate the efficiency and effectiveness of our method.
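Two ingredients named above, the group-lasso block update and a screening rule, can be sketched as follows. This is a centralized simplification: the paper's versions are federated, and its screening rule is more involved than the correlation test shown here:

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Proximal operator of the group-lasso penalty: shrink the whole
    coefficient block toward zero, or zero it out entirely. This is the
    core step of a block coordinate descent sweep."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

def screen_groups(X_groups, y, lam):
    """Simplified screening: a group whose correlation with y falls below
    the regularization level cannot enter the model, so it is skipped
    before running block coordinate descent."""
    return [g for g, Xg in enumerate(X_groups)
            if np.linalg.norm(Xg.T @ y) >= lam]
```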

16.
J Alzheimers Dis ; 75(3): 971-992, 2020.
Article in English | MEDLINE | ID: mdl-32390615

ABSTRACT

BACKGROUND: Disease progression prediction based on neuroimaging biomarkers is vital in Alzheimer's disease (AD) research. Convolutional neural networks (CNNs) have proven powerful in various computer vision research by refining reliable, high-level feature maps from image patches. OBJECTIVE: A key challenge in applying CNNs to neuroimaging research is the limited number of labeled samples with high-dimensional features. Another challenge is how to improve prediction accuracy by jointly analyzing multiple data sources (i.e., multiple time points or multiple biomarkers). To address these two challenges, we propose a novel multi-task learning framework based on CNNs. METHODS: First, we pre-trained a CNN on the ImageNet dataset and transferred the knowledge from the pre-trained model to the neuroimaging representation. We used this deep model as a feature extractor to generate high-level feature maps for the different tasks. Then a novel unsupervised learning method, termed Multi-task Stochastic Coordinate Coding (MSCC), was proposed for learning sparse features of multi-task feature maps using shared and individual dictionaries. Finally, Lasso regression was performed on these multi-task sparse features to predict AD progression measured by the Mini-Mental State Examination (MMSE) and the Alzheimer's Disease Assessment Scale cognitive subscale (ADAS-Cog). RESULTS: We applied this CNN-MSCC system to the Alzheimer's Disease Neuroimaging Initiative dataset to predict future MMSE/ADAS-Cog scores and found that our method achieved superior performance compared with seven other methods. CONCLUSION: Our work may add new insights into data augmentation and multi-task deep model research and facilitate the adoption of deep models in neuroimaging research.


Subject(s)
Cognitive Dysfunction/diagnostic imaging, Cognitive Dysfunction/pathology, Computer-Assisted Image Interpretation/methods, Neural Networks (Computer), Aged, Chromans, Cognitive Dysfunction/psychology, Disease Progression, Humans, Learning, Longitudinal Studies, Male, Mental Status and Dementia Tests
17.
IEEE Trans Big Data ; 6(2): 322-333, 2020 Jun.
Article in English | MEDLINE | ID: mdl-36846743

ABSTRACT

A central theme in learning from image data is developing representations appropriate for the specific task at hand. A practical challenge, then, is to determine which features are appropriate for a given task. For example, in the study of gene expression patterns in Drosophila, texture features were particularly effective for determining developmental stages from in situ hybridization (ISH) images. Such an image representation is, however, not suitable for controlled-vocabulary term annotation. Here, we developed feature extraction methods to generate hierarchical representations for ISH images. Our approach is based on deep convolutional neural networks that act on image pixels directly. To make the extracted features generic, the models were trained on a natural image set with millions of labeled examples. These models were then transferred to the ISH image domain. To account for differences between the source and target domains, we proposed a partial transfer learning scheme in which only part of the source model is transferred. We employed a multi-task learning method to fine-tune the pre-trained models with labeled ISH images. Results showed that feature representations computed by deep models based on transfer and multi-task learning significantly outperformed other methods for annotating gene expression patterns at different stage ranges.

18.
IEEE Trans Big Data ; 5(2): 109-119, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31240237

ABSTRACT

Since the BRAIN Initiative and the Human Brain Project began, a few efforts have been made to address the computational challenges of neuroscience Big Data. The promise of these two projects was to model the complex interaction of brain and behavior and to understand and diagnose brain diseases by collecting and analyzing large quantities of data. Archiving, analyzing, and sharing the growing neuroimaging datasets posed major challenges. New computational methods and technologies have emerged in the Big Data domain but have not been fully adapted for use in neuroimaging. In this work, we introduce the current challenges of neuroimaging in a Big Data context. We review our efforts toward creating a data management system to organize large-scale fMRI datasets, and present our novel algorithms and methods for distributed fMRI data processing using Hadoop and Spark. Finally, we demonstrate the significant performance gains of our algorithms in performing distributed dictionary learning.

19.
IEEE Trans Biomed Eng ; 66(1): 289-299, 2019 01.
Article in English | MEDLINE | ID: mdl-29993466

ABSTRACT

In this work, we conduct comprehensive comparisons between four variants of independent component analysis (ICA) methods and three variants of sparse dictionary learning (SDL) methods at the subject level, using synthesized fMRI data with ground truth. Our results show that ICA methods perform very well, and slightly better than SDL methods, when the spatial overlaps between functional networks are minor, but ICA methods have difficulty differentiating functional networks with moderate or significant spatial overlaps. In contrast, the SDL algorithms perform consistently well regardless of how the functional networks spatially overlap and, importantly, are significantly better than ICA methods when the spatial overlaps between networks are moderate or severe. This work offers a better empirical understanding of ICA and SDL algorithms for inferring functional networks from fMRI data and provides new guidelines and caveats for constructing and interpreting functional networks in the era of fMRI-based connectomics.


Subject(s)
Brain Mapping/methods, Brain/diagnostic imaging, Magnetic Resonance Imaging/methods, Supervised Machine Learning, Algorithms, Humans, Principal Component Analysis
20.
Article in English | MEDLINE | ID: mdl-29993952

ABSTRACT

RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since RNA-seq measurements are relative in nature, between-sample normalization is an essential step in differential expression (DE) analysis. The normalization step of existing DE detection algorithms is usually ad hoc and performed only once prior to DE detection, which may be suboptimal since, ideally, normalization should be based on non-DE genes only and thus coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in gene-wise linear models and jointly estimated with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression and apply the augmented Lagrangian method to solve it. Simulation and real-data studies show that the proposed model and algorithms perform better than, or comparably to, existing methods in terms of detection power and false-positive rate. The performance gain increases with larger sample size or higher signal-to-noise ratio, and is more significant when a large proportion of genes are differentially expressed in an asymmetric manner.
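The sparsity-inducing L1 step at the heart of the penalized regression can be illustrated with iterative soft-thresholding (ISTA). The paper itself uses the augmented Lagrangian method, so this is only a sketch of the shrinkage mechanism, not the joint normalization model:

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """ISTA for 0.5 * ||y - X b||^2 + lam * ||b||_1: a gradient step on the
    least-squares term followed by soft-thresholding, which zeroes out
    coefficients of genes with no real effect."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)
    return b
```

With an orthonormal design, the solution reduces to soft-thresholding the ordinary least-squares estimate, which makes the shrinkage easy to verify.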


Subject(s)
Computational Biology/methods, Gene Expression Profiling/methods, Statistical Models, RNA Sequence Analysis/methods, Algorithms, Genetic Databases, Humans