Results 1 - 13 of 13
1.
Comput Biol Med ; 152: 106409, 2023 01.
Article in English | MEDLINE | ID: mdl-36512878

ABSTRACT

Rapid advances in single-cell transcriptome analysis provide deeper insights into tissue heterogeneity at the cellular level. Unsupervised clustering can identify potential cell populations in single-cell RNA-sequencing (scRNA-seq) data, but fails to determine the identity of each cell. Existing machine-learning methods for automatic annotation of scRNA-seq data mainly rely on a single feature set and a single classifier. In view of this, we propose a Weighted Ensemble classification framework for Cell Type Annotation, named scWECTA, which improves the accuracy of cell type identification. scWECTA uses five informative gene sets and integrates five classifiers in a soft weighted ensemble framework, with the ensemble weights inferred through constrained non-negative least squares. Validated on multiple pairs of scRNA-seq datasets, scWECTA accurately annotates scRNA-seq data across platforms and across tissues, especially for imbalanced data containing rare cell types. Moreover, scWECTA outperforms comparable methods in simultaneously balancing the prediction accuracy of common cell types and the unassigned rate of non-common cell types. The source code of scWECTA is freely available at https://github.com/ttren-sc/scWECTA.


Subject(s)
Single-Cell Analysis , Transcriptome , Transcriptome/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Cluster Analysis , Gene Expression Profiling/methods
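The abstract does not give implementation details for the weight-inference step, but the idea it names — non-negative least squares over classifier outputs, with the resulting weights used as soft ensemble weights — can be sketched with SciPy's NNLS solver plus a sum-to-one renormalization. All data below are hypothetical stand-ins for validation-set classifier scores, and the normalization convention is an assumption.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical validation data: P holds per-classifier scores (100 samples x
# 5 classifiers) and y the target values the ensemble should reproduce.
rng = np.random.default_rng(0)
P = rng.random((100, 5))
y = rng.random(100)

# Constrained non-negative least squares: minimize ||P w - y|| subject to w >= 0.
w, _ = nnls(P, y)

# Renormalize so the soft ensemble weights sum to one (assumed convention).
w = w / w.sum()
print(np.round(w, 3))
```

The non-negativity constraint is what makes the weights interpretable as a soft vote: classifiers that hurt the fit are driven to zero rather than given negative weight.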
2.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36208175

ABSTRACT

Cell-type composition of intact bulk tissues can vary across samples. Deciphering cell-type composition and its changes during disease progression is an important step toward understanding disease pathogenesis. To infer cell-type composition, existing cell-type deconvolution methods for bulk RNA sequencing (RNA-seq) data often require matched single-cell RNA-seq (scRNA-seq) data, generated from samples with similar clinical conditions, as a reference. However, because of the difficulty of obtaining scRNA-seq data from diseased samples, only limited scRNA-seq data in matched disease conditions are available. Using an scRNA-seq reference to deconvolve bulk RNA-seq data from samples with different disease conditions may lead to biased estimation of cell-type proportions. To overcome this limitation, we propose an iterative estimation procedure, MuSiC2, an extension of MuSiC, to perform deconvolution analysis of bulk RNA-seq data generated from samples with multiple clinical conditions where at least one condition differs from that of the scRNA-seq reference. Extensive benchmark evaluations indicated that MuSiC2 improved the accuracy of cell-type proportion estimates of bulk RNA-seq samples under different conditions compared with the traditional MuSiC deconvolution. MuSiC2 was applied to two bulk RNA-seq datasets for deconvolution analysis, one from human pancreatic islets and one from human retina. We show that MuSiC2 improves on current deconvolution methods and provides more accurate cell-type proportion estimates when the bulk and single-cell reference differ in clinical conditions. We believe the condition-specific cell-type composition estimates from MuSiC2 will facilitate downstream analysis and help identify cellular targets of human diseases.


Subject(s)
RNA , Single-Cell Analysis , Humans , RNA/genetics , RNA-Seq , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Transcriptome , Sequence Analysis, RNA/methods
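MuSiC2's iterative, condition-aware procedure is beyond a short sketch, but the basic deconvolution step it extends — expressing bulk expression as a non-negative combination of cell-type signatures, then renormalizing to proportions — can be illustrated with synthetic data. The signature matrix, proportions, and noise level below are all invented for illustration; this is not the MuSiC/MuSiC2 algorithm itself.

```python
import numpy as np
from scipy.optimize import nnls

# Invented reference signature: mean expression of 50 genes in 3 cell types.
rng = np.random.default_rng(3)
S = np.abs(rng.normal(size=(50, 3)))

# Simulated bulk sample mixing the three cell types 50/30/20, plus noise.
props_true = np.array([0.5, 0.3, 0.2])
bulk = S @ props_true + 0.01 * rng.normal(size=50)

# Non-negative fit, then renormalize to proportions summing to one.
p, _ = nnls(S, bulk)
p = p / p.sum()
print(np.round(p, 2))
```

The failure mode the abstract describes corresponds to S being measured under a different condition than the bulk sample, which biases p; MuSiC2 addresses this by iteratively refining which genes are trusted across conditions.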
3.
Appl Spectrosc ; 76(5): 548-558, 2022 May.
Article in English | MEDLINE | ID: mdl-35255739

ABSTRACT

Owing to their low cost and the convenience they offer end-users for field-based, in-situ analysis, handheld Raman spectrometers are widely used to identify mixture components. However, the spectra collected by a handheld Raman spectrometer usually suffer from severe peak overlap and spectral distortion, which makes component identification in mixtures difficult. This study proposes a novel method for mixture component identification based on the handheld Raman spectrometer. Wavelet transform and Voigt curve fitting were used to extract feature parameters from each Raman spectral peak, including Raman shift, maximum intensity, and full width at half-maximum (FWHM), and the similarities between the mixture and each substance in the database were calculated by a fuzzy membership function based on the extracted feature parameters. Then, possible substances in the mixture were preliminarily screened out as candidates according to similarity. Finally, the Raman spectra of these candidates were used to fit the spectrum of the mixture, and the fitting coefficients obtained by a sparse non-negative least squares algorithm were employed to further determine the suspected substances in the mixture. Raman spectra of 190 liquid mixture samples and 158 powder mixture samples were collected using a handheld Raman spectrometer, and these spectra were used to validate the identification performance of the proposed method. The proposed method achieved good identification accuracy for the different mixture samples, showing that it is an effective approach to component identification in mixtures with a handheld Raman spectrometer.


Subject(s)
Spectrum Analysis, Raman , Least-Squares Analysis , Spectrum Analysis, Raman/methods
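The final fitting step described above can be sketched as an ordinary NNLS fit of the mixture spectrum against the candidate library spectra, followed by thresholding of small coefficients. The library, mixture composition, and coefficient cutoff below are assumptions for illustration, and plain `nnls` stands in for the paper's sparse NNLS solver.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical library: 4 candidate reference spectra over 200 wavenumber bins.
rng = np.random.default_rng(1)
library = np.abs(rng.normal(size=(200, 4)))

# Simulated mixture containing components 0 and 2 only, plus measurement noise.
true_coef = np.array([0.7, 0.0, 0.3, 0.0])
mixture = library @ true_coef + 0.01 * rng.normal(size=200)

# Fit the mixture as a non-negative combination of the candidates.
coef, _ = nnls(library, mixture)

# Keep candidates whose fitting coefficients exceed an assumed cutoff.
present = np.where(coef > 0.05)[0]
print(present)
```

Non-negativity matters here physically: a reference spectrum cannot contribute a negative amount to a measured mixture, so the constraint itself suppresses many spurious candidates.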
4.
Magn Reson Med ; 87(2): 915-931, 2022 02.
Article in English | MEDLINE | ID: mdl-34490909

ABSTRACT

PURPOSE: The decomposition of multi-exponential decay data into a T2 spectrum poses substantial challenges for conventional fitting algorithms, including non-negative least squares (NNLS). Based on a combination of the resolution limit constraint and a machine learning neural network algorithm, a data-driven and highly tailorable analysis method named spectrum analysis for multiple exponentials via experimental condition oriented simulation (SAME-ECOS) was proposed. THEORY AND METHODS: The theory of SAME-ECOS was derived. Then, a paradigm was presented to demonstrate the SAME-ECOS workflow, consisting of a series of calculation, simulation, and model training operations. The performance of the trained SAME-ECOS model was evaluated using simulations and six in vivo brain datasets. The code is available at https://github.com/hanwencat/SAME-ECOS. RESULTS: Using NNLS as the baseline, SAME-ECOS achieved over 15% higher overall cosine similarity scores in producing the T2 spectrum and more than 10% lower mean absolute error in calculating the myelin water fraction (MWF), and demonstrated better robustness to noise in the simulation tests. Applied to in vivo data, MWF from SAME-ECOS and NNLS was highly correlated among all study participants. However, a distinct separation of the myelin water peak and the intra/extra-cellular water peak was only observed in the mean T2 spectra determined using SAME-ECOS. In terms of data processing speed, SAME-ECOS is approximately 30 times faster than NNLS, achieving a whole-brain analysis in 3 min. CONCLUSION: Compared with NNLS, the SAME-ECOS method yields much more reliable T2 spectra in a dramatically shorter time, increasing the feasibility of multi-component T2 decay analysis in clinical settings.


Subject(s)
Myelin Sheath , Water , Algorithms , Brain/diagnostic imaging , Humans , Magnetic Resonance Imaging , Spectrum Analysis
5.
Neuroimage ; 244: 118582, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34536538

ABSTRACT

Multi-echo T2 magnetic resonance images contain information about the distribution of T2 relaxation times of compartmentalized water, from which we can estimate relevant brain tissue properties such as the myelin water fraction (MWF). Regularized non-negative least squares (NNLS) is the tool of choice for estimating non-parametric T2 spectra. However, the estimation is ill-conditioned, sensitive to noise, and highly affected by the employed regularization weight. The purpose of this study is threefold: first, we want to underline that the apparently innocuous use of two alternative parameterizations for solving the inverse problem, which we called the standard and alternative regularization forms, leads to different solutions; second, to assess the performance of both parameterizations; and third, to propose a new Bayesian regularized NNLS method (BayesReg). The performance of BayesReg was compared with that of two conventional approaches (L-curve and Chi-square (X2) fitting) using both regularization forms. We generated a large dataset of synthetic data, acquired in vivo human brain data in healthy participants for conducting a scan-rescan analysis, and correlated the myelin content derived from histology with the MWF estimated from ex vivo data. Results from synthetic data indicate that BayesReg provides accurate MWF estimates, comparable to those from L-curve and X2, and with better overall stability across a wider signal-to-noise range. Notably, we obtained superior results by using the alternative regularization form. The correlations reported in this study are higher than those reported in previous studies employing the same ex vivo and histological data. In human brain data, the estimated maps from L-curve and BayesReg were more reproducible. However, the T2 spectra produced by BayesReg were less affected by over-smoothing than those from L-curve. These findings suggest that BayesReg is a good alternative for estimating T2 distributions and MWF maps.


Subject(s)
Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Bayes Theorem , Female , Histological Techniques , Humans , Least-Squares Analysis , Male , Myelin Sheath/metabolism , Water/metabolism , Young Adult
6.
Quant Imaging Med Surg ; 11(7): 3098-3119, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34249638

ABSTRACT

BACKGROUND: The use of rigid multi-exponential models (with a priori predefined numbers of components) is common practice for diffusion-weighted MRI (DWI) analysis of the kidney. This approach may not accurately reflect renal microstructure, as the data are forced to conform to the a priori assumptions of simplified models. This work examines the feasibility of less constrained, data-driven non-negative least squares (NNLS) continuum modelling for DWI of the kidney tubule system in simulations that include emulations of pathophysiological conditions. METHODS: Non-linear least squares (LS) fitting was used as the reference for the simulations. For performance assessment, a threshold of 5% or 10% for the mean absolute percentage error (MAPE) of NNLS and LS results was used. As ground truth, a tri-exponential model using defined volume fractions and diffusion coefficients for each renal compartment (tubule system: Dtubules, ftubules; renal tissue: Dtissue, ftissue; renal blood: Dblood, fblood) was applied. The impact of (I) signal-to-noise ratio (SNR) = 40-1,000, (II) number of b-values (n = 10-50), (III) diffusion weighting (b-rangesmall = 0-800 s/mm2 up to b-rangelarge = 0-2,180 s/mm2), and (IV) fixation of the diffusion coefficients Dtissue and Dblood was examined. NNLS was evaluated for baseline and pathophysiological conditions, namely increased tubular volume fraction (ITV) and renal fibrosis of 10% (grade I, mild) and 30% (grade II, moderate). RESULTS: NNLS showed the same high degree of reliability as the non-linear LS. MAPE of the tubular volume fraction (ftubules) decreased with increasing SNR. Increasing the number of b-values was beneficial for ftubules precision. Using b-rangelarge led to a decrease in the MAPE of ftubules compared to b-rangesmall. The use of a medium b-value range of b = 0-1,380 s/mm2 improved ftubules precision, and further bmax increases beyond this range yielded diminishing improvements. Fixing Dblood and Dtissue significantly reduced the MAPE of ftubules and provided near-perfect distinction between baseline and ITV conditions. Without constraining the number of renal compartments in advance, NNLS was able to detect the (fourth) fibrotic compartment, differentiate it from the other three diffusion components, and distinguish between 10% and 30% fibrosis. CONCLUSIONS: This work demonstrates the feasibility of NNLS modelling for DWI of the kidney tubule system and shows its potential for examining diffusion compartments associated with renal pathophysiology, including ITV fraction and different degrees of fibrosis.

7.
Med Image Anal ; 69: 101959, 2021 04.
Article in English | MEDLINE | ID: mdl-33581618

ABSTRACT

Multi-component T2 relaxometry allows probing tissue microstructure by assessing compartment-specific T2 relaxation times and water fractions, including the myelin water fraction. Non-negative least squares (NNLS) with zero-order Tikhonov regularization is the conventional method for estimating smooth T2 distributions. Despite the improved estimation provided by this method compared to non-regularized NNLS, the solution is still sensitive to the underlying noise and the regularization weight. This is especially relevant for clinically achievable signal-to-noise ratios. In the literature of inverse problems, various well-established approaches to promote smooth solutions, including first-order and second-order Tikhonov regularization, and different criteria for estimating the regularization weight have been proposed, such as L-curve, Generalized Cross-Validation, and Chi-square residual fitting. However, quantitative comparisons between the available reconstruction methods for computing the T2 distribution, and between different approaches for selecting the optimal regularization weight, are lacking. In this study, we implemented and evaluated ten reconstruction algorithms, resulting from the individual combinations of three penalty terms with three criteria to estimate the regularization weight, plus non-regularized NNLS. Their performance was evaluated both in simulated data and real brain MRI data acquired from healthy volunteers through a scan-rescan repeatability analysis. Our findings demonstrate the need for regularization. As a result of this work, we provide a list of recommendations for selecting the optimal reconstruction algorithms based on the acquired data. Moreover, the implemented methods were packaged in a freely distributed toolbox to promote reproducible research, and to facilitate further research and the use of this promising quantitative technique in clinical practice.


Subject(s)
Myelin Sheath , Water , Algorithms , Humans , Magnetic Resonance Imaging , Signal-To-Noise Ratio
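Zero-order Tikhonov-regularized NNLS — the conventional baseline these algorithms extend — can be implemented by stacking a scaled identity under the decay dictionary and zeros under the data, then calling an ordinary NNLS solver. The echo times, T2 grid, noise level, and regularization weight below are illustrative choices, not values from the study, and the weight is fixed rather than selected by L-curve, GCV, or Chi-square.

```python
import numpy as np
from scipy.optimize import nnls

# Exponential-decay dictionary: 32 echo times (s) x 40 log-spaced T2 values (s).
te = np.arange(10, 330, 10) / 1000.0
t2_grid = np.logspace(np.log10(0.01), np.log10(2.0), 40)
A = np.exp(-te[:, None] / t2_grid[None, :])

# Synthetic decay from a single T2 component, with added noise.
rng = np.random.default_rng(2)
true_s = np.zeros(40)
true_s[10] = 1.0
y = A @ true_s + 0.002 * rng.normal(size=te.size)

# Zero-order Tikhonov: min ||A s - y||^2 + mu ||s||^2 with s >= 0, solved by
# augmenting A with sqrt(mu) * I and y with zeros. mu is assumed, not tuned.
mu = 0.1
A_aug = np.vstack([A, np.sqrt(mu) * np.eye(40)])
y_aug = np.concatenate([y, np.zeros(40)])
s, _ = nnls(A_aug, y_aug)
print(np.round(s.sum(), 2))
```

First- and second-order variants discussed in the abstract replace the identity block with a finite-difference matrix, penalizing the slope or curvature of the spectrum instead of its amplitude.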
8.
NMR Biomed ; 33(12): e4277, 2020 12.
Article in English | MEDLINE | ID: mdl-32124505

ABSTRACT

Multi-compartment T2 mapping has gained particular relevance for the study of myelin water in the brain. As a facilitator of rapid saltatory axonal signal transmission, myelin is a cornerstone indicator of white matter development and function. Regularized non-negative least squares fitting of multi-echo T2 data has been widely employed for the computation of the myelin water fraction (MWF), and the obtained MWF maps have been histopathologically validated. MWF measurements depend upon the quality of the data acquisition, B1+ homogeneity and a range of fitting parameters. In this special issue article, we discuss the relevance of these factors for the accurate computation of multi-compartment T2 and MWF maps. We generated multi-echo spin-echo T2 decay curves following the Carr-Purcell-Meiboom-Gill approach for various myelin concentrations and myelin T2 scenarios by simulating the evolution of the magnetization vector between echoes based on the Bloch equations. We demonstrated that noise and imperfect refocusing flip angles yield systematic underestimations in MWF and intra-/extracellular water geometric mean T2 (gmT2). MWF estimates were more stable than myelin water gmT2 across different settings of the T2 analysis. We observed that the lower limit of the T2 distribution grid should be slightly shorter than TE1. Both TE1 and the acquisition echo spacing also have to be sufficiently short to capture the rapidly decaying myelin water T2 signal. Among all parameters of interest, the estimated MWF and intra-/extracellular water gmT2 differed by approximately 0.13-4 percentage points and 3-4 ms, respectively, from the true values, with larger deviations observed in the presence of greater B1+ inhomogeneities and at lower signal-to-noise ratio. Tailoring acquisition strategies may allow us to better characterize the T2 distribution, including the myelin water, in vivo.


Subject(s)
Computer Simulation , Magnetic Resonance Imaging , Myelin Sheath/physiology , Spin Labels , Female , Humans , Least-Squares Analysis , Signal-To-Noise Ratio , Water , Young Adult
9.
Neuroimage ; 184: 140-160, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30193974

ABSTRACT

Spherical deconvolution methods are widely used to estimate the brain's white-matter fiber orientations from diffusion MRI data. In this study, eight spherical deconvolution algorithms were implemented and evaluated. These included two model selection techniques based on the extended Bayesian information criterion (i.e., best subset selection and the least absolute shrinkage and selection operator), iteratively reweighted l2- and l1-norm approaches to approximate the l0-norm, sparse Bayesian learning, Cauchy deconvolution, and two accelerated Richardson-Lucy algorithms. Results from our exhaustive evaluation show that there is no single optimal method for all fiber configurations, suggesting that further studies should be conducted to find the optimal way of combining solutions from different methods. We found that l0-norm regularization algorithms resolve fiber crossings with small inter-fiber angles more accurately. However, in voxels with very dominant fibers, algorithms promoting more sparsity are less accurate in detecting smaller fibers. In most cases, the best algorithm for reconstructing crossings of two fibers did not perform optimally in voxels with one or three fibers. Therefore, simplified validation systems as employed in a number of previous studies, where only two fibers with similar volume fractions were tested, should be avoided, as they provide incomplete information. Future studies proposing new reconstruction methods based on high angular resolution diffusion imaging data should validate their results by considering, at least, voxels with one, two, and three fibers, as well as voxels with dominant fibers and different diffusion anisotropies.


Subject(s)
Algorithms , Brain/anatomy & histology , Diffusion Magnetic Resonance Imaging/methods , White Matter/anatomy & histology , Bayes Theorem , Diffusion Tensor Imaging/methods , Humans , Image Processing, Computer-Assisted/methods , Reproducibility of Results , Signal Processing, Computer-Assisted , Surveys and Questionnaires
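Of the algorithm families evaluated above, the Richardson-Lucy approach is the simplest to sketch: a multiplicative fixed-point update that preserves non-negativity at every step. The response dictionary and fiber amplitudes below are invented, and this is the plain iteration, not the accelerated variants the study evaluates.

```python
import numpy as np

# Invented non-negative response dictionary (30 measurements x 10 orientations).
rng = np.random.default_rng(4)
H = np.abs(rng.normal(size=(30, 10)))

# Two "fibers" at orientations 2 and 7; noiseless signal for illustration.
x_true = np.zeros(10)
x_true[[2, 7]] = [1.0, 0.5]
y = H @ x_true

# Plain Richardson-Lucy: x <- x * (H^T (y / Hx)) / (H^T 1). Starting from a
# positive vector, every factor is positive, so x stays non-negative throughout.
x = np.ones(10)
for _ in range(500):
    x *= H.T @ (y / (H @ x)) / H.sum(axis=0)
print(np.round(x, 2))
```

The same update is what the accelerated variants speed up; sparsity-promoting methods instead add a penalty, which (as the abstract notes) helps at small inter-fiber angles but can suppress small fibers next to a dominant one.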
10.
IEEE Trans Signal Process ; 66(12): 3124-3139, 2018 Jun 15.
Article in English | MEDLINE | ID: mdl-34188433

ABSTRACT

In this paper, we develop a Bayesian evidence maximization framework to solve the sparse non-negative least squares problem (S-NNLS). We introduce a family of probability densities, referred to as the Rectified Gaussian Scale Mixture (R-GSM), to model the sparsity-enforcing prior distribution for the signal of interest. The R-GSM prior encompasses a variety of heavy-tailed distributions, such as the rectified Laplacian and rectified Student-t distributions, with a proper choice of the mixing density. We utilize the hierarchical representation induced by the R-GSM prior and develop an evidence maximization framework based on the Expectation-Maximization (EM) algorithm. Using the EM-based method, we estimate the hyper-parameters and obtain a point estimate for the solution of interest. We refer to this proposed method as rectified Sparse Bayesian Learning (R-SBL). We provide four EM-based R-SBL variants that offer a range of options for trading off computational complexity against the quality of the E-step computation: Markov chain Monte Carlo EM, linear minimum mean square error estimation, approximate message passing, and a diagonal approximation. Using numerical experiments, we show that the proposed R-SBL method outperforms existing S-NNLS solvers in terms of both signal and support recovery, and is very robust against the structure of the design matrix.

11.
Anal Chim Acta ; 937: 11-20, 2016 Sep 21.
Article in English | MEDLINE | ID: mdl-27590540

ABSTRACT

This article presents a wavelength selection framework for mixture identification problems. In contrast with multivariate calibration, where the mixture constituents are known and the goal is to estimate their concentration, in mixture identification the goal is to determine which of a large number of chemicals is present. Due to the combinatorial nature of this problem, traditional wavelength selection algorithms are unsuitable because the optimal set of wavelengths is mixture dependent. To address this issue, our framework interleaves wavelength selection with the sensing process, such that each subsequent wavelength is determined on-the-fly based on previous measurements. To avoid early convergence, our approach starts with an exploratory criterion that samples the spectrum broadly, then switches to an exploitative criterion that selects increasingly more relevant wavelengths as the solution approaches the true constituents of the mixture. We compare this "active" wavelength selection algorithm against a state-of-the-art passive algorithm (successive projection algorithm), both experimentally using a tunable spectrometer and in simulation using a large spectral library of chemicals. Our results show that our active method can converge to the true solution more frequently and with fewer measurements than the passive algorithm. The active method also leads to more compact solutions with fewer false positives.

12.
J Chromatogr A ; 1370: 179-86, 2014 Nov 28.
Article in English | MEDLINE | ID: mdl-25454143

ABSTRACT

In this work, a novel strategy based on chromatographic fingerprints and chemometric techniques is proposed for quantitative analysis of formulated complex systems. Here, a formulated complex system means a complicated analytical system containing more than one kind of raw material, combined in specific proportions according to a certain formula. The strategy is elaborated through an example: the quantitative determination of mixtures consisting of three essential oils. The three key steps of the strategy are as follows: (1) remove the baselines of the chromatograms; (2) align the retention times; (3) conduct quantitative analysis using multivariate regression on the entire chromatographic profiles. Through the determination of the concentration compositions of nine mixtures arranged by uniform design, the feasibility of the proposed strategy is validated, and the factors that influence the quantitative result are discussed. The validation indicates that the quantitative result obtained with this strategy depends mainly on the efficiency of the alignment method and on the chromatographic peak shapes. Previously, chromatographic fingerprints were used only for identification and/or recognition of products. This work demonstrates that, with the assistance of effective chemometric techniques, chromatographic fingerprints are also promising for solving quantitative problems in complex analytical systems.


Subject(s)
Gas Chromatography-Mass Spectrometry/methods , Feasibility Studies , Oils, Volatile/analysis
13.
Genomics & Informatics ; : 33-39, 2006.
Article in English | WPRIM (Western Pacific) | ID: wpr-109761

ABSTRACT

Alternative splicing (AS) is an important mechanism for producing transcriptome diversity, and microarray techniques are increasingly being used to monitor splice variants. There are three types of microarrays interrogating AS events: junction, exon, and tiling arrays. Junction probes have the advantage of monitoring the splice site directly. Johnson et al. performed a genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays (Science 302:2141-2144, 2003), which monitored splicing at every known exon-exon junction for more than 10,000 multi-exon human genes in 52 tissues and cell lines. Here, we describe an algorithm to deduce the relative concentrations of isoforms from junction array data. Non-negative matrix factorization (NMF) is applied to obtain the transcript structure inferred from the expression data. Then we choose the transcript models consistent with the ECgene model of alternative splicing, which is based on mRNA and EST alignment. The probe-transcript matrix is constructed using the NMF-consistent ECgene transcripts, and the isoform abundance is deduced from non-negative least squares (NNLS) fitting of the experimental data. Our method can easily be extended to other types of microarrays with exon or junction probes.


Subject(s)
Humans , Alternative Splicing , Cell Line , Exons , Least-Squares Analysis , Protein Isoforms , RNA Precursors , RNA, Messenger , Transcriptome
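The closing NNLS step can be sketched directly: a binary probe-transcript matrix maps isoform abundances to expected probe signals, and NNLS inverts it. The matrix and abundances below are toy values for illustration, not NMF-consistent ECgene transcripts, and the signals are noiseless so the fit recovers the abundances exactly.

```python
import numpy as np
from scipy.optimize import nnls

# Toy probe-transcript matrix: rows are junction probes, columns are candidate
# isoforms; a 1 means that probe's junction is present in that isoform.
M = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Noiseless probe signals generated from assumed isoform abundances.
abund_true = np.array([2.0, 1.0, 0.5])
probe_signal = M @ abund_true

# Deduce non-negative isoform abundances by NNLS fitting.
abund, _ = nnls(M, probe_signal)
print(np.round(abund, 3))
```

Probes shared by several isoforms (like the first row) are what make the inversion non-trivial; the isoform-specific probes pin the solution down.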