1.
Article in English | MEDLINE | ID: mdl-38587960

ABSTRACT

The primary objective of imaging genetics research is to investigate the complex genotype-phenotype association for the disease under study. For example, to understand the impact of genetic variations on brain function and structure, genotypic data such as single nucleotide polymorphisms (SNPs) are integrated with phenotypic data such as imaging quantitative traits. Sparse models based on canonical correlation analysis (CCA) are popular in this area for finding complex bi-multivariate genotype-phenotype associations, as the number of features in genotypic and/or phenotypic data is significantly higher than the number of samples. However, sparse CCA based methods are, in general, unsupervised in nature and fail to identify the diagnosis-specific features that play an important role in the diagnosis and prognosis of the disease under study. In this regard, a new supervised model is proposed to study the complex genotype-phenotype association by judiciously integrating the merits of CCA, linear discriminant analysis (LDA) and multi-task learning. The proposed model can identify diagnosis-specific as well as diagnosis-consistent features with significantly lower computational complexity. The performance of the proposed method, along with a comparison with state-of-the-art methods, is evaluated on several synthetic data sets and one real imaging genetics data set collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. In the current study, SNPs as genetic data and resting-state functional MRI (fMRI) as imaging data are integrated to find the complex genotype-phenotype association. An important finding is that the proposed method achieves higher correlation values, improved noise resistance and stability, and better feature selection ability.
All the results illustrate the power and capability of the proposed method to find diagnostic group-specific imaging genetic associations, which may help to understand the neurodegenerative disorder in a more comprehensive way.
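
The abstract does not give the formulation of the proposed supervised model. As a reference point only, the unsupervised sparse CCA that it builds on can be sketched as alternating soft-thresholded power iterations on the cross-covariance matrix, in the style of the penalized matrix decomposition of Witten, Tibshirani and Hastie, which treats each within-view covariance as the identity. A fixed penalty `lam` is used instead of an L1-norm bound; the function names, `lam` and the toy data are assumptions, and nothing here models the LDA or multi-task parts of the paper.

```python
import math

def center(cols):
    """Subtract each feature's mean; cols is a list of features, each a list of n samples."""
    return [[v - sum(c) / len(c) for v in c] for c in cols]

def cross_cov(X, Y):
    """p x q sample cross-covariance between centered feature lists X and Y."""
    n = len(X[0])
    return [[sum(a * b for a, b in zip(x, y)) / (n - 1) for y in Y] for x in X]

def soft(vec, lam):
    """Soft-thresholding: shrink each entry toward zero by lam (this is what creates sparsity)."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in vec]

def unit(vec):
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else vec

def sparse_cca(X, Y, lam=0.1, iters=100):
    """Leading sparse canonical pair (u, v), assuming identity within-view covariances."""
    C = cross_cov(center(X), center(Y))
    p, q = len(C), len(C[0])
    v = unit([1.0] * q)
    for _ in range(iters):
        u = unit(soft([sum(C[i][j] * v[j] for j in range(q)) for i in range(p)], lam))
        v = unit(soft([sum(C[i][j] * u[i] for i in range(p)) for j in range(q)], lam))
    return u, v

# Toy data (illustrative): three X features and two Y features over six samples.
X = [[1, 2, 3, 4, 5, 6], [1, -1, 1, -1, 1, -1], [0, 0, 1, 1, 0, 0]]
Y = [[1, 2, 3, 4, 5, 6], [1, -1, -1, 1, 1, -1]]
u, v = sparse_cca(X, Y, lam=1.0)
```

With this penalty, the weakly associated features are driven exactly to zero and only the strongly co-varying pair of features keeps a nonzero weight.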

2.
Article in English | MEDLINE | ID: mdl-37267140

ABSTRACT

Over the past few years, multimodal data analysis has emerged as an essential approach for identifying sample categories. In the multi-view data classification problem, the joint representation is expected to include the supervised information of sample categories, so that similarity in the latent space implies similarity in the corresponding concepts. Since each view has different statistical properties, the joint representation should be able to encapsulate the underlying nonlinear data distribution of the given observations. Another important aspect is the coherent knowledge of the multiple views: the learning objective of the multi-view model must efficiently capture the nonlinear correlated structures across different modalities. In this context, this article introduces a novel architecture, termed discriminative deep canonical correlation analysis (D2CCA), for classifying given observations into multiple categories. The learning objective of the proposed architecture includes the merits of generative models to identify the underlying probability distribution of the given observations. To improve the discriminative ability of the proposed architecture, supervised information is incorporated into its learning objective. This also enables the architecture to serve as both a feature extractor and a classifier. The theory of CCA is integrated with the objective function so that the joint representation of the multi-view data is learned from maximally correlated subspaces. The proposed framework is supported by a corresponding convergence analysis. The efficacy of the proposed architecture is studied in different application domains, namely, object recognition, document classification, multilingual categorization, face recognition, and cancer subtype identification, in comparison with several state-of-the-art methods.

3.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2278-2290, 2023.
Article in English | MEDLINE | ID: mdl-37027602

ABSTRACT

Gene expression data sets and protein-protein interaction (PPI) networks are two heterogeneous data sources that have been extensively studied, due to their ability to capture co-expression patterns among genes and their topological connections. Although they depict different traits of the data, both tend to group co-functional genes together. This phenomenon agrees with the basic assumption of multi-view kernel learning, according to which different views of the data contain a similar inherent cluster structure. Based on this inference, a new multi-view kernel learning based disease gene identification algorithm, termed DiGId, is put forward. A novel multi-view kernel learning approach is proposed that aims to learn a consensus kernel, which efficiently captures the heterogeneous information of individual views and depicts the underlying inherent cluster structure. Low-rank constraints are imposed on the learned multi-view kernel so that it can be effectively partitioned into k or fewer clusters. The learned joint cluster structure is used to curate a set of potential disease genes. Moreover, a novel approach is put forward to quantify the importance of each view. To demonstrate the effectiveness of the proposed approach in capturing the relevant information depicted by individual views, an extensive analysis is performed on four different cancer-related gene expression data sets and a PPI network, considering different similarity measures.


Subject(s)
Algorithms , Disease , Protein Interaction Maps , Microarray Analysis , Gene Expression Profiling , Disease/genetics , Datasets as Topic , Humans
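
As a simple baseline for the consensus-kernel idea above, and not the DiGId optimization itself (its objective, low-rank constraints and view-weighting scheme are not given in the abstract), a consensus kernel can be formed as a convex combination of per-view kernel matrices; a convex combination of positive semidefinite kernels is itself positive semidefinite. The Gaussian (RBF) kernel, `gamma`, the equal view weights and both toy views, including a hypothetical per-gene embedding derived from the PPI network, are assumptions.

```python
import math

def rbf_kernel(rows, gamma=1.0):
    """Gaussian kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one view."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in rows]
            for x in rows]

def consensus_kernel(kernels, weights):
    """Convex combination of view kernels; weights are normalized to sum to 1."""
    total = sum(weights)
    w = [x / total for x in weights]
    n = len(kernels[0])
    return [[sum(wk * K[i][j] for wk, K in zip(w, kernels)) for j in range(n)]
            for i in range(n)]

# Hypothetical views for three genes: expression profiles and a PPI-derived embedding.
expr = [[0.0, 1.0], [0.1, 0.9], [2.0, -1.0]]
ppi_embedding = [[1.0], [1.0], [0.0]]
K = consensus_kernel([rbf_kernel(expr), rbf_kernel(ppi_embedding)], [1.0, 1.0])
```

Genes 1 and 2 are close in both views, so their consensus similarity stays high, while gene 3 is far from gene 1 in both views.
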
4.
IEEE Trans Med Imaging ; 42(6): 1746-1757, 2023 06.
Article in English | MEDLINE | ID: mdl-37022026

ABSTRACT

The variation in color appearance among Hematoxylin and Eosin (H&E) stained histological images is one of the major problems, as color disagreement may affect the computer-aided diagnosis of histology slides. In this regard, the paper introduces a new deep generative model to reduce the color variation present among histological images. The proposed model assumes that the latent color appearance information, extracted through a color appearance encoder, and the stain bound information, extracted via a stain density encoder, are independent of each other. To capture the disentangled color appearance and stain bound information, both a generative module and a reconstructive module are considered in the proposed model to formulate the corresponding objective functions. The discriminator is modeled to discriminate not only between image samples, but also between the joint distributions corresponding to image samples, color appearance information and stain bound information, which are sampled individually from different source distributions. To deal with the overlapping nature of histochemical reagents, the proposed model assumes that the latent color appearance code is sampled from a mixture model. As the outer tails of a mixture model do not contribute adequately to handling overlapping information and are instead prone to outliers, a mixture of truncated normal distributions is used to deal with the overlapping nature of histochemical stains. The performance of the proposed model, along with a comparison with state-of-the-art approaches, is demonstrated on several publicly available data sets containing H&E stained histological images. An important finding is that the proposed model outperforms state-of-the-art methods in 91.67% and 69.05% of cases with respect to stain separation and color normalization, respectively.


Subject(s)
Coloring Agents , Histological Techniques , Color , Staining and Labeling , Hematoxylin , Image Processing, Computer-Assisted/methods , Eosine Yellowish-(YS)
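
To make the truncated-mixture idea concrete: a normal distribution truncated to [a, b] has the ordinary normal density renormalized by the probability mass inside the interval and is zero outside it, so it has no tails to absorb outliers. A mixture then weights several such components. This is a standard-library sketch; the component parameters below are illustrative assumptions on a normalized [0, 1] axis, not values from the paper.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def truncated_normal_pdf(x, mu, sigma, a, b):
    """Normal density restricted to [a, b] and renormalized; zero outside the interval."""
    if x < a or x > b:
        return 0.0
    mass = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
    return normal_pdf(x, mu, sigma) / mass

def truncated_mixture_pdf(x, components):
    """components: list of (weight, mu, sigma, a, b); weights are assumed to sum to 1."""
    return sum(w * truncated_normal_pdf(x, mu, s, a, b) for w, mu, s, a, b in components)

# Two overlapping, hypothetical "stain" components on a normalized [0, 1] axis.
stain_model = [(0.6, 0.35, 0.1, 0.15, 0.55), (0.4, 0.6, 0.1, 0.4, 0.8)]
```

Because each component integrates to one over its own interval, the mixture remains a proper density even though the intervals overlap.
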
5.
IEEE Trans Cybern ; 53(9): 5497-5509, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35417362

ABSTRACT

One of the important issues associated with real-life high-dimensional data analysis is how to extract significant and relevant features from multiview data. Multiset canonical correlation analysis (MCCA) is a well-known statistical method for multiview data integration. It finds a linear subspace that maximizes the correlations among different views. However, the existing methods for finding the multiset canonical variables are computationally very expensive, which restricts the application of MCCA in real-life big data analysis. The covariance matrix of each high-dimensional view may also suffer from the singularity problem due to the limited number of samples. Moreover, the existing MCCA-based feature extraction algorithms are, in general, unsupervised in nature. In this regard, a new supervised feature extraction algorithm is proposed, which integrates multimodal multidimensional data sets by solving the maximal correlation problem of MCCA. A new block matrix representation is introduced to reduce the computational complexity of computing the canonical variables of MCCA. The analytical formulation enables efficient computation of the multiset canonical variables under a supervised ridge regression optimization technique. It deals with the "curse of dimensionality" problem associated with high-dimensional data and facilitates the sequential generation of relevant features with significantly lower computational cost. The effectiveness of the proposed multiblock data integration algorithm, along with a comparison with other existing methods, is demonstrated on several benchmark and real-life cancer data sets.
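
A small worked case may help with the regularization argument. When one of two views is a single variable, for example a numeric class indicator, the leading regularized canonical direction of the other view reduces to a ridge regression solution, w ∝ (C_xx + λI)^{-1} c_xy, and the ridge term keeps C_xx + λI invertible even when C_xx is singular. The sketch below covers only this textbook special case with a hand-written Gaussian elimination solver; it is not the paper's block-matrix MCCA formulation, and `lam`, the function names and the data are assumptions.

```python
def covariance(A, B):
    """Sample covariance between feature lists A and B (each feature a list of n samples)."""
    n = len(A[0])
    ca = [[v - sum(c) / n for v in c] for c in A]
    cb = [[v - sum(c) / n for v in c] for c in B]
    return [[sum(x * y for x, y in zip(a, b)) / (n - 1) for b in cb] for a in ca]

def solve(M, rhs):
    """Solve M w = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][k] * w[k] for k in range(r + 1, n))) / aug[r][r]
    return w

def ridge_canonical_direction(X, y, lam=0.1):
    """Direction w proportional to (C_xx + lam * I)^(-1) c_xy for a single variable y."""
    Cxx = covariance(X, X)
    cxy = [row[0] for row in covariance(X, [y])]
    p = len(X)
    reg = [[Cxx[i][j] + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    return solve(reg, cxy)

# Two identical features make C_xx singular; the ridge term keeps the system solvable.
X = [[1, 2, 3, 4], [1, 2, 3, 4]]
y = [0, 0, 1, 1]  # e.g. a numeric class indicator
w = ridge_canonical_direction(X, y, lam=0.1)
```

Without `lam` the elimination would divide by zero on this data; with it, the weight is split evenly between the two duplicated features.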

6.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1130-1143, 2022.
Article in English | MEDLINE | ID: mdl-32966220

ABSTRACT

In the past few decades, both gene expression data and protein-protein interaction (PPI) networks have been extensively studied, due to their ability to depict important characteristics of disease-associated genes. In this regard, the paper presents a new gene prioritization algorithm to identify and prioritize cancer-causing genes, judiciously integrating the complementary information obtained from the two data sources. The proposed algorithm selects disease-causing genes by maximizing the importance of the selected genes and the functional similarity among them. A new quantitative index is introduced to evaluate the importance of a gene. It considers whether a gene exhibits a differential expression pattern across diseased and healthy individuals and has strong connectivity in the PPI network, both of which are important characteristics of a potential biomarker. As disease-associated genes are expected to have similar expression profiles and topological structures, a scalable non-linear graph fusion technique, termed ScaNGraF, is proposed to learn a disease-dependent functional similarity network from the co-expression and common-neighbor based similarity networks. The proposed ScaNGraF, which is based on a message-passing algorithm, efficiently combines the shared and complementary information provided by different data sources at significantly lower computational cost. A new measure, termed DiCoIN, is introduced to evaluate the quality of a learned affinity network. The performance of the proposed graph fusion technique and gene selection algorithm is extensively compared with that of some existing methods, using several cancer data sets.


Subject(s)
Computational Biology , Neoplasms , Algorithms , Computational Biology/methods , Humans , Neoplasms/genetics , Neoplasms/metabolism , Protein Interaction Maps/genetics
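
One input to ScaNGraF is a common-neighbor based similarity network built from the PPI graph. The abstract does not give its formula, so the sketch below assumes one common choice, the Jaccard index of two proteins' neighbor sets; the adjacency dictionary and gene names are hypothetical.

```python
def common_neighbor_similarity(adj, a, b):
    """Jaccard index |N(a) & N(b)| / |N(a) | N(b)| of two nodes' neighbor sets."""
    na, nb = adj.get(a, set()), adj.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

def similarity_network(adj):
    """Pairwise common-neighbor similarities for all unordered node pairs."""
    nodes = sorted(adj)
    return {(a, b): common_neighbor_similarity(adj, a, b)
            for a in nodes for b in nodes if a < b}

# Hypothetical undirected PPI graph stored as adjacency sets.
ppi = {"G1": {"G2", "G3"}, "G2": {"G1", "G3"}, "G3": {"G1", "G2", "G4"}, "G4": {"G3"}}
net = similarity_network(ppi)
```

Normalizing by the union keeps hub proteins from dominating the similarity scores simply because they have many neighbors.
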
7.
IEEE Trans Cybern ; 52(2): 947-959, 2022 Feb.
Article in English | MEDLINE | ID: mdl-32452799

ABSTRACT

One of the major problems in cancer subtype discovery from multimodal omic data is that not all of the available modalities encode relevant and homogeneous information about the subtypes. Moreover, the high-dimensional nature of the modalities makes sample clustering computationally expensive. In this regard, a novel algorithm is proposed to extract a low-rank joint subspace of the integrated data matrix. The proposed algorithm first evaluates the quality of the subtype information provided by each modality, and then judiciously selects only the relevant ones to construct the joint subspace. The problem of incrementally updating the singular value decomposition of a data matrix is formulated for the multimodal data framework. The analytical formulation enables efficient construction of the joint subspace of integrated data from low-rank subspaces of the individual modalities. Constructing the joint subspace with the proposed method is shown to be computationally more efficient than performing principal component analysis (PCA) on the integrated data matrix. Some new quantitative indices are introduced to theoretically measure the accuracy of subspace construction by the proposed approach with respect to the principal subspace extracted by PCA. The efficacy of clustering on the joint subspace constructed by the proposed algorithm is established over existing integrative clustering approaches on several real-life multimodal cancer data sets.


Subject(s)
Algorithms , Neoplasms , Cluster Analysis , Humans , Neoplasms/epidemiology
8.
IEEE Trans Neural Netw Learn Syst ; 33(8): 3895-3907, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33606638

ABSTRACT

The meaningful patterns embedded in high-dimensional multi-view data sets typically have a much more compact representation that often lies close to a low-dimensional manifold. Identifying hidden structures in such data mainly depends on properly modeling the geometry of low-dimensional manifolds. In this regard, this article presents a manifold optimization-based integrative clustering algorithm for multi-view data. To identify consensus clusters, the algorithm constructs a joint graph Laplacian that contains denoised cluster information of the individual views. It optimizes a joint clustering objective while reducing the disagreement between the cluster structures conveyed by the joint and individual views. The optimization is performed alternately over the k-means and Stiefel manifolds. The Stiefel manifold helps to model the nonlinearities and differential clusters within the individual views, whereas the k-means manifold tries to elucidate the best-fit joint cluster structure of the data. A gradient-based movement is performed separately on the manifold of each view so that individual nonlinearity is preserved while looking for shared cluster information. The convergence of the proposed algorithm is established over the manifold, and an asymptotic convergence bound is obtained to theoretically quantify how fast the sequence of iterates generated by the algorithm converges to an optimal solution. Integrative clustering on benchmark and multi-omics cancer data sets demonstrates that the proposed algorithm outperforms state-of-the-art multi-view clustering approaches.
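
Optimization on the Stiefel manifold (n x k matrices with orthonormal columns) typically alternates a step along a descent direction with a retraction back onto the manifold. A minimal sketch of the QR-based retraction, implemented with modified Gram-Schmidt, is shown below. The step size, the direction `G` (which in a full method would be the Riemannian gradient of the clustering objective) and the choice of QR retraction are assumptions; the paper's actual update over the k-means and Stiefel manifolds is not reproduced.

```python
import math

def qr_retraction(X, step, G):
    """Move X (n x k, orthonormal columns) along -G, then re-orthonormalize the columns.

    Modified Gram-Schmidt yields the Q factor of a thin QR decomposition,
    so the result lies back on the Stiefel manifold.
    """
    n, k = len(X), len(X[0])
    Y = [[X[i][j] - step * G[i][j] for j in range(k)] for i in range(n)]
    cols = [[Y[i][j] for i in range(n)] for j in range(k)]
    q = []
    for v in cols:
        for u in q:
            proj = sum(a * b for a, b in zip(u, v))  # projection onto an earlier column
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        q.append([a / norm for a in v])
    return [[q[j][i] for j in range(k)] for i in range(n)]

# Hypothetical 3 x 2 iterate and descent direction.
Q = qr_retraction([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], 0.5,
                  [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]])
```

The retraction is what lets a plain gradient step be used while keeping every iterate feasible, that is, Q^T Q = I.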

9.
PLoS One ; 16(6): e0250964, 2021.
Article in English | MEDLINE | ID: mdl-34138852

ABSTRACT

Brain tumor is not the most common type of cancer, but it is an aggressive one. Therefore, correct prediction of its aggressiveness at an early stage would influence the treatment strategy. Although several diagnostic methods based on different modalities exist, a pre-operative method for determining tumor malignancy state remains an active research area. In this regard, the paper presents a new method for assessing tumor grades using conventional MR sequences, namely T1, T1 with contrast enhancement, T2 and FLAIR. The proposed method for tumor gradation is mainly based on feature extraction using multiresolution image analysis and classification using a support vector machine. Since the wavelet features of different tumor subregions, obtained from a single MR sequence, do not carry equally important information, a wavelet fusion technique is proposed based on the texture information content of each voxel. The concept of texture gradient, used in the proposed algorithm, fuses the wavelet coefficients of the given MR sequences. The feature vector is then derived from the co-occurrence of fused wavelet coefficients. As each wavelet subband contains distinct detail information, a novel concept of multispectral co-occurrence of wavelet coefficients is introduced to capture the spatial correlation among different subbands. This yields more informative features for characterizing the tumor type. The effectiveness of the proposed method is analyzed, with respect to six classification performance indices, on the BRATS 2012 and BRATS 2014 data sets. The classification accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve assessed by ten-fold cross-validation are 91.3%, 96.8%, 66.7%, 92.4%, 88.4%, and 92.0%, respectively, on real brain MR data.


Subject(s)
Brain Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Humans , Machine Learning , Wavelet Analysis
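
The feature vector above is derived from co-occurrences of quantized wavelet coefficients, including co-occurrences across subbands. The sketch below computes a gray-level co-occurrence matrix (GLCM) for two quantized coefficient maps at a given pixel offset. Passing the same map twice gives the ordinary single-band GLCM; passing two different subbands gives a cross-subband count in the spirit of the multispectral co-occurrence described above. The uniform quantization into `levels` bins, the offsets and the toy arrays are assumptions, and the texture-gradient fusion rule is not reproduced.

```python
def quantize(band, levels):
    """Map real coefficients to integer bins 0..levels-1 by uniform min-max scaling."""
    lo = min(min(r) for r in band)
    hi = max(max(r) for r in band)
    span = (hi - lo) or 1.0
    return [[min(int((v - lo) / span * levels), levels - 1) for v in r] for r in band]

def cooccurrence(qa, qb, levels, offset=(0, 1)):
    """Count pairs (qa[i][j], qb[i+di][j+dj]) over all positions where both exist."""
    di, dj = offset
    M = [[0] * levels for _ in range(levels)]
    rows, cols = len(qa), len(qa[0])
    for i in range(rows):
        for j in range(cols):
            i2, j2 = i + di, j + dj
            if 0 <= i2 < rows and 0 <= j2 < cols:
                M[qa[i][j]][qb[i2][j2]] += 1
    return M

# Hypothetical quantized subbands of the same region.
band_a = [[0, 0, 1], [1, 1, 0]]
band_b = [[1, 1, 1], [0, 0, 0]]
glcm = cooccurrence(band_a, band_a, 2, (0, 1))   # within-band, horizontal neighbor
cross = cooccurrence(band_a, band_b, 2, (0, 0))  # across subbands, same position
```

Texture descriptors such as contrast or energy are then usually computed from the normalized matrix.
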
10.
Front Genet ; 12: 637362, 2021.
Article in English | MEDLINE | ID: mdl-33664772

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), belongs to the coronavirus family, which also includes common cold viruses, and is responsible for a global pandemic that requires immediate measures for its containment. India has the world's largest population aged between 10 and 40 years. At the same time, India has a large number of individuals with diabetes, hypertension and kidney diseases, who are at high risk of developing COVID-19. A vaccine against SARS-CoV-2 may offer immediate protection from COVID-19; however, the protective memory may be short-lived. Even if vaccination is broadly successful worldwide, India has a large and diverse population, over one-third of which lives below the poverty line. Therefore, the success of a vaccine, even when one becomes available, is uncertain, making it necessary to focus on alternate approaches to tackling the disease. In this review, we discuss the differences in the COVID-19 death/infection ratio between urban and rural India, and the probable role of the immune system, co-morbidities and associated nutritional status in dictating the death rate of COVID-19 patients in rural and urban India. In addition, we focus on strategies for developing masks, vaccines and diagnostics, and on the role of drugs targeting host-virus protein-protein interactions in enhancing host immunity. We also discuss India's strengths, including its resources of medicinal plants, good food habits and the role of information technology in combating COVID-19. Finally, we focus on the Government of India's measures and strategies for creating awareness in the containment of COVID-19 infection across the country.

11.
IEEE Trans Pattern Anal Mach Intell ; 43(3): 798-813, 2021 03.
Article in English | MEDLINE | ID: mdl-31603770

ABSTRACT

One of the important approaches to handling data heterogeneity in multimodal data clustering is modeling each modality using a separate similarity graph. Information from the multiple graphs is integrated by combining them into a unified graph. A major challenge here is how to preserve cluster information while removing noise from individual graphs. In this regard, a novel algorithm, termed CoALa, is proposed that integrates noise-free approximations of multiple similarity graphs. The proposed method first approximates a graph using the most informative eigenpairs of its Laplacian, which contain cluster information. The approximate Laplacians are then integrated to construct a low-rank subspace that best preserves the overall cluster information of multiple graphs. However, this approximate subspace differs from the full-rank subspace, which integrates information from all the eigenpairs of each Laplacian. Matrix perturbation theory is used to theoretically evaluate how far the approximate subspace deviates from the full-rank one for a given approximation rank. Finally, spectral clustering is performed on the approximate subspace to identify the clusters. Experimental results on several real-life cancer and benchmark data sets demonstrate that the proposed algorithm significantly and consistently outperforms state-of-the-art integrative clustering approaches.
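
The approximation step above, keeping only the most informative eigenpairs of a graph Laplacian, can be sketched under stated assumptions. The sketch works with the normalized affinity S = D^{-1/2} W D^{-1/2}. Because the normalized Laplacian is I - S, the eigenpairs of S with the largest eigenvalues are exactly those of the Laplacian with the smallest eigenvalues. A cyclic Jacobi eigensolver keeps it dependency-free. The rank `r`, the toy graph and the function names are assumptions; CoALa's integration of several graphs and its perturbation bounds are not reproduced.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def jacobi_eigen(S, sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes A[p][q]: tan(2*theta) = 2*A_pq / (A_qq - A_pp).
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                A = matmul(transpose(J), matmul(A, J))
                V = matmul(V, J)
    eigenvalues = [A[i][i] for i in range(n)]
    eigenvectors = [[V[i][k] for i in range(n)] for k in range(n)]  # k-th column of V
    return eigenvalues, eigenvectors

def normalized_affinity(W):
    """S = D^(-1/2) W D^(-1/2); the normalized Laplacian is I - S."""
    d = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

def low_rank_approx(S, r):
    """Sum of lam_k * v_k v_k^T over the r largest eigenvalues of S."""
    vals, vecs = jacobi_eigen(S)
    top = sorted(range(len(vals)), key=lambda k: vals[k], reverse=True)[:r]
    n = len(S)
    return [[sum(vals[k] * vecs[k][i] * vecs[k][j] for k in top) for j in range(n)]
            for i in range(n)]

# Toy graph (illustrative): two tightly linked pairs joined by weak edges.
W = [[0, 1, 0.01, 0], [1, 0, 0, 0.01], [0.01, 0, 0, 1], [0, 0.01, 1, 0]]
S = normalized_affinity(W)
approx = low_rank_approx(S, 2)
```

In the rank-2 approximation, entries within each tightly linked pair stay large while cross-pair entries shrink toward zero. This is the cluster structure a subsequent spectral clustering step would pick up.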

12.
IEEE Trans Cybern ; 51(7): 3641-3652, 2021 Jul.
Article in English | MEDLINE | ID: mdl-31329144

ABSTRACT

One of the important issues in pattern recognition and machine learning is how to find natural groups present in a dataset. In this regard, this paper presents a novel clustering algorithm, called rough hypercuboid-based interval type-2 fuzzy c-means (RIT2FCM). It judiciously integrates the merits of the rough hypercuboid approach, the c-means algorithm, and interval type-2 fuzzy sets to address the uncertainty associated with real-life datasets. Using the concept of the hypercuboid equivalence partition matrix (HEM) of the rough hypercuboid approach, the lower approximation and boundary region of each cluster are implicitly defined, without using any prespecified threshold parameter. The interval-valued fuzzifier is applied to address the uncertainty coupled with different parameters of rough-fuzzy clustering algorithms, where determining an appropriate value of the fuzzifier is a difficult task. An analytical formulation of the convergence analysis of the proposed RIT2FCM algorithm, along with a theoretical bound on its fuzzifier, is also introduced. The efficacy of the proposed RIT2FCM method is extensively compared with that of several existing clustering algorithms, using some cluster validity and classification rate indices on various real-life datasets. The proposed algorithm performs better than the state-of-the-art c-means algorithms in 92.59% of cases, with respect to different cluster validity indices, with lower computation time.
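
For readers unfamiliar with interval type-2 fuzzy c-means, the core idea can be shown with the standard fuzzy c-means membership u_ij = 1 / Σ_k (d_ij / d_kj)^{2/(m-1)}, where d_ij is the distance from point j to center i. Evaluating it at two fuzzifiers m1 < m2 and taking the element-wise minimum and maximum gives a lower and an upper membership for each point. This sketch covers only that interval-membership step. The rough hypercuboid lower approximation and boundary region of RIT2FCM are not modeled, and the fuzzifier values and distances are assumptions.

```python
def fcm_memberships(dist, m):
    """dist[i][j]: distance from cluster center i to point j (assumed > 0).

    Returns u[i][j] = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
    """
    c, n = len(dist), len(dist[0])
    p = 2.0 / (m - 1.0)
    return [[1.0 / sum((dist[i][j] / dist[k][j]) ** p for k in range(c)) for j in range(n)]
            for i in range(c)]

def interval_memberships(dist, m1=1.5, m2=2.5):
    """Lower/upper memberships obtained from two fuzzifiers (hypothetical defaults)."""
    u1, u2 = fcm_memberships(dist, m1), fcm_memberships(dist, m2)
    lower = [[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]
    upper = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]
    return lower, upper

# Two centers, two points (illustrative distances).
dist = [[1.0, 3.0], [2.0, 1.0]]
lower, upper = interval_memberships(dist)
```

A smaller fuzzifier gives crisper memberships and a larger one gives softer memberships, so the gap between the lower and upper values reflects how uncertain the choice of m is for each point.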

13.
Article in English | MEDLINE | ID: mdl-32142431

ABSTRACT

In general, the hidden Markov random field (HMRF) represents the class label distribution of an image in probabilistic model based segmentation. The class label distributions provided by existing HMRF models consider either the number of neighboring pixels with similar class labels or the spatial distance of neighboring pixels with dissimilar class labels. Also, this spatial information is considered only for estimating the class labels of image pixels, while its contribution to parameter estimation is completely ignored. This, in turn, deteriorates parameter estimation, resulting in sub-optimal segmentation performance. Moreover, the existing models assign equal weight to the spatial information for class label estimation of all pixels throughout the image, which causes significant misclassification of pixels in the boundary regions of image classes. In this regard, the paper develops a new clique potential function and a new class label distribution, incorporating the information of image class parameters. Unlike existing HMRF model based segmentation techniques, the proposed framework introduces a new scaling parameter that adaptively measures the contribution of spatial information to class label estimation of image pixels. The importance of the proposed framework is demonstrated by modifying existing HMRF based segmentation methods accordingly. The advantage of the proposed class label distribution is also demonstrated irrespective of the underlying intensity distributions. The comparative performance of the proposed and existing class label distributions in the HMRF model is demonstrated both qualitatively and quantitatively for brain MR image segmentation, HEp-2 cell delineation, and natural image and object segmentation.
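
For context, the kind of baseline the abstract contrasts with is a class label prior that counts neighboring pixels with dissimilar labels. This is the Potts model: P(x_i = l | neighbors) ∝ exp(-β · #{j ∈ N(i) : x_j ≠ l}). A minimal sketch on a 4-neighborhood follows. β and the label image are assumptions, and the paper's new clique potential, its adaptive scaling parameter and its use of class parameters are not reproduced.

```python
import math

def neighbors(i, j, rows, cols):
    """In-bounds 4-neighborhood of pixel (i, j)."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        a, b = i + di, j + dj
        if 0 <= a < rows and 0 <= b < cols:
            yield a, b

def potts_conditional(labels, i, j, num_classes, beta=1.0):
    """Local prior P(x_ij = l | 4-neighbors) for each class l under a Potts model."""
    rows, cols = len(labels), len(labels[0])
    energies = [beta * sum(1 for a, b in neighbors(i, j, rows, cols) if labels[a][b] != l)
                for l in range(num_classes)]
    weights = [math.exp(-e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

# An isolated label-1 pixel surrounded by label 0 (illustrative).
labels = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
prior = potts_conditional(labels, 1, 1, num_classes=2, beta=1.0)
```

The fixed β applies the same spatial weight to every pixel, which is the limitation the abstract's adaptive scaling parameter is meant to address.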

14.
IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1290-1302, 2020.
Article in English | MEDLINE | ID: mdl-30676972

ABSTRACT

Multimodal data integration is an important framework for cancer subtype discovery, as it can blend the inherent properties of individual modalities with their cross-platform correlations to infer clinically relevant subtypes. The main problem here is the appropriate selection of relevant and complementary modalities. Another problem is the 'high dimension-low sample size' nature of each modality. This work proposes a novel algorithm to construct a low-rank joint subspace from the low-rank subspaces of individual high-dimensional modalities. Statistical hypothesis testing is introduced to effectively estimate the rank of each modality by separating the signal component from its noise counterpart. Two quantitative indices are proposed to evaluate the quality of different modalities: the first assesses the degree of relevance of the cluster structure embedded within each modality, while the second evaluates the amount of cluster information shared between two modalities. To construct the joint subspace, the algorithm selects the most relevant modalities with maximum shared information. During data integration, the intersection between two subspaces is also considered to select cluster information and filter out noise from different subspaces. The efficacy of clustering on the joint subspace extracted by the proposed algorithm is compared with that of several existing integrative clustering approaches on real-life multimodal data sets. Experimental results show that the identified subtypes resemble the clinically established subtypes more closely than the subtypes identified by existing approaches. Survival analysis reveals significant differences between the survival profiles of the identified subtypes, while robustness analysis shows that the identified subtypes are not sensitive to perturbation of the data sets.


Subject(s)
Algorithms , Computational Biology/methods , Neoplasms , DNA Methylation , Databases, Genetic , Humans , Kaplan-Meier Estimate , Neoplasms/classification , Neoplasms/epidemiology , Neoplasms/mortality , Transcriptome
15.
IEEE Trans Med Imaging ; 39(5): 1735-1745, 2020 05.
Article in English | MEDLINE | ID: mdl-31796391

ABSTRACT

One of the foremost and most challenging tasks in hematoxylin and eosin stained histological image analysis is to reduce the color variation present among images, which may significantly affect the performance of computer-aided histological image analysis. In this regard, the paper introduces a new rough-fuzzy circular clustering algorithm for stain color normalization. It judiciously integrates the merits of both fuzzy and rough sets. While the theory of rough sets deals with uncertainty, vagueness, and incompleteness in stain class definition, fuzzy sets handle the overlapping nature of histochemical stains. The proposed circular clustering algorithm works on a weighted hue histogram, which considers both the saturation and the local neighborhood information of the given image. A new dissimilarity measure is introduced to deal with the circular nature of hue values. Some new quantitative measures are also proposed to evaluate color constancy after normalization. The performance of the proposed method, along with a comparison with other state-of-the-art methods, is demonstrated on several publicly available standard data sets consisting of hematoxylin and eosin stained histological images.


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Cluster Analysis , Color , Eosine Yellowish-(YS) , Fuzzy Logic , Hematoxylin
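
Because hue is an angle, hues of 350° and 10° are 20° apart, not 340°. A standard way to respect this is the shorter-arc distance together with a weighted circular mean computed from unit vectors. This is shown below as an assumption, not as the paper's new dissimilarity measure. Hues are in degrees, and the weights stand in for hypothetical hue-histogram bin weights.

```python
import math

def hue_distance(h1, h2):
    """Shorter angular distance between two hues in degrees, in [0, 180]."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def circular_mean(hues, weights):
    """Weighted mean direction of hues in degrees, e.g. over weighted hue-histogram bins."""
    s = sum(w * math.sin(math.radians(h)) for h, w in zip(hues, weights))
    c = sum(w * math.cos(math.radians(h)) for h, w in zip(hues, weights))
    return math.degrees(math.atan2(s, c)) % 360.0
```

An arithmetic mean of 350° and 10° would give 180°, the opposite hue. The circular mean instead returns a value at (or numerically next to) 0°.
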
16.
Magn Reson Imaging ; 54: 46-57, 2018 12.
Article in English | MEDLINE | ID: mdl-30076947

ABSTRACT

Segmentation of the brain region from an MR volume is an essential prerequisite for any automatic medical image processing application, as it substantially increases both the speed and the accuracy of diagnosis. Due to material heterogeneity and the resolution limitations of imaging devices, MR images exhibit graded tissue intensities within the brain region. Moreover, they suffer from blurring at the brain surface. In spite of these artifacts, all the tissues of the brain region in an MR image are perceived to hang together within the brain. In this regard, this paper introduces an accurate and robust skull stripping algorithm, termed ARoSi. It is based on a novel concept, called rough-fuzzy connectedness, introduced in this paper. In the proposed method, the connectedness of a voxel to the brain region is determined by its degree of belongingness to the brain region as well as its degree of adjacency to the brain. Moreover, the proposed ARoSi algorithm considers the local spatial information of the voxel of interest, which reduces the effect of noise and, in turn, helps to improve the performance of the proposed method. Finally, the performance of the proposed ARoSi algorithm, along with a comparison with other state-of-the-art algorithms, is demonstrated on T1-weighted 3-D brain MR volumes obtained from four different data sets. The experiments show that the performance of ARoSi is consistent across all four data sets, including diseased data sets. Compared with six existing brain extraction methods, the proposed algorithm achieves the highest mean Dice coefficient, 0.951, over all the volumes of the four data sets.


Subject(s)
Artifacts , Brain/diagnostic imaging , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Magnetic Resonance Imaging/methods , Schizophrenia/physiopathology , Algorithms , Brain/physiopathology , Datasets as Topic , Humans , Reproducibility of Results , Schizophrenia/diagnostic imaging , Sensitivity and Specificity , Skull
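
The Dice coefficient used to report the 0.951 figure compares a predicted brain mask A with a reference mask B as 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened binary masks follows. The toy masks are illustrative and unrelated to the reported results, and returning 1.0 when both masks are empty is a convention assumed here.

```python
def dice(pred, ref):
    """Dice coefficient of two equal-length binary masks (sequences of 0/1)."""
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks agree perfectly.
    return 2.0 * inter / total if total else 1.0

# Illustrative flattened masks: one voxel agrees, one predicted voxel is extra.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
```

For 3-D volumes, the same function applies to the flattened voxel arrays.
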
17.
IEEE Trans Cybern ; 48(4): 1229-1241, 2018 Apr.
Article in English | MEDLINE | ID: mdl-28391216

ABSTRACT

One of the main problems associated with high-dimensional multimodal real-life data sets is how to extract relevant and significant features. In this regard, a fast and robust feature extraction algorithm, termed FaRoC, is proposed, judiciously integrating the merits of canonical correlation analysis (CCA) and rough sets. The proposed method extracts new features sequentially from two multidimensional data sets by maximizing their relevance with respect to the class label and their significance with respect to already-extracted features. To generate canonical variables sequentially, an analytical formulation is introduced to establish the relation between regularization parameters and CCA. The formulation enables the proposed method to extract the required number of correlated features sequentially at lower computational cost than existing methods. To compute both the significance and relevance measures of a feature, the concept of the hypercuboid equivalence partition matrix of the rough hypercuboid approach is used. It also provides an efficient way to find the optimum regularization parameters employed in CCA. The efficacy of the proposed FaRoC algorithm, along with a comparison with other existing methods, is extensively established on several real-life data sets.
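
The rough hypercuboid approach can be sketched for a single feature as follows. Each class defines an interval [L_c, U_c] spanned by its samples, and the equivalence partition matrix sets h_cj = 1 when sample j falls inside class c's interval. Samples covered by more than one interval form the boundary region, and relevance is scored as the fraction of samples outside it. This follows our reading of the rough hypercuboid literature and is an assumption here; FaRoC's exact relevance and significance formulas may differ, and the data are illustrative.

```python
def equivalence_partition_matrix(values, labels):
    """h[c][j] = 1 if values[j] lies in the [min, max] interval of class c (classes sorted)."""
    classes = sorted(set(labels))
    bounds = {c: (min(v for v, l in zip(values, labels) if l == c),
                  max(v for v, l in zip(values, labels) if l == c)) for c in classes}
    return [[1 if bounds[c][0] <= v <= bounds[c][1] else 0 for v in values] for c in classes]

def hypercuboid_relevance(values, labels):
    """Fraction of samples covered by exactly one class interval (not in the boundary)."""
    H = equivalence_partition_matrix(values, labels)
    n = len(values)
    boundary = sum(1 for j in range(n) if sum(row[j] for row in H) > 1)
    return 1.0 - boundary / n

# Illustrative feature whose class intervals [0.1, 0.5] and [0.3, 0.9] overlap.
overlap_values, overlap_labels = [0.1, 0.5, 0.3, 0.4, 0.9], [0, 0, 1, 1, 1]
H = equivalence_partition_matrix(overlap_values, overlap_labels)
```

No threshold parameter is needed: the class intervals come directly from the data, and the overlap between them defines the boundary.
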

18.
IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1419-1433, 2017.
Article in English | MEDLINE | ID: mdl-28113633

ABSTRACT

One of the most significant research issues in functional genomics is the in silico identification of disease-related genes. In this regard, the paper presents a new gene selection algorithm, termed SiFS, for identifying disease genes. It integrates the information obtained from the protein interaction network and gene expression profiles. The proposed SiFS algorithm selects a subset of genes from microarray data as disease genes by maximizing both the significance and the functional similarity of the selected gene subset. Based on the gene expression profiles, the significance of a gene with respect to another gene is computed using mutual information. In addition, a new similarity measure is introduced to compute the functional similarity between two genes. Information derived from the protein-protein interaction network forms the basis of the proposed SiFS algorithm. The performance of the proposed gene selection algorithm and the new similarity measure is compared with that of other related methods and similarity measures, using several cancer microarray data sets.


Subject(s)
Algorithms , Computational Biology/methods , Genetic Predisposition to Disease/genetics , Protein Interaction Maps/genetics , Databases, Genetic , Gene Expression Profiling , Humans , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis
19.
IEEE Trans Biomed Eng ; 64(8): 1841-1851, 2017 08.
Article in English | MEDLINE | ID: mdl-27834637

ABSTRACT

OBJECTIVE: This paper presents a novel supervised regularized canonical correlation analysis, termed CuRSaR, to extract relevant and significant features from multimodal, high-dimensional omics datasets. METHODS: The proposed method extracts a new set of features from two multidimensional datasets by maximizing the relevance of the extracted features with respect to the sample categories and the significance among them. It judiciously integrates the merits of regularized canonical correlation analysis (RCCA) and the rough hypercuboid approach. An analytical formulation, based on spectral decomposition, is introduced to establish the relation between canonical correlation analysis (CCA) and RCCA. The concept of the hypercuboid equivalence partition matrix of the rough hypercuboid approach is used to compute both the relevance and the significance of a feature. SIGNIFICANCE: The analytical formulation makes the computational complexity of the proposed algorithm significantly lower than that of existing methods. The equivalence partition matrix offers an efficient way to find the optimum regularization parameters employed in CCA. RESULTS: The superiority of the proposed algorithm over other existing methods, in terms of computational complexity and classification accuracy, is established extensively on real-life data.
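The abstract does not give CuRSaR's analytical formulation. As a hedged illustration of why a spectral decomposition can make the search over regularization parameters cheap, the sketch below uses the standard whitening form of RCCA: once the eigendecomposition Cxx = V diag(s) V^T is computed, (Cxx + λI)^(-1/2) = V diag((s + λ)^(-1/2)) V^T for any λ > 0 requires only a rescaling of the eigenvalues. The function names `rcca_path` and `_inv_sqrt` and the covariance normalization by n are assumptions; this is not the authors' method.

```python
import numpy as np

def _inv_sqrt(evals, evecs, lam):
    """(C + lam*I)^(-1/2) from a precomputed eigendecomposition of C."""
    # Clip tiny negative eigenvalues caused by floating-point round-off.
    scale = 1.0 / np.sqrt(np.clip(evals, 0.0, None) + lam)
    return (evecs * scale) @ evecs.T

def rcca_path(X, Y, lambdas, k=1):
    """Top-k regularized canonical correlations and weight vectors for each
    (lam_x, lam_y) pair, reusing one eigendecomposition per view."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    ex, Vx = np.linalg.eigh(Cxx)  # computed once for all lambdas
    ey, Vy = np.linalg.eigh(Cyy)  # computed once for all lambdas
    results = []
    for lam_x, lam_y in lambdas:
        Wx = _inv_sqrt(ex, Vx, lam_x)
        Wy = _inv_sqrt(ey, Vy, lam_y)
        # Singular values of the whitened cross-covariance are the
        # regularized canonical correlations.
        U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
        A = Wx @ U[:, :k]     # canonical weight vectors for X
        B = Wy @ Vt.T[:, :k]  # canonical weight vectors for Y
        results.append((s[:k], A, B))
    return results
```

Because `eigh` runs once per view, scanning a grid of (λx, λy) pairs costs one small SVD per pair instead of a fresh inverse square root of each covariance matrix; larger λ values shrink the canonical correlations toward zero.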


Subject(s)
Biomarkers, Tumor/genetics , Data Mining/methods , Databases, Genetic , Genetic Predisposition to Disease/genetics , Neoplasms/genetics , Proteome/genetics , Algorithms , DNA Methylation/genetics , Gene Expression Profiling/methods , Humans , Protein Interaction Mapping/methods
20.
IEEE Trans Image Process ; 24(12): 5764-76, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26462197

ABSTRACT

The segmentation of brain MR images into different tissue classes is an important and challenging task for automatic image analysis, particularly due to the presence of the intensity inhomogeneity artifact in MR images. In this regard, this paper presents a novel approach for simultaneous segmentation and bias field correction in brain MR images. It judiciously integrates the concept of rough sets and the merit of a novel probability distribution, called the stomped normal (SN) distribution. The intensity distribution of each tissue class is represented by an SN distribution, where each tissue class consists of a crisp lower approximation and a probabilistic boundary region. The intensity distribution of a brain MR image is modeled as a mixture of a finite number of SN distributions and one uniform distribution. The proposed method incorporates both the expectation-maximization and hidden Markov random field frameworks to provide accurate and robust segmentation. The performance of the proposed approach, along with a comparison with related methods, is demonstrated on a set of synthetic and real brain MR images for different bias fields and noise levels.
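The abstract models image intensities as a mixture of tissue-class components plus one uniform component, fitted with expectation-maximization. The sketch below shows only that mixture-plus-uniform EM structure on 1-D intensities and plainly substitutes ordinary Gaussian components for the SN distribution, whose form the abstract does not give; the rough-set lower approximation, the bias field correction, and the hidden Markov random field are all omitted. The function name `em_gauss_uniform` and its parameters are hypothetical.

```python
import numpy as np

def em_gauss_uniform(x, means, n_iter=100):
    """EM for a 1-D mixture of Gaussian components (stand-ins for SN
    components) plus one uniform component spanning the data range."""
    x = np.asarray(x, dtype=float)
    k = len(means)
    mu = np.asarray(means, dtype=float)
    sigma = np.full(k, x.std())
    weights = np.full(k + 1, 1.0 / (k + 1))      # last entry: uniform component
    u_density = 1.0 / (x.max() - x.min())        # uniform density over the range
    for _ in range(n_iter):
        # E-step: responsibility of each component for each intensity.
        z = (x[:, None] - mu) / sigma
        gauss = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))
        dens = np.column_stack([gauss, np.full(x.size, u_density)]) * weights
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: mixing weights, then Gaussian means and standard deviations.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        r = resp[:, :k]
        mu = (r * x[:, None]).sum(axis=0) / nk[:k]
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk[:k]
        sigma = np.maximum(np.sqrt(var), 1e-6)   # guard against collapse
    return mu, sigma, weights
```

With two well-separated synthetic intensity clusters plus a sparse uniform background, the recovered means land near the true cluster centers, and the last mixing weight absorbs the background.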


Subject(s)
Brain/anatomy & histology , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Algorithms , Humans , Markov Chains , Normal Distribution