ABSTRACT
The significant heterogeneity of Wilms' tumors between different patients is thought to arise from genetic and epigenetic distortions that occur during various stages of fetal kidney development in a way that is poorly understood. To address this, we characterized the heterogeneity of alternative mRNA splicing in Wilms' tumors using a publicly available RNAseq dataset of high-risk Wilms' tumors and normal kidney samples. Through Pareto task inference and cell deconvolution, we found that the tumors and normal kidney samples are organized according to progressive stages of kidney development within a triangle-shaped region in latent space, whose vertices, or "archetypes", resemble the cap mesenchyme, the nephrogenic stroma, and epithelial tubular structures of the fetal kidney. We identified a set of genes that are alternatively spliced between tumors located in different regions of latent space and found that many of these genes are associated with the epithelial-to-mesenchymal transition (EMT) and muscle development. Using motif enrichment analysis, we identified putative splicing regulators, some of which are associated with kidney development. Our findings provide new insights into the etiology of Wilms' tumors and suggest that specific splicing mechanisms in early stages of development may contribute to tumor development in different patients.
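Pareto task inference itself is beyond a short snippet, but the geometric picture — each tumor as a point inside a triangle spanned by three archetypes — can be sketched with barycentric coordinates. The archetype matrix and sample below are synthetic stand-ins, not data from the study:

```python
import numpy as np

def archetype_weights(x, archetypes):
    """Least-squares barycentric coordinates of sample x with respect to
    the rows of `archetypes`, constrained to sum to 1 (non-negativity is
    not enforced here; a full pipeline would project onto the simplex)."""
    A = np.asarray(archetypes, dtype=float)      # (k, d) archetype matrix
    M = np.vstack([A.T, np.ones(A.shape[0])])    # append the sum-to-one row
    b = np.concatenate([x, [1.0]])
    w, *_ = np.linalg.lstsq(M, b, rcond=None)
    return w

# A sample that is an exact mixture of three synthetic archetypes
rng = np.random.default_rng(0)
arch = rng.normal(size=(3, 4))                   # stand-ins for the three vertices
w_true = np.array([0.2, 0.3, 0.5])
x = w_true @ arch
w = archetype_weights(x, arch)
print(np.round(w, 3))                            # recovers [0.2, 0.3, 0.5]
```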
Subjects
Alternative Splicing , Epithelial-Mesenchymal Transition , Kidney Neoplasms , Wilms Tumor , Wilms Tumor/genetics , Wilms Tumor/pathology , Humans , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Epithelial-Mesenchymal Transition/genetics , Neoplastic Gene Expression Regulation , Kidney/metabolism , Kidney/pathology
ABSTRACT
Wilms' tumors are pediatric malignancies that are thought to arise from faulty kidney development. They contain a wide range of poorly differentiated cell states resembling various distorted developmental stages of the fetal kidney, and as a result, differ between patients in a continuous manner that is not well understood. Here, we used three computational approaches to characterize this continuous heterogeneity in high-risk blastemal-type Wilms' tumors. Using Pareto task inference, we show that the tumors form a triangle-shaped continuum in latent space that is bounded by three tumor archetypes with "stromal", "blastemal", and "epithelial" characteristics, which resemble the un-induced mesenchyme, the cap mesenchyme, and early epithelial structures of the fetal kidney. By fitting a generative probabilistic "grade of membership" model, we show that each tumor can be represented as a unique mixture of three hidden "topics" with blastemal, stromal, and epithelial characteristics. Likewise, cellular deconvolution allows us to represent each tumor in the continuum as a unique combination of fetal kidney-like cell states. These results highlight the relationship between Wilms' tumors and kidney development, and we anticipate that they will pave the way for more quantitative strategies for tumor stratification and classification.
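The grade-of-membership idea — every tumor represented as a unique mixture of three hidden topics — can be approximated with a simple non-negative factorization. The sketch below uses multiplicative-update NMF on a random non-negative matrix as a stand-in for expression data; the paper fits a generative probabilistic model, not NMF:

```python
import numpy as np

def topic_mixtures(X, k=3, iters=500, seed=0):
    """Toy grade-of-membership decomposition: factor a non-negative
    samples-by-genes matrix X as W @ H with multiplicative NMF updates,
    then normalize the rows of W into per-sample topic proportions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, d)) + 0.1
    eps = 1e-12
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    theta = W / W.sum(axis=1, keepdims=True)   # topic proportions sum to 1
    return theta, H

# Synthetic stand-in for an expression matrix (20 tumors, 50 genes)
X = np.abs(np.random.default_rng(1).normal(size=(20, 50)))
theta, H = topic_mixtures(X, k=3)
```

Each row of `theta` is one tumor's position in the three-topic mixture, the analogue of a point inside the triangle-shaped continuum.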
Subjects
Kidney Neoplasms , Wilms Tumor , Child , Humans , Kidney Neoplasms/pathology , Unsupervised Machine Learning , Kidney/pathology
ABSTRACT
Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment, such as the middle part of a vowel, and the goal is to estimate the formant frequencies; in the tracking task, the input is a series of speech frames, and the goal is to track the trajectory of the formant frequencies throughout the signal. We propose using supervised machine learning techniques trained on an annotated corpus of read speech for these tasks. Two deep network architectures were evaluated for estimation, feed-forward multilayer perceptrons and convolutional neural networks, and, correspondingly, two architectures for tracking, recurrent and convolutional recurrent networks. The inputs to the former are composed of linear predictive coding (LPC)-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients, whereas the inputs to the latter are raw spectrograms. The performance of the methods compares favorably with alternative methods for formant estimation and tracking. We further propose a network architecture that allows model adaptation to formant frequency ranges that were not seen at training time. The adapted networks were evaluated on three datasets, and adaptation further improved their performance.
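The classical LPC front end mentioned above can be sketched in a few lines: fit autocorrelation-method LPC coefficients, then read formant candidates off the angles of the LPC polynomial roots. This illustrates the traditional baseline the networks are compared against, not the proposed method itself:

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC coefficients (a toy version; real
    systems use the Levinson-Durbin recursion instead of a dense solve)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])   # x[n] ~ sum_k a[k] x[n-1-k]

def formants(x, fs, order=8):
    """Formant candidates = angles of the LPC polynomial roots."""
    a = lpc(x, order)
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]           # one root per conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

fs = 8000
t = np.arange(2048) / fs
# Synthetic "vowel": two spectral peaks at 500 Hz and 1500 Hz plus mild noise
x = np.sin(2 * np.pi * 500 * t) + 0.7 * np.sin(2 * np.pi * 1500 * t)
x += 0.01 * np.random.default_rng(0).normal(size=t.size)
f = formants(x, fs, order=4)
print(np.round(f))                              # close to [500, 1500]
```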
ABSTRACT
A classification model is calibrated if its predicted probabilities of outcomes reflect their accuracy. Calibrating neural networks is critical in medical analysis applications where clinical decisions rely upon the predicted probabilities. Most calibration procedures, such as temperature scaling, operate as a post-processing step using held-out validation data. In practice, it is difficult to collect medical image data with correct labels due to the complexity of the medical data and the considerable variability across experts. This study presents a network calibration procedure that is robust to label noise. We draw on the fact that the confusion matrix of the noisy labels can be expressed as the matrix product of the confusion matrix of the clean labels and the label-noise matrix. The method is based on estimating the noise level as part of a noise-robust training procedure. The noise level is then used to estimate the network accuracy required by the calibration procedure. We show that, despite the unreliable labels, we can still achieve calibration results on a par with those of a calibration procedure using data with reliable labels.
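For context, the standard temperature-scaling step that the paper builds on can be sketched as follows. The paper's contribution — estimating the noise level and the implied accuracy — is not shown; this is only the classic post-hoc step, with a toy "overconfident" model whose logits are the true logits times three:

```python
import numpy as np

def nll(logits, labels, T):
    """Mean negative log-likelihood of the labels under softmax(logits / T)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 10, 200)):
    """Temperature scaling: pick the scalar T minimizing validation NLL."""
    return min(grid, key=lambda T: nll(logits, labels, T))

rng = np.random.default_rng(0)
base = rng.normal(size=(2000, 4))               # "true" logits
probs = np.exp(base) / np.exp(base).sum(axis=1, keepdims=True)
labels = np.array([rng.choice(4, p=p) for p in probs])
logits = 3.0 * base                              # overconfident network
T = fit_temperature(logits, labels)
print(T)  # roughly 3: dividing by T undoes the overconfidence
```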
Subjects
Computer-Assisted Image Processing , Calibration , Humans , Computer-Assisted Image Processing/methods , Neural Networks (Computer) , Algorithms , Diagnostic Imaging/methods
ABSTRACT
We present the Atlas of Classifiers (AoC)-a conceptually novel framework for brain MRI segmentation. The AoC is a spatial map of voxel-wise multinomial logistic regression (LR) functions learned from the labeled data. Upon convergence, the resulting fixed LR weights, a few for each voxel, represent the training dataset. It can, therefore, be considered a lightweight learning machine, which, despite its low capacity, does not underfit the problem. The AoC construction is independent of the actual intensities of the test images, providing the flexibility to train it on the available labeled data and use it for the segmentation of images from different datasets and modalities. In this sense, it does not overfit the training data either. The proposed method has been applied to numerous publicly available datasets for the segmentation of brain MRI tissues and is shown to be robust to noise and to outperform commonly used methods. Promising results were also obtained for multi-modal, cross-modality MRI segmentation. Finally, we show how an AoC trained on brain MRIs of healthy subjects can be exploited for lesion segmentation of multiple sclerosis patients.
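The voxel-wise idea can be sketched for a single voxel with a binary label; the actual AoC fits multinomial LR at every voxel across registered training images. All data below is synthetic:

```python
import numpy as np

def fit_voxel_lr(intensities, labels, lr=0.5, iters=500):
    """Logistic regression for ONE voxel: inputs are that voxel's
    intensities across training subjects, output is its tissue label
    (binary here; the AoC uses multinomial LR at every voxel)."""
    X = np.column_stack([np.ones_like(intensities), intensities])
    w = np.zeros(2)
    for _ in range(iters):                      # gradient ascent on log-likelihood
        p = 1 / (1 + np.exp(-X @ w))
        w += lr * X.T @ (labels - p) / len(labels)
    return w

def predict(w, intensity):
    return 1 / (1 + np.exp(-(w[0] + w[1] * intensity))) > 0.5

# Toy data: tissue 0 is dark, tissue 1 is bright at this voxel
rng = np.random.default_rng(0)
inten = np.concatenate([rng.normal(0.2, 0.05, 50), rng.normal(0.8, 0.05, 50)])
lab = np.concatenate([np.zeros(50), np.ones(50)])
w = fit_voxel_lr(inten, lab)
print(predict(w, 0.1), predict(w, 0.9))        # False True
```

Once trained, only the two weights per voxel are stored — which is what makes the atlas a lightweight representation of the training set.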
Subjects
Brain , Multiple Sclerosis , Brain/diagnostic imaging , Humans , Machine Learning , Magnetic Resonance Imaging , Neuroimaging
ABSTRACT
In this study we treat scribbling motion as a compositional system in which a limited set of elementary strokes can concatenate among themselves in an endless number of combinations, thus producing an unlimited repertoire of complex constructs. We break the continuous scribblings into small units and then calculate the Markovian transition matrix between the trajectory clusters. The Markov states are grouped in a way that minimizes the loss of mutual information between adjacent strokes. The grouping algorithm is based on a novel Markov-state bi-clustering algorithm derived from the information bottleneck principle. This approach hierarchically decomposes scribblings into increasingly finer elements. We illustrate the usefulness of the approach by applying it to human scribbling.
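The first step — an empirical first-order Markov transition matrix between stroke clusters — is straightforward to sketch (the sequence below is an arbitrary example, not real scribbling data):

```python
import numpy as np

def transition_matrix(states, n_states):
    """Empirical first-order Markov transition matrix between stroke
    clusters: counts of consecutive state pairs, row-normalized."""
    T = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1
    row = T.sum(axis=1, keepdims=True)
    return np.divide(T, row, out=np.zeros_like(T), where=row > 0)

seq = [0, 1, 0, 1, 2, 0, 1, 2, 2, 0]   # cluster index of each stroke unit
T = transition_matrix(seq, 3)
print(T)
```

The bi-clustering step then merges states of this chain so as to preserve the mutual information between consecutive strokes.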
Subjects
Cluster Analysis , Markov Chains , Biological Models , Movement/physiology , Algorithms , Humans , Stroke/physiopathology
ABSTRACT
Recently, there has been increasing interest in leveraging the competence of neural networks to analyze data. In particular, new clustering methods that employ deep embeddings have been presented. In this paper, we depart from centroid-based models and suggest a new framework, called Clustering-driven deep embedding with PAirwise Constraints (CPAC), for nonparametric clustering using a neural network. We present a clustering-driven embedding based on a Siamese network that encourages pairs of data points to output similar representations in the latent space. Our pair-based model allows augmenting the information with labeled pairs to constitute a semi-supervised framework. Our approach is based on analyzing the losses associated with each pair to refine the set of constraints. We show that clustering performance increases when using this scheme, even with a limited amount of user queries. We demonstrate how our architecture is adapted for various types of data and present the first deep framework to cluster three-dimensional (3-D) shapes.
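The pairwise objective behind a Siamese encoder is typically a contrastive loss; the sketch below evaluates that loss on fixed embeddings (the actual CPAC objective and network are more involved than this):

```python
import numpy as np

def contrastive_loss(z1, z2, same, margin=1.0):
    """Siamese-style pairwise loss: similar pairs (same=1) are pulled
    together, dissimilar pairs (same=0) are pushed at least `margin`
    apart. A stand-in for the pairwise-constraint objective."""
    d = np.linalg.norm(z1 - z2, axis=1)
    return np.mean(same * d**2 + (1 - same) * np.maximum(0.0, margin - d)**2)

z1 = np.array([[0.0, 0.0], [0.0, 0.0]])
z2 = np.array([[0.0, 0.0], [2.0, 0.0]])
same = np.array([1.0, 0.0])
print(contrastive_loss(z1, z2, same))  # 0.0: both pairs already satisfy their constraints
```

Analyzing the per-pair loss values is what lets the method flag unreliable constraints for refinement.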
ABSTRACT
The mixture of Gaussians (MoG) model is a useful tool in statistical learning. In many learning processes that are based on mixture models, computational requirements are very demanding due to the large number of components involved in the model. We propose a novel algorithm for learning a simplified representation of a Gaussian mixture that is based on the unscented transform, which was introduced for filtering nonlinear dynamical systems. The superiority of the proposed method is validated on both simulation experiments and categorization of a real image database. The proposed categorization methodology models each image with a Gaussian mixture model; a category model is obtained by learning a simplified mixture from all the images in the category.
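The unscented transform underlying the method represents a Gaussian by a small set of deterministically chosen sigma points whose weighted moments match the original mean and covariance exactly. A minimal sketch for a single Gaussian (the paper builds on this construction to simplify full mixtures):

```python
import numpy as np

def sigma_points(mu, P, alpha=1.0, beta=2.0, kappa=0.0):
    """Standard unscented-transform sigma points and weights for N(mu, P).
    Typical parameter choices vary; alpha=1 keeps the demo numerically clean."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)
    pts = np.vstack([mu, mu + S.T, mu - S.T])        # 2n+1 points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1 - alpha**2 + beta)
    return pts, wm, wc

mu = np.array([1.0, -2.0])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
pts, wm, wc = sigma_points(mu, P)
m = wm @ pts                                          # weighted sample mean
C = (pts - m).T @ np.diag(wc) @ (pts - m)             # weighted sample covariance
print(np.allclose(m, mu), np.allclose(C, P))          # True True
```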
Subjects
Algorithms , Artificial Intelligence , Statistical Data Interpretation , Statistical Models , Automated Pattern Recognition/methods , Computer Simulation , Normal Distribution
ABSTRACT
Many human introns carry out a function, in the sense that they are critical to maintaining normal cellular activity. Their identification is fundamental to understanding cellular processes and disease. However, being noncoding elements, such functional introns are poorly predicted by traditional approaches based on sequence and structure conservation. Here, we generated a dataset of human functional introns that carry out different types of functions. We showed that functional introns share common characteristics, such as higher positional conservation along the coding sequence and reduced loss rates, regardless of their specific function. A unique property of the data is that an intron not currently known to be functional is not necessarily non-functional. We developed a probabilistic framework that explicitly accounts for this property and predicts which specific human introns are functional. We show that we successfully predict function even when the algorithm is trained on introns with a different type of function. This ability has many implications for studying regulatory networks, gene regulation, the effect of mutations outside exons on human disease, and our general understanding of intron evolution and functional exaptation of introns in mammals.
Subjects
Conserved Sequence/genetics , Introns/genetics , Animals , Base Sequence , Genetic Databases , Discriminant Analysis , Genome , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Statistical Models , Open Reading Frames , Post-Transcriptional RNA Processing , Messenger RNA/genetics , Messenger RNA/metabolism , Small Nucleolar RNA/genetics , Small Nucleolar RNA/metabolism
ABSTRACT
OBJECTIVE: We present a novel variant of the bag-of-visual-words (BoVW) method for automated medical image classification. METHODS: Our approach improves the BoVW model by learning a task-driven dictionary of the most relevant visual words per task using a mutual information-based criterion. Additionally, we generate relevance maps to visualize and localize the decision of the automatic classification algorithm. These maps demonstrate how the algorithm works and show the spatial layout of the most relevant words. RESULTS: We applied the algorithm to three different tasks: chest x-ray pathology identification (four pathologies: cardiomegaly, enlarged mediastinum, right consolidation, and left consolidation), liver lesion classification into four categories in computed tomography (CT) images, and benign/malignant classification of clusters of microcalcifications (MCs) in breast mammograms. Validation was conducted on three datasets: 443 chest x-rays, 118 portal-phase CT images of liver lesions, and 260 mammography MCs. The proposed method improves on the classical BoVW method in all tested applications. For chest x-rays, an area under the curve (AUC) of 0.876 was obtained for enlarged mediastinum identification, compared to 0.855 using the classical BoVW (p = 0.01). For MC classification, a significant improvement of 4% was achieved with the new approach (p = 0.03). For liver lesion classification, improvements of 6% in sensitivity and 2% in specificity were obtained (p = 0.001). CONCLUSION: We demonstrated that classification based on an informative, selected set of words yields significant improvement. SIGNIFICANCE: The new BoVW approach shows promising results in clinically important domains. Additionally, it can discover the parts of images relevant to the task at hand without explicit annotations in the training data. This can provide computer-aided support for medical experts in challenging image analysis tasks.
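The dictionary-pruning criterion can be sketched as the mutual information between a visual word's occurrence and the class label, computed from empirical counts; words with near-zero MI are the candidates to drop. The toy word/label vectors below are illustrative only:

```python
import numpy as np

def mutual_information(word_counts, labels, n_classes):
    """MI (in bits) between a binary visual-word indicator and the class
    label, estimated from empirical co-occurrence frequencies."""
    w = (np.asarray(word_counts) > 0).astype(int)
    y = np.asarray(labels)
    mi = 0.0
    for wv in (0, 1):
        for c in range(n_classes):
            p_joint = np.mean((w == wv) & (y == c))
            p_w, p_c = np.mean(w == wv), np.mean(y == c)
            if p_joint > 0:
                mi += p_joint * np.log2(p_joint / (p_w * p_c))
    return mi

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
informative = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # fires only for class 1
uninformative = np.array([1, 0, 1, 0, 1, 0, 1, 0]) # independent of the class
print(mutual_information(informative, y, 2))        # 1.0 bit
print(mutual_information(uninformative, y, 2))      # 0.0 bits
```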
Subjects
Machine Learning , Automated Pattern Recognition/methods , Computer-Assisted Radiographic Image Interpretation/methods , Thoracic Radiography/methods , X-Ray Computed Tomography/methods , Dictionaries as Topic , Humans , Reproducibility of Results , Sensitivity and Specificity , Subtraction Technique
ABSTRACT
An automated algorithm for tissue segmentation of noisy, low-contrast magnetic resonance (MR) images of the brain is presented. A mixture model composed of a large number of Gaussians is used to represent the brain image. Each tissue is represented by a large number of Gaussian components to capture the complex tissue spatial layout. The intensity of a tissue is considered a global feature and is incorporated into the model through tying of all the related Gaussian parameters. The expectation-maximization (EM) algorithm is utilized to learn the parameter-tied, constrained Gaussian mixture model. An elaborate initialization scheme is suggested to link the set of Gaussians per tissue type, such that each Gaussian in the set has similar intensity characteristics with minimally overlapping spatial supports. Segmentation of the brain image is achieved by affiliating each voxel with the component of the model that maximizes the a posteriori probability. The presented algorithm is used to segment three-dimensional, T1-weighted, simulated and real MR images of the brain into three different tissues, under varying noise conditions. Results are compared with state-of-the-art algorithms in the literature. The algorithm does not use an atlas for initialization or parameter learning; registration processes are therefore not required, and the applicability of the framework extends to diseased and neonatal brains.
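Stripped of the parameter tying and the spatial initialization scheme, the core EM/MAP machinery looks like this in one dimension (synthetic intensities, not MR data):

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100):
    """Plain EM for a 1-D Gaussian mixture followed by MAP assignment.
    The paper ties many Gaussians per tissue and constrains intensities;
    this shows only the basic EM/MAP steps."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))    # spread-out init
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities (normalized per sample)
        d = -(x[:, None] - mu)**2 / (2 * var) - 0.5 * np.log(var)
        r = pi * np.exp(d - d.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixture parameters
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu)**2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi, r.argmax(axis=1)             # MAP labels

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
mu, var, pi, labels = em_gmm_1d(x)
print(np.round(np.sort(mu), 1))                      # close to [0, 5]
```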
Subjects
Algorithms , Artificial Intelligence , Brain/anatomy & histology , Image Enhancement/methods , Computer-Assisted Image Interpretation/methods , Magnetic Resonance Imaging/methods , Automated Pattern Recognition/methods , Humans , Information Storage and Retrieval/methods , Neurological Models , Statistical Models , Normal Distribution , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
We describe an algorithm for context-based segmentation of visual data. New frames in an image sequence (video) are segmented based on the prior segmentation of earlier frames in the sequence. The segmentation is performed by adapting a probabilistic model learned on previous frames, according to the content of the new frame. We utilize the maximum a posteriori version of the EM algorithm to segment the new image. The Gaussian mixture distribution that is used to model the current frame is transformed into a conjugate-prior distribution for the parametric model describing the segmentation of the new frame. This semisupervised method improves the segmentation quality and consistency and enables a propagation of segments along the segmented images. The performance of the proposed approach is illustrated on both simulated and real image data.
Subjects
Algorithms , Artificial Intelligence , Image Enhancement/methods , Computer-Assisted Image Interpretation/methods , Automated Pattern Recognition/methods , Subtraction Technique , Video Recording/methods , Information Storage and Retrieval/methods , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
In this paper, we combine discrete and continuous image models with information-theoretic-based criteria for unsupervised hierarchical image-set clustering. The continuous image modeling is based on mixture of Gaussian densities. The unsupervised image-set clustering is based on a generalized version of a recently introduced information-theoretic principle, the information bottleneck principle. Images are clustered such that the mutual information between the clusters and the image content is maximally preserved. Experimental results demonstrate the performance of the proposed framework for image clustering on a large image set. Information theoretic tools are used to evaluate cluster quality. Particular emphasis is placed on the application of the clustering for efficient image search and retrieval.
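In the agglomerative variant of the information bottleneck, the cost of merging two clusters is the mutual information lost, which takes the form of a weighted Jensen-Shannon divergence between their conditional distributions. A minimal sketch (the distributions below are arbitrary examples):

```python
import numpy as np

def js_divergence(p, q, pi1, pi2):
    """Weighted Jensen-Shannon divergence between distributions p and q."""
    m = pi1 * p + pi2 * q
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return pi1 * kl(p, m) + pi2 * kl(q, m)

def merge_cost(p1, p2, w1, w2):
    """Agglomerative information-bottleneck merge cost: the mutual
    information lost by merging clusters with conditional distributions
    p1, p2 and prior weights w1, w2."""
    return (w1 + w2) * js_divergence(p1, p2, w1 / (w1 + w2), w2 / (w1 + w2))

same = np.array([0.5, 0.5])
diff = np.array([0.9, 0.1])
print(merge_cost(same, same, 0.3, 0.3))       # 0.0: identical clusters merge for free
print(merge_cost(same, diff, 0.3, 0.3) > 0)   # True: dissimilar clusters cost information
```

Greedily merging the cheapest pair at each step yields the hierarchy of image clusters described above.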
Subjects
Algorithms , Artificial Intelligence , Cluster Analysis , Factual Databases , Image Enhancement/methods , Computer-Assisted Image Interpretation/methods , Information Storage and Retrieval/methods , Automated Pattern Recognition/methods , Information Theory
ABSTRACT
In this paper, we apply efficient implementations of integer linear programming to the problem of image segmentation. The image is first grouped into superpixels and then local information is extracted for each pair of spatially adjacent superpixels. Given local scores on a map of several hundred superpixels, we use correlation clustering to find the global segmentation that is most consistent with the local evidence. We show that, although correlation clustering is known to be NP-hard, finding the exact global solution is still feasible by breaking the segmentation problem down into subproblems, each of which can be viewed as an automatically detected image part. We can further accelerate the process by using the cutting-plane method, which provides a hierarchical structure of segmentations. The efficiency and improved performance of the proposed method are demonstrated on several standard segmentation datasets and compared to several state-of-the-art methods.
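For a handful of nodes, the correlation-clustering objective can be solved by brute force, which makes the objective itself concrete; the ILP with cutting planes described above is what makes it feasible for hundreds of superpixels:

```python
import numpy as np
from itertools import product

def correlation_clustering(W):
    """Exact correlation clustering by exhaustive search (toy scale only).
    W[i, j] > 0 attracts i and j into the same cluster, W[i, j] < 0
    repels them; we maximize the summed within-cluster affinity. The
    number of clusters is not fixed in advance."""
    n = len(W)
    best, best_score = None, -np.inf
    for labels in product(range(n), repeat=n):
        score = sum(W[i, j] for i in range(n) for j in range(i + 1, n)
                    if labels[i] == labels[j])
        if score > best_score:
            best, best_score = labels, score
    return best, best_score

# Two positive cliques {0,1,2} and {3,4}, repulsion across them
W = np.full((5, 5), -1.0)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    W[i, j] = W[j, i] = 1.0
labels, score = correlation_clustering(W)
print(labels)  # the two cliques end up in two separate clusters
```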
ABSTRACT
This paper presents an automatic lesion segmentation method based on similarities between multichannel patches. A patch database is built from training images for which the label maps are known. For each patch in the test image, k similar patches are retrieved from the database, and the labels matching these k patches are combined to produce an initial segmentation map for the test case. Finally, an iterative patch-based label refinement process based on the initial segmentation map is performed to ensure the spatial consistency of the detected lesions. The method was evaluated on multiple sclerosis (MS) lesion segmentation in magnetic resonance images (MRI) of the brain, with an evaluation for each image in the MICCAI 2008 MS lesion segmentation challenge. The results compete with the state of the art in the challenge. We conclude that the proposed lesion segmentation algorithm provides a promising new approach for local segmentation and global detection in medical images.
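The retrieval-and-vote step can be sketched with flattened patches and L2 distance (the synthetic bright/dark patches below stand in for real multichannel MRI patches):

```python
import numpy as np

def knn_patch_labels(test_patches, train_patches, train_labels, k=3):
    """For every test patch, retrieve the k nearest training patches
    (L2 distance on flattened intensities) and take a majority vote of
    their labels - the initial label-fusion step described above."""
    out = []
    for p in test_patches:
        d = np.linalg.norm(train_patches - p, axis=1)
        nearest = train_labels[np.argsort(d)[:k]]
        out.append(np.bincount(nearest).argmax())
    return np.array(out)

rng = np.random.default_rng(0)
lesion = rng.normal(2.0, 0.1, size=(20, 9))     # bright 3x3 patches, label 1
healthy = rng.normal(0.0, 0.1, size=(20, 9))    # dark patches, label 0
train = np.vstack([healthy, lesion])
labels = np.concatenate([np.zeros(20, int), np.ones(20, int)])
test = np.vstack([rng.normal(0.0, 0.1, (2, 9)), rng.normal(2.0, 0.1, (2, 9))])
print(knn_patch_labels(test, train, labels))    # [0 0 1 1]
```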
ABSTRACT
Classification of clustered breast microcalcifications into benign and malignant categories is an extremely challenging task for computerized algorithms and expert radiologists alike. In this paper we apply a multi-view classifier to the task. We describe a two-step classification method based on a view-level decision, implemented by a logistic regression classifier, followed by a stochastic combination of the two view-level indications into a single benign or malignant decision. The proposed method was evaluated on a large number of cases from a standardized digital database for screening mammography (DDSM). Experimental results demonstrate the advantage of the proposed multi-view classification algorithm, which automatically learns the best way to combine the views.
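The abstract does not spell out the combination rule, so the sketch below uses a noisy-OR fusion of the two view-level probabilities purely as an illustration; the paper learns its combination from data:

```python
import numpy as np

def combine_views(p_cc, p_mlo):
    """One simple way to fuse two view-level malignancy probabilities
    (noisy-OR): the case is flagged if either view suggests malignancy.
    An illustrative rule, not the combination learned in the paper."""
    return 1.0 - (1.0 - np.asarray(p_cc)) * (1.0 - np.asarray(p_mlo))

p = combine_views([0.1, 0.6, 0.9], [0.1, 0.2, 0.9])
print(np.round(p, 2))  # [0.19 0.68 0.99]
```

Note that noisy-OR can only raise suspicion relative to either single view; a learned combination can also down-weight an unreliable view.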
Subjects
Breast Neoplasms/diagnostic imaging , Calcinosis/diagnostic imaging , Mammography/methods , Computer-Assisted Radiographic Image Interpretation/methods , Algorithms , Breast/diagnostic imaging , Female , Humans
ABSTRACT
In this paper, we describe a statistical video representation and modeling scheme. Video representation schemes are needed to segment a video stream into meaningful video-objects, useful for later indexing and retrieval applications. In the proposed methodology, unsupervised clustering via Gaussian mixture modeling extracts coherent space-time regions in feature space, and corresponding coherent segments (video-regions) in the video content. A key feature of the system is the analysis of video input as a single entity as opposed to a sequence of separate frames. Space and time are treated uniformly. The probabilistic space-time video representation scheme is extended to a piecewise GMM framework in which a succession of GMMs is extracted for the video sequence, instead of a single global model for the entire sequence. The piecewise GMM framework allows for the analysis of extended video sequences and the description of nonlinear, nonconvex motion patterns. The extracted space-time regions allow for the detection and recognition of video events. Results of segmenting video content into static versus dynamic video regions and video content editing are presented.
Subjects
Algorithms , Artificial Intelligence , Computer-Assisted Image Interpretation/methods , Information Storage and Retrieval/methods , Statistical Models , Automated Pattern Recognition , Video Recording/methods , Computer Graphics , Image Enhancement/methods , Normal Distribution , Computer-Assisted Numerical Analysis , Reproducibility of Results , Sensitivity and Specificity , Computer-Assisted Signal Processing , Subtraction Technique
ABSTRACT
We present a method for combining several segmentations of an image into a single one that is, in some sense, the average segmentation, in order to achieve a more reliable and accurate segmentation result. The goal is to find a point in the "space of segmentations" which is close to all the individual segmentations. We present an algorithm for segmentation averaging. The image is first oversegmented into superpixels. Next, each segmentation is projected onto the superpixel map. An instance of the EM algorithm combined with integer linear programming is applied to the set of binary merging decisions of neighboring superpixels to obtain the average segmentation. Apart from segmentation averaging, the algorithm also reports the reliability of each segmentation. The performance of the proposed algorithm is demonstrated on manually annotated images from the Berkeley segmentation data set and on the results of automatic segmentation algorithms.
ABSTRACT
In this study we present an efficient image categorization system for medical image databases utilizing a local patch representation based on both content and location. The system discriminates between healthy and pathological cases and indicates the subregion in the image that is automatically found to be most relevant for the decision. We show an application to pathology-level categorization of chest x-ray data, the most popular examination in radiology. Experimental results are provided on chest radiographs taken from routine hospital examinations.
Subjects
Diagnostic Imaging/methods , Computer-Assisted Radiographic Image Interpretation/methods , Thoracic Radiography/methods , Thorax/pathology , Algorithms , Computer Graphics , Computer-Assisted Diagnosis , Humans , Language , Statistical Models , ROC Curve , Radiology/methods , Sensitivity and Specificity , X-Rays
ABSTRACT
In this study we present an efficient image categorization and retrieval system applied to medical image databases, in particular large radiograph archives. The methodology is based on local patch representation of the image content, using a "bag of visual words" approach. We explore the effects of various parameters on system performance, and show best results using dense sampling of simple features with spatial content, and a nonlinear kernel-based support vector machine (SVM) classifier. In a recent international competition the system was ranked first in discriminating orientation and body regions in x-ray images. In addition to organ-level discrimination, we show an application to pathology-level categorization of chest x-ray data, the most popular examination in radiology. The system discriminates between healthy and pathological cases, and is also shown to successfully identify specific pathologies in a set of chest radiographs taken from routine hospital examinations. This is a first step towards similarity-based categorization, which has major clinical implications for computer-assisted diagnostics.