1.
Article in English | MEDLINE | ID: mdl-38598394

ABSTRACT

Interactive semantic segmentation pursues high-quality segmentation results at the cost of a small number of user clicks. It is attracting more and more research attention for its convenience in labeling semantic pixel-level data. Existing interactive segmentation methods often pursue higher interaction efficiency by mining the latent information of user clicks or exploring efficient interaction manners. However, these works neglect to explicitly exploit the semantic correlations between user corrections and model mispredictions, thus suffering from two flaws. First, similar prediction errors frequently occur in actual use, causing users to repeatedly correct them. Second, the interaction difficulty of different semantic classes varies across images, but existing models use monotonic parameters for all images which lack semantic pertinence. Therefore, in this article, we explore the semantic correlations existing in corrections and mispredictions by proposing a simple yet effective online learning solution to the above problems, named correction-misprediction correlation mining (CM2). Specifically, we leverage the correction-misprediction similarities to design a confusion memory module (CMM) for automatic correction when similar prediction errors reappear. Furthermore, we measure the semantic interaction difficulty by counting the correction-misprediction pairs and design a challenge adaptive convolutional layer (CACL), which can adaptively switch different parameters according to interaction difficulties to better segment the challenging classes. Our method requires no extra training besides the online learning process and can effectively improve interaction efficiency. Our proposed CM2 achieves state-of-the-art results on three public semantic segmentation benchmarks.

2.
IEEE Trans Image Process ; 33: 2090-2103, 2024.
Article in English | MEDLINE | ID: mdl-38470590

ABSTRACT

Existing approaches towards anomaly detection (AD) often rely on a substantial amount of anomaly-free data to train representation and density models. However, large anomaly-free datasets may not always be available before the inference stage; in which case an anomaly detection model must be trained with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). In this paper, we propose a novel methodology to address the challenge of FSAD which incorporates two important techniques. Firstly, we employ a model pre-trained on a large source dataset to initialize model weights. Secondly, to ameliorate the covariate shift between source and target domains, we adopt contrastive training to fine-tune on the few-shot target domain data. To learn suitable representations for the downstream AD task, we additionally incorporate cross-instance positive pairs to encourage a tight cluster of the normal samples, and negative pairs for better separation between normal and synthesized negative samples. We evaluate few-shot anomaly detection on 3 controlled AD tasks and 4 real-world AD tasks to demonstrate the effectiveness of the proposed method.

3.
Article in English | MEDLINE | ID: mdl-36459610

ABSTRACT

Knowledge distillation (KD) is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the local structure preserving (LSP) loss, which matches local structural relationships defined over edges across the student and teacher's node embeddings. This article studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose graph contrastive representation distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across four datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving (GSP) variant of LSP) as well as baselines from 2-D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other.

4.
IEEE Trans Image Process ; 31: 5109-5120, 2022.
Article in English | MEDLINE | ID: mdl-35895645

ABSTRACT

Recent work on curvilinear structure segmentation has mostly focused on backbone network design and loss engineering. The challenge of collecting labelled data, an expensive and labor-intensive process, has been overlooked. While labelled data is expensive to obtain, unlabelled data is often readily available. In this work, we propose SemiCurv, a semi-supervised learning (SSL) framework for curvilinear structure segmentation that is able to utilize such unlabelled data to reduce the labelling burden. Our framework addresses two key challenges in formulating curvilinear segmentation in a semi-supervised manner. First, to fully exploit the power of consistency-based SSL, we introduce a geometric transformation as strong data augmentation and then align segmentation predictions via a differentiable inverse transformation to enable the computation of pixel-wise consistency. Second, the traditional mean square error (MSE) on unlabelled data is prone to collapsed predictions, and this issue is exacerbated by severe class imbalance (significantly more background pixels). We propose an N-pair consistency loss to avoid trivial predictions on unlabelled data. We evaluate SemiCurv on six curvilinear segmentation datasets, and find that with no more than 5% of the labelled data, it achieves close to 95% of the performance relative to its fully supervised counterpart.
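The alignment step in this abstract (transform the input, predict, map the prediction back with the inverse transformation, then compare pixel by pixel) can be sketched as follows. This is only an illustration under stated assumptions: a horizontal flip stands in for the geometric transformation, the three toy "models" are hypothetical, and the loss shown is the plain MSE consistency that the abstract describes as prone to collapse, not the proposed N-pair loss.

```python
def hflip(image):
    """Horizontally flip a 2-D list of pixel values (its own inverse)."""
    return [row[::-1] for row in image]


def consistency_mse(model, image, transform=hflip, inverse=hflip):
    """Pixel-wise MSE between a prediction and an inverse-aligned prediction."""
    plain = model(image)
    aligned = inverse(model(transform(image)))
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(plain, aligned)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def pixelwise_model(image):
    """Toy model that treats every pixel independently of its position."""
    return [[0.9 if px > 0.5 else 0.1 for px in row] for row in image]


def position_biased_model(image):
    """Toy model whose output depends on the column index."""
    width = len(image[0])
    return [[px * (c + 1) / width for c, px in enumerate(row)] for row in image]


def collapsed_model(image):
    """Toy model that ignores the input and predicts a constant."""
    return [[0.5 for _ in row] for row in image]


toy_image = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss_pixelwise = consistency_mse(pixelwise_model, toy_image)
loss_biased = consistency_mse(position_biased_model, toy_image)
loss_collapsed = consistency_mse(collapsed_model, toy_image)
```

The position-biased model is penalized, as intended, but the collapsed model also reaches zero loss. That trivial solution is the failure mode the abstract attributes to MSE consistency on unlabelled data.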

5.
J Med Internet Res ; 24(7): e34669, 2022 07 29.
Article in English | MEDLINE | ID: mdl-35904853

ABSTRACT

BACKGROUND: Consumer-grade wearable devices enable detailed recordings of heart rate and step counts in free-living conditions. Recent studies have shown that summary statistics from these wearable recordings have potential uses for longitudinal monitoring of health and disease states. However, the relationship between higher resolution physiological dynamics from wearables and known markers of health and disease remains largely uncharacterized. OBJECTIVE: We aimed to derive high-resolution digital phenotypes from observational wearable recordings and to examine their associations with modifiable and inherent markers of cardiometabolic disease risk. METHODS: We introduced a principled framework to extract interpretable high-resolution phenotypes from wearable data recorded in free-living conditions. The proposed framework standardizes the handling of data irregularities; encodes contextual information regarding the underlying physiological state at any given time; and generates a set of 66 minimally redundant features across active, sedentary, and sleep states. We applied our approach to a multimodal data set, from the SingHEART study (NCT02791152), which comprises heart rate and step count time series from wearables, clinical screening profiles, and whole genome sequences from 692 healthy volunteers. We used machine learning to model nonlinear relationships between the high-resolution phenotypes on the one hand and clinical or genomic risk markers for blood pressure, lipid, weight and sugar abnormalities on the other. For each risk type, we performed model comparisons based on Brier scores to assess the predictive value of high-resolution features over and beyond typical baselines. We also qualitatively characterized the wearable phenotypes for participants who had actualized clinical events. 
RESULTS: We found that the high-resolution features have higher predictive value than typical baselines for clinical markers of cardiometabolic disease risk: the best models based on high-resolution features had 17.9% and 7.36% improvement in Brier score over baselines based on age and gender and resting heart rate, respectively (P<.001 in each case). Furthermore, heart rate dynamics from different activity states contain distinct information (maximum absolute correlation coefficient of 0.15). Heart rate dynamics in sedentary states are most predictive of lipid abnormalities and obesity, whereas patterns in active states are most predictive of blood pressure abnormalities (P<.001). Moreover, in comparison with standard measures, higher resolution patterns in wearable heart rate recordings are better able to represent subtle physiological dynamics related to genomic risk for cardiometabolic disease (improvement of 11.9%-22.0% in Brier scores; P<.001). Finally, illustrative case studies reveal connections between these high-resolution phenotypes and actualized clinical events, even for borderline profiles lacking apparent cardiometabolic risk markers. CONCLUSIONS: High-resolution digital phenotypes recorded by consumer wearables in free-living states have the potential to enhance the prediction of cardiometabolic disease risk and could enable more proactive and personalized health management.


Subject(s)
Cardiovascular Diseases; Wearable Electronic Devices; Cardiovascular Diseases/diagnosis; Clinical Studies as Topic; Cohort Studies; Humans; Lipids; Machine Learning; Phenotype
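The model comparison in record 5 rests on Brier scores. The sketch below shows how a Brier score and a relative improvement over a baseline could be computed. The numbers are invented toy values, not data from the study, and reading "improvement in Brier score" as a relative reduction is an assumption, since the abstract does not state the formula.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def relative_improvement(baseline_score, model_score):
    """Fractional reduction in Brier score relative to a baseline (assumed definition)."""
    return (baseline_score - model_score) / baseline_score


# Toy example: four participants, 1 = abnormality present.
outcomes = [1, 0, 1, 0]
baseline_probabilities = [0.5, 0.5, 0.5, 0.5]  # uninformative baseline
model_probabilities = [0.8, 0.2, 0.7, 0.1]     # hypothetical richer model

baseline = brier_score(baseline_probabilities, outcomes)
model = brier_score(model_probabilities, outcomes)
improvement = relative_improvement(baseline, model)
```

Lower Brier scores are better, so a positive `improvement` means the model's probability estimates are closer to the observed outcomes than the baseline's.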
6.
IEEE Trans Image Process ; 31: 3494-3508, 2022.
Article in English | MEDLINE | ID: mdl-35533163

ABSTRACT

Background clutter poses challenges to defocus blur detection. Existing approaches often produce artifact predictions in background areas with clutter and relatively low-confidence predictions in boundary areas. In this work, we tackle the above issues from two perspectives. Firstly, inspired by the recent success of self-attention mechanisms, we introduce channel-wise and spatial-wise attention modules to attentively aggregate features at different channels and spatial locations to obtain more discriminative features. Secondly, we propose a generative adversarial training strategy to suppress spurious and low-reliability predictions. This is achieved by utilizing a discriminator to distinguish predicted defocus maps from ground-truth ones. As such, the defocus network (generator) needs to produce 'realistic' defocus maps to minimize discriminator loss. We further demonstrate that the generative adversarial training allows exploiting additional unlabeled data to improve performance, a.k.a. semi-supervised learning, and we provide the first benchmark on semi-supervised defocus detection. Finally, we demonstrate that the existing evaluation metrics for defocus detection generally fail to quantify the robustness with respect to thresholding. For a fair and practical evaluation, we introduce an effective yet efficient AUFβ metric. Extensive experiments on three public datasets verify the superiority of the proposed methods compared against state-of-the-art approaches.

7.
BMC Genomics ; 23(1): 295, 2022 Apr 12.
Article in English | MEDLINE | ID: mdl-35410161

ABSTRACT

BACKGROUND: Many transcription factors (TFs), such as multi zinc-finger (ZF) TFs, have multiple DNA binding domains (DBDs), and deciphering the DNA binding motifs of individual DBDs is a major challenge. One example of such a TF is CCCTC-binding factor (CTCF), a TF with eleven ZFs that plays a variety of roles in transcriptional regulation, most notably anchoring DNA loops. Previous studies found that CTCF ZFs 3-7 bind CTCF's core motif and ZFs 9-11 bind a specific upstream motif, but the motifs of ZFs 1-2 have yet to be identified. RESULTS: We developed a new approach to identifying the binding motifs of individual DBDs of a TF through analyzing chromatin immunoprecipitation sequencing (ChIP-seq) experiments in which a single DBD is mutated: we train a deep convolutional neural network to predict whether wild-type TF binding sites are preserved in the mutant TF dataset and interpret the model. We applied this approach to mouse CTCF ChIP-seq data and identified the known binding preferences of CTCF ZFs 3-11 as well as a putative GAG binding motif for ZF 1. We analyzed other CTCF datasets to provide additional evidence that ZF 1 is associated with binding at the motif we identified, and we found that the presence of the motif for ZF 1 is associated with CTCF ChIP-seq peak strength. CONCLUSIONS: Our approach can be applied to any TF for which in vivo binding data from both the wild-type and mutated versions of the TF are available, and our findings provide new potential insights into the binding preferences of CTCF's DBDs.


Subject(s)
Transcription Factors; Zinc; Animals; Binding Sites; CCCTC-Binding Factor/metabolism; DNA/metabolism; Mice; Neural Networks, Computer; Protein Binding; Transcription Factors/genetics; Transcription Factors/metabolism; Zinc/metabolism; Zinc Fingers/genetics
8.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 404-415, 2022 01.
Article in English | MEDLINE | ID: mdl-32750792

ABSTRACT

Large-scale distributed training of deep neural networks results in models with worse generalization performance as a result of the increase in the effective mini-batch size. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or ad hoc modifications of batch normalization. We propose scalable and practical natural gradient descent (SP-NGD), a principled approach for training models that allows them to attain similar generalization performance to models trained with first-order optimization methods, but with accelerated convergence. Furthermore, SP-NGD scales to large mini-batch sizes with a negligible computational overhead as compared to first-order methods. We evaluated SP-NGD on a benchmark task where highly optimized first-order methods are available as references: training a ResNet-50 model for image classification on ImageNet. We demonstrate convergence to a top-1 validation accuracy of 75.4 percent in 5.5 minutes using a mini-batch size of 32,768 with 1,024 GPUs, as well as an accuracy of 74.9 percent with an extremely large mini-batch size of 131,072 in 873 steps of SP-NGD.


Subject(s)
Deep Learning; Algorithms; Benchmarking; Neural Networks, Computer
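For readers unfamiliar with the contrast drawn in record 8, the textbook natural gradient update differs from a first-order update only in preconditioning the gradient with the inverse Fisher information matrix. The notation below (parameters \theta, learning rate \eta, loss L, model distribution p_\theta) is generic and assumed here. The abstract does not say how SP-NGD approximates or inverts the Fisher matrix at scale, so that part is not shown.

```latex
% First-order (gradient descent) update
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)

% Natural gradient descent update
\theta_{t+1} = \theta_t - \eta \, F(\theta_t)^{-1} \nabla_\theta L(\theta_t)

% Fisher information matrix of the model distribution p_\theta(y \mid x)
F(\theta) = \mathbb{E}_{x,\; y \sim p_\theta(y \mid x)}
  \left[ \nabla_\theta \log p_\theta(y \mid x) \,
         \nabla_\theta \log p_\theta(y \mid x)^{\top} \right]
```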
9.
IEEE Trans Neural Netw Learn Syst ; 33(6): 2508-2517, 2022 06.
Article in English | MEDLINE | ID: mdl-34464278

ABSTRACT

Several techniques for multivariate time series anomaly detection have been proposed recently, but a systematic comparison on a common set of datasets and metrics is lacking. This article presents a systematic and comprehensive evaluation of unsupervised and semisupervised deep-learning-based methods for anomaly detection and diagnosis on multivariate time series data from cyberphysical systems. Unlike previous works, we vary the model and the post-processing of model errors (i.e., the scoring function) independently of each other, through a grid of ten models and four scoring functions, comparing these variants to state-of-the-art methods. In time-series anomaly detection, detecting anomalous events is more important than detecting individual anomalous time points. Through experiments, we find that the existing evaluation metrics either do not take events into account or cannot distinguish between a good detector and trivial detectors, such as a random or an all-positive detector. We propose a new metric to overcome these drawbacks, namely, the composite F-score (Fc1), for evaluating time-series anomaly detection. Our study highlights that dynamic scoring functions work much better than static ones for multivariate time series anomaly detection, and the choice of scoring functions often matters more than the choice of the underlying model. We also find that a simple, channel-wise model, the univariate fully connected auto-encoder, combined with the dynamic Gaussian scoring function emerges as a winning candidate for both anomaly detection and diagnosis, beating state-of-the-art algorithms.


Subject(s)
Algorithms; Neural Networks, Computer; Supervised Machine Learning; Time Factors
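The abstract of record 9 names a composite F-score but does not define it. The sketch below assumes one event-aware reading: the harmonic mean of point-wise precision (over individual time points) and event-wise recall (an event counts as detected if any of its time points is flagged). Both that definition and the helper names are assumptions made for illustration. The labels are toy data.

```python
def events_from_labels(labels):
    """Return contiguous runs of 1s as inclusive (start, end) index pairs."""
    events, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def composite_f1(labels, predictions):
    """Harmonic mean of point-wise precision and event-wise recall (assumed definition)."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y and p)
    false_pos = sum(1 for y, p in zip(labels, predictions) if not y and p)
    flagged = true_pos + false_pos
    precision_t = true_pos / flagged if flagged else 0.0

    events = events_from_labels(labels)
    detected = sum(1 for s, e in events if any(predictions[s:e + 1]))
    recall_e = detected / len(events) if events else 0.0

    if precision_t + recall_e == 0:
        return 0.0
    return 2 * precision_t * recall_e / (precision_t + recall_e)


# Toy series with two anomalous events: indices 1-3 and 6-7.
labels = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
sparse_detector = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # hits one event, one false alarm
all_positive = [1] * len(labels)                  # trivial detector

score_sparse = composite_f1(labels, sparse_detector)
score_all_positive = composite_f1(labels, all_positive)
```

Under this reading, the all-positive detector gets full event recall but is held back by its point-wise precision, which here equals the fraction of anomalous time points.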
10.
IEEE Trans Image Process ; 30: 8702-8712, 2021.
Article in English | MEDLINE | ID: mdl-34665728

ABSTRACT

State-of-the-art methods for semantic segmentation are based on deep neural networks trained on large-scale labeled datasets. Acquiring such datasets incurs large annotation costs, especially for dense pixel-level prediction tasks like semantic segmentation. We consider region-based active learning as a strategy to reduce annotation costs while maintaining high performance. In this setting, batches of informative image regions instead of entire images are selected for labeling. Importantly, we propose that enforcing local spatial diversity is beneficial for active learning in this case, and we incorporate spatial diversity along with the traditional active selection criterion, e.g., data sample uncertainty, in a unified optimization framework for region-based active learning. We apply this framework to the Cityscapes and PASCAL VOC datasets and demonstrate that the inclusion of spatial diversity effectively improves the performance of uncertainty-based and feature diversity-based active learning methods. Our framework achieves 95% performance of fully supervised methods with only 5-9% of the labeled pixels, outperforming all state-of-the-art region-based active learning methods for semantic segmentation.

11.
Med Image Anal ; 73: 102148, 2021 10.
Article in English | MEDLINE | ID: mdl-34274693

ABSTRACT

Deep learning models achieve strong performance for radiology image classification, but their practical application is bottlenecked by the need for large labeled training datasets. Semi-supervised learning (SSL) approaches leverage small labeled datasets alongside larger unlabeled datasets and offer potential for reducing labeling cost. In this work, we introduce NoTeacher, a novel consistency-based SSL framework which incorporates probabilistic graphical models. Unlike Mean Teacher which maintains a teacher network updated via a temporal ensemble, NoTeacher employs two independent networks, thereby eliminating the need for a teacher network. We demonstrate how NoTeacher can be customized to handle a range of challenges in radiology image classification. Specifically, we describe adaptations for scenarios with 2D and 3D inputs, with uni and multi-label classification, and with class distribution mismatch between labeled and unlabeled portions of the training data. In realistic empirical evaluations on three public benchmark datasets spanning the workhorse modalities of radiology (X-Ray, CT, MRI), we show that NoTeacher achieves over 90-95% of the fully supervised AUROC with less than 5-15% labeling budget. Further, NoTeacher outperforms established SSL methods with minimal hyperparameter tuning, and has implications as a principled and practical option for semi-supervised learning in radiology applications.


Subject(s)
Radiology; Supervised Machine Learning; Humans; Radiography
12.
Nat Commun ; 11(1): 3603, 2020 07 17.
Article in English | MEDLINE | ID: mdl-32681107

ABSTRACT

Members of the PR/SET domain-containing (PRDM) family of zinc finger transcriptional regulators play diverse developmental roles. PRDM10 is a yet uncharacterized family member, and its function in vivo is unknown. Here, we report an essential requirement for PRDM10 in pre-implantation embryos and embryonic stem cells (mESCs), where loss of PRDM10 results in severe cell growth inhibition. Detailed genomic and biochemical analyses reveal that PRDM10 functions as a sequence-specific transcription factor. We identify Eif3b, which encodes a core component of the eukaryotic translation initiation factor 3 (eIF3) complex, as a key downstream target, and demonstrate that growth inhibition in PRDM10-deficient mESCs is in part mediated through EIF3B-dependent effects on global translation. Our work elucidates the molecular function of PRDM10 in maintaining global translation, establishes its essential role in early embryonic development and mESC homeostasis, and offers insights into the functional repertoire of PRDMs as well as the transcriptional mechanisms regulating translation.


Subject(s)
Gene Expression Regulation, Developmental; Mice/metabolism; Transcription Factors/metabolism; Animals; Embryonic Development; Embryonic Stem Cells/metabolism; Eukaryotic Initiation Factors/genetics; Eukaryotic Initiation Factors/metabolism; Female; Intracellular Signaling Peptides and Proteins/genetics; Intracellular Signaling Peptides and Proteins/metabolism; Male; Mice/embryology; Mice/genetics; Protein Biosynthesis; Transcription Factors/genetics
13.
Article in English | MEDLINE | ID: mdl-31395548

ABSTRACT

Answering questions using multi-modal context is a challenging problem as it requires a deep integration of diverse data sources. Existing approaches only consider a subset of all possible interactions among data sources during one attention hop. In this paper, we present a Holistic Multi-modal Memory Network (HMMN) framework that fully considers interactions between different input sources (multi-modal context, question) at each hop. In addition, to hone in on relevant information, our framework takes answer choices into consideration during the context retrieval stage. Our HMMN framework effectively integrates information from the multi-modal context, question, and answer choices, enabling more informative context to be retrieved for question answering. Experimental results on the MovieQA and TVQA datasets validate the effectiveness of our HMMN framework. Extensive ablation studies show the importance of holistic reasoning and reveal the contributions of different attention strategies to model performance.

14.
Genes Dev ; 30(13): 1509-14, 2016 07 01.
Article in English | MEDLINE | ID: mdl-27401554

ABSTRACT

The complexities of DNA recognition by transcription factors (TFs) with multiple Cys2-His2 zinc fingers (C2H2-ZFs) remain poorly studied. We previously reported a mutation (R1092W) in the C2H2-ZF TF Zfp335 that led to selective loss of binding at a subset of targets, although the basis for this effect was unclear. We show that Zfp335 binds DNA and drives transcription via recognition of two distinct consensus motifs by separate ZF clusters and identify the specific motif interaction disrupted by R1092W. Our work presents Zfp335 as a model for understanding how C2H2-ZF TFs may use multiple recognition motifs to control gene expression.


Subject(s)
Gene Expression Regulation/genetics; Transcription Factors/metabolism; Zinc Fingers/physiology; Animals; DNA-Binding Proteins; HEK293 Cells; Humans; Intracellular Signaling Peptides and Proteins; Jurkat Cells; Mice; Mutation; Nuclear Proteins; Protein Binding/genetics; Transcription Factors/chemistry; Transcription Factors/genetics; Zinc Fingers/genetics
15.
Elife ; 3, 2014 Oct 24.
Article in English | MEDLINE | ID: mdl-25343476

ABSTRACT

The generation of naïve T lymphocytes is critical for immune function yet the mechanisms governing their maturation remain incompletely understood. We have identified a mouse mutant, bloto, that harbors a hypomorphic mutation in the zinc finger protein Zfp335. Zfp335(bloto/bloto) mice exhibit a naïve T cell deficiency due to an intrinsic developmental defect that begins to manifest in the thymus and continues into the periphery, affecting T cells that have recently undergone thymic egress. The effects of Zfp335(bloto) are multigenic and cannot be attributed to altered thymic selection, proliferation or Bcl2-dependent survival. Zfp335 binds to promoter regions via a consensus motif, and its target genes are enriched in categories related to protein metabolism, mitochondrial function, and transcriptional regulation. Restoring the expression of one target, Ankle2, partially rescues T cell maturation. These findings identify Zfp335 as a transcription factor and essential regulator of late-stage intrathymic and post-thymic T cell maturation.


Subject(s)
Membrane Proteins/genetics; Mutation; Nuclear Proteins/genetics; T-Lymphocytes/metabolism; Thymus Gland/metabolism; Transcription Factors/genetics; Zinc Fingers/genetics; Amino Acid Sequence; Animals; Base Sequence; Cell Differentiation; Gene Expression Profiling; Gene Expression Regulation; Genetic Complementation Test; Immunity, Innate; Membrane Proteins/immunology; Membrane Proteins/metabolism; Mice; Mice, Inbred CBA; Mice, Transgenic; Molecular Sequence Data; Nuclear Proteins/immunology; Nuclear Proteins/metabolism; Promoter Regions, Genetic; Protein Binding; Proto-Oncogene Proteins c-bcl-2/genetics; Proto-Oncogene Proteins c-bcl-2/immunology; Proto-Oncogene Proteins c-bcl-2/metabolism; Sequence Alignment; Signal Transduction; T-Lymphocytes/immunology; T-Lymphocytes/pathology; Thymus Gland/immunology; Thymus Gland/pathology; Transcription Factors/immunology; Transcription Factors/metabolism; Transcription, Genetic; Zinc Fingers/immunology
16.
Bioinformatics ; 24(13): i68-76, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586747

ABSTRACT

MOTIVATION: The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences. Algorithmically, RAF exploits sparsity in the set of likely pairing and alignment candidates for each nucleotide (as identified by the CONTRAfold or CONTRAlign programs) to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding. RAF's fast sparse dynamic programming, in turn, serves as the inference engine within a discriminative machine learning algorithm for parameter estimation. RESULTS: In cross-validated benchmark tests, RAF achieves accuracies equaling or surpassing the current best approaches for RNA multiple sequence secondary structure prediction. However, RAF requires nearly an order of magnitude less time than other simultaneous folding and alignment methods, thus making it especially appropriate for high-throughput studies. AVAILABILITY: Source code for RAF is available at: http://contra.stanford.edu/contrafold/.


Subject(s)
Algorithms; Consensus Sequence/genetics; RNA/genetics; RNA/ultrastructure; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Base Sequence; Computer Simulation; Models, Chemical; Models, Molecular; Molecular Sequence Data; Nucleic Acid Conformation
17.
Article in English | MEDLINE | ID: mdl-17951821

ABSTRACT

Multiprotein complexes play central roles in many cellular pathways. Although many high-throughput experimental techniques have already enabled systematic screening of pairwise protein-protein interactions en masse, the amount of experimentally determined protein complex data has remained relatively lacking. As such, researchers have begun to exploit the vast amount of pairwise interaction data to help discover new protein complexes. However, mining for protein complexes in interaction networks is not an easy task because there are many data artefacts in the underlying protein-protein interaction data due to the limitations in the current high-throughput screening methods. We propose a novel DECAFF (Dense-neighborhood Extraction using Connectivity and conFidence Features) algorithm to mine for dense and reliable subgraphs in protein interaction networks. Our method is devised to address two major limitations in current high-throughput protein interaction data, namely, incompleteness and high data noise. Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by DECAFF matched significantly better with actual protein complexes than other existing approaches. Our results demonstrate that pairwise protein interaction networks can be effectively mined to discover new protein complexes, provided that the data artefacts in the underlying interaction data are taken into account adequately.


Subject(s)
Algorithms; Models, Biological; Multiprotein Complexes/metabolism; Protein Interaction Mapping/methods; Proteome/metabolism; Signal Transduction/physiology; Computer Simulation
18.
Genome Inform ; 16(2): 260-9, 2005.
Article in English | MEDLINE | ID: mdl-16901108

ABSTRACT

While recent technological advances have made available large datasets of experimentally-detected pairwise protein-protein interactions, there is still a lack of experimentally-determined protein complex data. To make up for this lack of protein complex data, we explore the mining of existing protein interaction graphs for protein complexes. This paper proposes a novel graph mining algorithm to detect the dense neighborhoods (highly connected regions) in an interaction graph which may correspond to protein complexes. Our algorithm first locates local cliques for each graph vertex (protein) and then merges the detected local cliques according to their affinity to form maximal dense regions. We present experimental results with yeast protein interaction data to demonstrate the effectiveness of our proposed method. Compared with other existing techniques, our predicted complexes can match or overlap significantly better with the known protein complexes in the MIPS benchmark database. Novel protein complexes were also predicted to help biologists in their search for new protein complexes.


Subject(s)
Algorithms; Multiprotein Complexes/physiology; Protein Interaction Mapping/statistics & numerical data; Multiprotein Complexes/chemistry; Predictive Value of Tests; Protein Interaction Mapping/methods; Saccharomyces cerevisiae Proteins/chemistry
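The two-stage procedure in record 18 (find a local clique around each vertex, then merge cliques by affinity) can be sketched roughly as below. This is a simplified stand-in rather than the published algorithm. The greedy clique growth, the overlap-based affinity formula, the `threshold` value, and the toy interaction graph are all assumptions made for illustration.

```python
def local_clique(graph, vertex):
    """Greedily grow a clique around `vertex`, trying high-degree neighbors first."""
    clique = {vertex}
    for u in sorted(graph[vertex], key=lambda n: (-len(graph[n]), n)):
        if all(u in graph[w] for w in clique):
            clique.add(u)
    return frozenset(clique)


def affinity(a, b):
    """Overlap-based affinity between two vertex sets (assumed formula)."""
    overlap = len(a & b)
    return overlap * overlap / (len(a) * len(b))


def dense_neighborhoods(graph, threshold=0.4):
    """Merge local cliques whose affinity reaches `threshold` until none remain."""
    regions = {local_clique(graph, v) for v in graph}
    merged = True
    while merged:
        merged = False
        ordered = sorted(regions, key=sorted)  # deterministic iteration order
        for a in ordered:
            for b in ordered:
                if a != b and affinity(a, b) >= threshold:
                    regions -= {a, b}
                    regions.add(a | b)
                    merged = True
                    break
            if merged:
                break
    return regions


# Toy interaction graph: a near-clique A-B-C-D (C-D missing), a triangle E-F-G,
# and a single bridging interaction D-E.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"D", "F", "G"},
    "F": {"E", "G"},
    "G": {"E", "F"},
}

regions = dense_neighborhoods(toy_graph)
```

On this toy graph the overlapping cliques {A, B, C} and {A, B, D} merge into one dense region even though the C-D edge is missing. The triangle {E, F, G} stays separate, and the bridging pair {D, E} survives as a small region that a minimum-size filter could discard.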