ABSTRACT
Despite the increasing use of copper in C-H functionalization, Cu-catalyzed direct deuteration of C-H bonds remains a significant challenge because of copper's inherently low reactivity in the reverse C-H bond reconstruction. In this paper, a novel strategy has been developed to reverse the copper-catalyzed concerted metalation-deprotonation process by inhibiting the unwanted disproportionation of Cu(II) to Cu(III). Picolinic acid was identified as a powerful ligand for facilitating this H/D exchange with D2O as the deuterium source, and its inhibitory effect was supported by preliminary control experiments and DFT studies.
ABSTRACT
MXenes, two-dimensional (2D) transition metal carbides/nitrides, have shown promise as cathodic catalysts for accelerating the conversion of lithium polysulfides (LiPSs) in lithium-sulfur (Li-S) batteries owing to their diverse redox-active sites and rapid electron transfer. However, efficiently screening thousands of MXenes for the optimal cathodic catalyst is challenging. To address this, we developed a model that accurately predicts the thermodynamic energy barrier of the rate-limiting step in Li-S batteries. The model relates the local chemical reactivity of MXene sites to the p-band center of the terminations and the electronegativity of the subsurface transition metals. Its accuracy was verified by density functional theory calculations and, qualitatively, by comparative experiments on pure and Zn-doped MXenes. Using this model, we screened a large library of MXenes (27 types of five-atom-layer MXenes) and identified Ti2CS2, Mo2CS2, and W2CS2 as potential cathodic catalysts for Li-S batteries.
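To make the descriptor-based screening idea concrete, the sketch below fits a barrier model and ranks candidates by predicted barrier. It assumes, purely for illustration, a linear dependence of the barrier on the termination p-band center and the subsurface-metal electronegativity; the published model's functional form, coefficients, and descriptor values are not given here, so `fit_descriptor_model`, `screen`, the candidate names, and all numbers are hypothetical.

```python
import numpy as np


def fit_descriptor_model(p_band_center, electronegativity, barrier):
    """Least-squares fit of barrier ~ a * p_band_center + b * electronegativity + c.

    The linear form is an illustrative assumption, not the published model.
    """
    design = np.column_stack(
        [p_band_center, electronegativity, np.ones(len(barrier))]
    )
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(barrier, dtype=float), rcond=None)
    return coeffs  # array([a, b, c])


def screen(candidates, coeffs, top_k=3):
    """Rank (name, p_band_center, electronegativity) tuples by predicted barrier.

    A lower predicted barrier for the rate-limiting step is treated as better.
    """
    a, b, c = coeffs
    scored = sorted((a * eps_p + b * chi + c, name) for name, eps_p, chi in candidates)
    return [name for _, name in scored[:top_k]]


# Synthetic training data generated from a = 0.5, b = -0.3, c = 1.0 (not real MXene data).
p_band = [-2.0, -1.5, -3.0, -2.5, -1.0]
chi = [1.5, 2.0, 1.8, 1.3, 2.2]
barrier = [0.5 * p - 0.3 * x + 1.0 for p, x in zip(p_band, chi)]
coeffs = fit_descriptor_model(p_band, chi, barrier)
```

Once coefficients are fitted on a DFT-computed training subset, `screen` can be applied to the whole library, and only the top-ranked candidates need full DFT validation.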
ABSTRACT
The subcallosal cingulate gyrus (SCG) is a target of deep brain stimulation (DBS) for treatment-resistant depression. However, previous randomized controlled trials report that approximately 42% of patients respond to this therapy of last resort, and suboptimal targeting of the SCG is a potential contributor to this unsatisfactory efficacy. Tractography has been proposed as a supplementary method to improve the targeting strategy. We performed connectivity-based segmentation of the SCG region using probabilistic tractography in 100 healthy volunteers from the Human Connectome Project. The SCG voxels with maximal connectivity to brain regions implicated in depression, including Brodmann area 10 (BA10), the cingulate cortex, the thalamus, and the nucleus accumbens, were identified, and their conjunctions were taken as tractography-based targets. We then performed deterministic tractography from these targets in an additional 100 volunteers to count the streamlines reaching relevant brain regions and fiber bundles. We also evaluated intra- and inter-subject variance using a test-retest dataset. Two tractography-based targets were identified. Tractography-based target 1 had the highest streamline counts to the right BA10 and the bilateral cingulate cortex, whereas tractography-based target 2 had the highest streamline counts to the bilateral nucleus accumbens and the uncinate fasciculus. The mean linear distance from the individual tractography-based target to the anatomy-based target was 3.2 ± 1.8 mm in the left hemisphere and 2.5 ± 1.4 mm in the right hemisphere. The intra- and inter-subject target variability (mean ± SD) was 2.2 ± 1.2 and 2.9 ± 1.4 in the left hemisphere and 2.3 ± 1.4 and 3.1 ± 1.7 in the right hemisphere, respectively. Individual heterogeneity, as well as the inherent variability of diffusion imaging, should be taken into account when planning SCG-DBS targets.
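The "mean linear distance ± SD" figures above summarize per-subject Euclidean distances between paired target coordinates. A minimal sketch of that arithmetic follows, assuming both targets are expressed as (x, y, z) coordinates in the same space in millimeters and that SD is the sample standard deviation; the coordinates and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def target_offsets_mm(tract_targets, anat_targets):
    """Euclidean distance (mm) between paired tractography- and anatomy-based targets.

    Each target is an (x, y, z) coordinate, assumed to be in the same space in mm.
    """
    return [math.dist(t, a) for t, a in zip(tract_targets, anat_targets)]


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator, an assumption)."""
    return mean(values), stdev(values)


# Hypothetical coordinates for two subjects in one hemisphere.
offsets = target_offsets_mm([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 4)])
m, sd = mean_sd(offsets)
```

The same two functions apply to the test-retest comparison: pairing each subject's session-1 and session-2 targets gives intra-subject distances, while pairing different subjects' targets gives inter-subject distances.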
Subjects
Connectome , Deep Brain Stimulation , Treatment-Resistant Depressive Disorder , White Matter , Humans , Gyrus Cinguli/diagnostic imaging , Gyrus Cinguli/physiology , Deep Brain Stimulation/methods , Depression , White Matter/diagnostic imaging , Treatment-Resistant Depressive Disorder/diagnostic imaging , Treatment-Resistant Depressive Disorder/therapy
ABSTRACT
OBJECTIVE: To improve prognostic prediction, the AJCC staging system was revised so that it applies to pancreatic ductal adenocarcinoma (PDAC) treated with either upfront surgery (UFS) or neoadjuvant therapy (NAT). BACKGROUND: The AJCC staging system was designed for patients who have undergone UFS for PDAC, and it has limited predictive power for patients receiving NAT. METHODS: We examined 146 PDAC patients who underwent resection after NAT and 1771 who underwent UFS at Changhai Hospital between 2012 and 2021. Prognostic clinicopathological factors were identified using Cox proportional hazards regression analysis, and the Neoadjuvant Therapy Compatible Prognostic (NATCP) staging system was developed based on these variables. Validation was carried out in the prospective NAT cohort and the Surveillance, Epidemiology, and End Results (SEER) database. The predictive accuracy of the new staging approach was compared with that of the AJCC staging system. RESULTS: Multivariate analysis of the NAT cohort showed that tumor differentiation and the number of positive lymph nodes independently predicted overall survival (OS). The NATCP staging simplified the AJCC stages, added tumor differentiation, and restaged the disease according to survival differences in the Kaplan-Meier curves. The median OS for NATCP stages IA, IB, II, and III was 31.7 months, 25.0 months, and 15.8 months in the NAT cohort and 30.1 months, 22.8 months, 18.3 months, and 14.1 months in the UFS cohort. The NATCP staging system outperformed the AJCC staging method, and this was confirmed in the validation cohort. CONCLUSIONS: Regardless of the use of NAT, NATCP staging showed greater predictive ability than the existing AJCC staging approach for resected PDAC and may facilitate clinical decision-making through accurate prediction of patients' OS.
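The stage-specific median OS values above are read off Kaplan-Meier curves. The sketch below is a generic product-limit estimator with the usual median convention (the first time at which S(t) falls to 0.5 or below); it is not the authors' analysis code, and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up times (e.g. months); events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (deaths and censorings) sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve):
    """Smallest time at which S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up (months) for six resected patients in one stage.
curve = kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
```

Running this separately for each candidate stage grouping is how survival separation between stages can be inspected before restaging.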
Subjects
Pancreatic Ductal Carcinoma , Pancreatic Neoplasms , Humans , Neoadjuvant Therapy , Prognosis , Prospective Studies , Pancreatic Neoplasms/surgery , Pancreatic Ductal Carcinoma/surgery
ABSTRACT
We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analyzing genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network." KnowEnG adheres to the "FAIR" principles (findable, accessible, interoperable, and reusable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the potential value of KnowEnG in democratizing advanced tools for the modern genomics era through several case studies that use its tools to recreate and extend published analyses of cancer data sets.
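As a generic illustration of what "knowledge-guided" analysis can mean, the sketch below propagates user-supplied seed genes over a toy gene network with random walk with restart (RWR), so that genes close to the seeds in the network receive higher scores. RWR is a common network-propagation technique chosen here as an assumption; it is not presented as KnowEnG's implementation, and the graph, the `restart` value, and the function name are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Network propagation of seed genes by random walk with restart.

    adj: dict gene -> list of neighbouring genes (undirected; every gene has >= 1 neighbour).
    seeds: genes highlighted by the user's data; the walk restarts at them with prob `restart`.
    Returns dict gene -> stationary visiting probability.
    """
    seeds = set(seeds)
    p0 = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {g: restart * p0[g] for g in adj}
        for g, nbrs in adj.items():
            # Spread the non-restart share of g's probability evenly over its neighbours.
            share = (1 - restart) * p[g] / len(nbrs)
            for h in nbrs:
                nxt[h] += share
        diff = sum(abs(nxt[g] - p[g]) for g in adj)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical path-shaped network A - B - C - D with A as the only seed gene.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(adj, seeds=["A"])
```

Higher `restart` values keep scores concentrated near the seeds; lower values let prior network knowledge spread further.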
Subjects
Algorithms , Cloud Computing , Data Mining/methods , Genomics/methods , Software , Cluster Analysis , Computational Biology/methods , Data Analysis , Datasets as Topic , High-Throughput Nucleotide Sequencing/methods , Humans , Knowledge , Machine Learning , Metabolomics/methods
ABSTRACT
BACKGROUND: Manual inspection and instrumental measurement are the traditional approaches to determining tomato color, but these methods only determine tomato color at a given moment and cannot dynamically predict how tomato color varies during storage and transportation. Such methods therefore cannot help suppliers and retailers establish good management practices for flexibly controlling tomato maturity, accurately judging market positioning, or managing distribution and marketing. To address this shortcoming, this work first investigates how tomato color parameters (a* and h°) evolve through various stages of maturity (green, turning, and light red) under different storage conditions. Based on the experimental results, it then uses differential evolution to develop an optimized response-surface model (RSM) that predicts how tomato color varies during storage. RESULTS: Tomatoes are more likely to change color at high temperature and high humidity. Temperature affects tomato color more strongly than humidity. The accuracy of the RSM was confirmed by good agreement with experiments. All coefficients of determination (R²) of the RSMs for a* and h° are greater than 0.91. The mean absolute errors for a* and h° are 3.8112 and 5.6500, respectively, and the root mean square errors are 4.6840 and 6.9198, respectively. CONCLUSION: This research reveals how storage temperature and humidity affect postharvest variations in tomato color and thus establishes a dynamic model for predicting tomato color. The proposed RSM provides a reliable theoretical foundation for dynamic, nondestructive monitoring of tomato ripeness in the cold chain. © 2021 Society of Chemical Industry.
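The three goodness-of-fit statistics quoted above (R², MAE, RMSE) have standard definitions, sketched below for one color parameter. This only evaluates predictions; the RSM itself and its differential-evolution fit are not reproduced, and the observed/predicted values are hypothetical.

```python
import math


def fit_metrics(observed, predicted):
    """Coefficient of determination, mean absolute error and root mean square error."""
    n = len(observed)
    resid = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in resid)  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(r) for r in resid) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Hypothetical observed vs. model-predicted a* values.
metrics = fit_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

RMSE is never smaller than MAE, and the gap widens when a few predictions have large errors, which is one reason both are usually reported.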
Subjects
Solanum lycopersicum , Color , Fruit , Humidity , Theoretical Models , Temperature
ABSTRACT
BACKGROUND: Recent studies have demonstrated that fish byproducts can be used as sources of bioactive peptides for functional foods. Sturgeon skin is rich in protein, but it is commonly discarded during sturgeon processing. The objective of the present work was to identify and characterize bioactive peptides from protein hydrolysates of sturgeon skin. RESULTS: Sturgeon skin protein extract (SKPE) hydrolyzed by flavourzyme for 60 min exhibited high antioxidant activity as well as dipeptidyl peptidase IV (DPP-IV) and angiotensin-converting enzyme (ACE) inhibitory activities. The sequences of peptides from flavourzyme hydrolysates were identified using high-performance liquid chromatography-tandem mass spectrometry. Gly-Asp-Arg-Gly-Glu-Ser-Gly-Pro-Ala (P1) showed the highest DPPH radical scavenging activity (DPPH IC50 = 1.93 mmol L⁻¹). Gly-Pro-Ala-Gly-Glu-Arg-Gly-Glu-Gly-Gly-Pro-Arg (P11) (DPP-IV IC50 = 2.14 mmol L⁻¹) and Ser-Pro-Gly-Pro-Asp-Gly-Lys-Thr-Gly-Pro-Arg (P12) (DPP-IV IC50 = 2.61 mmol L⁻¹) exhibited the strongest DPP-IV inhibitory activity. Gly-Pro-Pro-Gly-Ala-Asp-Gly-Gln-Ala-Gly-Ala-Lys (P6) displayed the highest ACE inhibitory activity (ACE IC50 = 3.77 mmol L⁻¹). Molecular docking analysis revealed that the DPP-IV inhibition by P11 and P12 is mainly attributable to hydrogen bonds and hydrophobic interactions, whereas the ACE inhibition by P6 is mainly attributable to strong hydrogen bonds. CONCLUSIONS: These results indicate that SKPE hydrolysates generated by flavourzyme are potential sources of bioactive peptides that could be used in the health food industry. © 2021 Society of Chemical Industry.
Subjects
Fish Products , Peptides , Protein Hydrolysates , Skin , Animals , High-Pressure Liquid Chromatography , Fish Products/analysis , Molecular Docking Simulation , Peptides/analysis , Peptides/chemistry , Protein Hydrolysates/analysis , Protein Hydrolysates/chemistry , Proteins , Skin/chemistry
ABSTRACT
Since real-world objects and their interactions are often multi-modal and multi-typed, heterogeneous networks have been widely used as a more powerful, realistic, and generic superclass of traditional homogeneous networks (graphs). Meanwhile, representation learning (a.k.a. embedding) has recently been intensively studied and shown to be effective for various network mining and analytical tasks. In this work, we aim to provide a unified framework to comprehensively summarize and evaluate existing research on heterogeneous network embedding (HNE), going beyond a conventional survey. Since there is already a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis of the merits of various existing HNE algorithms. Moreover, existing HNE algorithms, though mostly claimed to be generic, are often evaluated on different datasets. Although understandable given the application-driven nature of HNE, such indirect comparisons largely hinder the proper attribution of improved task performance to either effective data preprocessing or novel technical design, especially considering the many possible ways to construct a heterogeneous network from real-world application data. Therefore, as the second contribution, we create four benchmark datasets from different sources with various properties regarding scale, structure, and attribute/label availability, enabling convenient and fair evaluation of HNE algorithms. As the third contribution, we carefully refactor and amend the implementations of 13 popular HNE algorithms, create friendly interfaces for them, and provide all-around comparisons among them over multiple tasks and experimental settings. By putting all existing HNE algorithms under a unified framework, we aim to provide a universal reference and guideline for the understanding and development of HNE algorithms.
Meanwhile, by open-sourcing all data and code, we aim to provide the community with a ready-to-use benchmark platform for testing and comparing the performance of existing and future HNE algorithms (https://github.com/yangji9181/HNE).
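To make "heterogeneous network" concrete, the sketch below generates a metapath-guided random walk, the sampling step used by one well-known family of HNE methods: the walk may only move to neighbors whose node type matches the next type in a symmetric metapath such as author-paper-author. It covers walk generation only (no skip-gram training), it is not taken from the benchmark's code, and the toy network and function name are hypothetical.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, walk_length, rng=None):
    """Generate one random walk that follows a symmetric metapath such as A-P-A.

    neighbors: dict node -> list of neighbouring nodes (any type).
    node_type: dict node -> type label.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    rng = rng or random.Random(0)
    cycle = metapath[:-1]  # A-P-A repeats as A, P, A, P, ...
    walk = [start]
    while len(walk) < walk_length:
        next_type = cycle[len(walk) % len(cycle)]
        candidates = [m for m in neighbors[walk[-1]] if node_type[m] == next_type]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical network: authors a1, a2 wrote paper p1, which appeared at venue v1.
neighbors = {"a1": ["p1"], "a2": ["p1"], "p1": ["a1", "a2", "v1"], "v1": ["p1"]}
node_type = {"a1": "A", "a2": "A", "p1": "P", "v1": "V"}
walk = metapath_walk(neighbors, node_type, "a1", ["A", "P", "A"], 5)
```

Because the venue node is never of the required next type on an A-P-A metapath, the walk ignores it; choosing a different metapath (e.g. A-P-V-P-A) changes which semantic relations the resulting embeddings capture, which is one of the construction choices the benchmark aims to control.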
ABSTRACT
This study aims to explore the molecular mechanism of Ganoderma against gastric cancer based on network pharmacology, molecular docking, and cell experiments. The active components and targets of Ganoderma were retrieved from the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP), and gastric cancer-related targets from GeneCards and Online Mendelian Inheritance in Man (OMIM). The protein-protein interaction (PPI) network of the common targets was constructed with STRING, followed by Gene Ontology (GO) term enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the common genes using Bioconductor and R. The medicine-disease-component-target network and the medicine-disease-component-target-pathway network were constructed in Cytoscape. Molecular docking was performed between β-sitosterol (the key component of Ganoderma) and the top 15 targets in the PPI network. Cell experiments were performed to verify the findings. A total of 14 active components and 28 targets of Ganoderma were retrieved, of which 25 targets were shared with gastric cancer, including caspase-3 (CASP3), caspase-8 (CASP8), caspase-9 (CASP9), and B-cell lymphoma-2 (BCL2). The common targets were involved in 72 signaling pathways, of which the apoptosis and p53 signaling pathways may play a crucial role in the effect of Ganoderma against gastric cancer. β-sitosterol showed strong binding activity to the top 15 targets in the PPI network. In vitro, β-sitosterol inhibited the proliferation of gastric cancer AGS cells by inducing apoptosis and S-phase cell cycle arrest, which might be related to regulation of the p53 pathway. This study demonstrates the multi-component, multi-target, and multi-pathway characteristics of Ganoderma against gastric cancer and provides a scientific basis for further research on its molecular mechanism.
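The first two network-pharmacology steps above, intersecting herb and disease target sets and picking the "top" targets of the PPI network, can be sketched as follows. Ranking by node degree is an assumption (the abstract does not state how the top 15 targets were chosen), and the gene labels and edges are placeholders rather than real STRING interactions.

```python
def common_targets(herb_targets, disease_targets):
    """Targets shared by the herb and the disease (set intersection), sorted."""
    return sorted(set(herb_targets) & set(disease_targets))


def top_hubs(ppi_edges, targets, k):
    """Rank targets by degree in the PPI subnetwork induced by `targets`.

    ppi_edges: iterable of (protein_a, protein_b) pairs, e.g. exported from STRING.
    Ties are broken alphabetically; duplicate edges are assumed to be removed upstream.
    """
    keep = set(targets)
    degree = {t: 0 for t in keep}
    for a, b in ppi_edges:
        if a in keep and b in keep and a != b:
            degree[a] += 1
            degree[b] += 1
    return sorted(keep, key=lambda t: (-degree[t], t))[:k]


shared = common_targets(["G1", "G2", "G3"], ["G2", "G3", "G4"])
hubs = top_hubs([("G1", "G2"), ("G2", "G3"), ("G2", "G4"), ("G3", "G4")],
                ["G1", "G2", "G3", "G4"], k=2)
```

Tools such as Cytoscape offer other centralities (betweenness, closeness), and the chosen measure can change which targets enter the docking step.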
Subjects
Ganoderma , Traditional Chinese Medicine , Stomach Neoplasms , Humans , Molecular Docking Simulation , Network Pharmacology , Stomach Neoplasms/drug therapy , Stomach Neoplasms/genetics
ABSTRACT
Saccharides are increasingly used as hydrophilic matrices in amorphous solid dispersions (ASDs) to disperse poorly water-soluble drugs without surfactants. In this study, amorphous chitosan oligosaccharide (COS) was applied as a water-soluble matrix to form surfactant-free ASDs via ball milling, in order to vitrify quercetin (QUE) and enhance its dissolution and bioavailability. Solid-state characterization (DSC, XRPD, FTIR, SEM, and PLM) and physical stability assessments verified that the prepared ASDs were completely amorphous and showed excellent physical stability, owing to potential interactions between QUE and COS. In vitro sink dissolution tests showed that all QUE-COS ASDs (w:w, 1:1, 1:2, and 1:4) significantly enhanced the dissolution rate of QUE. Meanwhile, in vitro non-sink dissolution showed that the maximum supersaturated concentration ranged from 112.62 to 138.00 µg/mL for all QUE-COS ASDs, much higher than that of pure QUE. In addition, supersaturation of the QUE-COS ASDs was maintained for at least 24 h. In rat pharmacokinetic studies, the oral bioavailability of the QUE-COS ASDs showed a 1.64- to 2.25-fold increase compared with pure QUE (p < .01). Hence, the present study confirms that amorphous COS could be applied as a promising hydrophilic matrix in QUE-COS ASDs for enhancing the dissolution performance and bioavailability of QUE.
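Fold increases in oral bioavailability such as those reported above are conventionally computed as a ratio of dose-normalized areas under the plasma concentration-time curve (AUC). The sketch below uses the linear trapezoidal rule; the abstract does not state the AUC method, so this is an assumption, and the sampling times and concentrations are hypothetical.

```python
def auc_trapezoid(times, conc):
    """Area under the plasma concentration-time curve by the linear trapezoidal rule."""
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:])
    )


def relative_bioavailability(auc_test, auc_ref, dose_test=1.0, dose_ref=1.0):
    """Dose-normalised AUC ratio of a test formulation to a reference formulation."""
    return (auc_test / dose_test) / (auc_ref / dose_ref)


# Hypothetical sampling times (h) and plasma concentrations (arbitrary units).
t = [0.0, 1.0, 2.0]
f_rel = relative_bioavailability(auc_trapezoid(t, [0.0, 2.0, 2.0]),
                                 auc_trapezoid(t, [0.0, 1.0, 1.0]))
```

Here an ASD formulation with twice the plasma exposure of the pure drug at an equal dose gives a relative bioavailability of 2.0, i.e. a 2-fold value in the sense used above.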
Subjects
Quercetin , Surface-Active Agents , Animals , Biological Availability , Hydrophobic and Hydrophilic Interactions , Rats , Solubility
ABSTRACT
MOTIVATION: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases. Although recent studies have explored neural network models for BioNER to free experts from manual feature engineering, performance remains limited by the training data available for each entity type. RESULTS: We propose a multi-task learning framework for BioNER that collectively uses the training data of different entity types to improve performance on each of them. In experiments on 15 benchmark BioNER datasets, our multi-task model substantially outperforms state-of-the-art BioNER systems and baseline neural sequence labeling models. Further analysis shows that the large performance gains come from sharing character- and word-level information among relevant biomedical entities across differently labeled corpora. AVAILABILITY AND IMPLEMENTATION: Our source code is available at https://github.com/yuzhimanhua/lm-lstm-crf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Deep Learning , Computer Neural Networks , Benchmarking , Software
ABSTRACT
Background: Stroke is a major cause of lifelong disability in adults and is associated with poor quality of life. Virtual reality (VR)-based therapy systems are known to help improve motor function after stroke, but recent clinical findings have not been included in previously published meta-analyses. Aims: This meta-analysis of the available literature evaluated the therapeutic potential of VR compared with dose-matched conventional therapy (CT) in patients with stroke. Methods: We searched EMBASE, MEDLINE, PubMed, and Web of Science for relevant articles published between 2010 and February 2019. Peer-reviewed randomized controlled trials that compared VR with CT were included. Results: A total of 27 studies met the inclusion criteria. Compared with the control group, the VR group showed statistically significant improvement in the recovery of upper limb (UL) function (Fugl-Meyer Upper Extremity [FM-UE]: n = 20 studies, mean difference [MD] = 3.84, P = .01), activity (Box and Block Test [BBT]: n = 13, MD = 3.82, P = .04), and participation (Motor Activity Log [MAL]: n = 6, MD = 0.8, P = .0001). Conclusion: VR appears to be a promising therapeutic technology for UL motor rehabilitation in patients with stroke.
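Pooled mean differences like those above are obtained by weighting each study's MD by the inverse of its variance. The sketch below shows fixed-effect inverse-variance pooling with a normal-approximation CI and two-sided p-value; the abstract does not say whether a fixed- or random-effects model was used (a random-effects model would add a between-study variance term to each weight), and the study inputs are hypothetical.

```python
import math


def pooled_mean_difference(mds, ses, z_crit=1.96):
    """Fixed-effect inverse-variance pooling of per-study mean differences.

    mds: mean differences (VR minus control); ses: their standard errors.
    Returns (pooled MD, its SE, CI (95% for z_crit=1.96), two-sided normal p-value).
    """
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    pooled = sum(w * md for w, md in zip(weights, mds)) / total
    se = math.sqrt(1.0 / total)
    ci = (pooled - z_crit * se, pooled + z_crit * se)
    p = math.erfc(abs(pooled / se) / math.sqrt(2))
    return pooled, se, ci, p


# Two hypothetical trials reporting FM-UE mean differences with equal precision.
pooled, se, ci, p = pooled_mean_difference([2.0, 4.0], [1.0, 1.0])
```

With equal standard errors the pooled MD is the simple average, and the pooled SE shrinks by a factor of sqrt(2) relative to a single study.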
Subjects
Stroke Rehabilitation , Stroke , Virtual Reality Exposure Therapy , Adult , Humans , Quality of Life , Randomized Controlled Trials as Topic , Recovery of Function , Stroke/therapy , Upper Extremity
ABSTRACT
This paper investigates the design trade-offs of radially single-mode and azimuthally multimode (RSMAM) ring-core fibers (RCFs), revealing how weakly coupled linearly polarized (LP) modes can be used more efficiently in mode-division multiplexing (MDM) systems. The effects of increasing the number of LP modes on the main propagation properties (i.e., effective index difference, effective area Aeff, and macro-bending sensitivity at a wavelength of 1550 nm), and the key factors limiting such an increase, are described numerically. Based on 1) the design criteria for weakly coupled few-mode fibers described in [P. Sillard, J. Lightw. Technol. 32, 2824 (2014)] and 2) the assumption that the refractive index contrast is ≤1% (to facilitate fiber manufacturing), we point out that the step-index 4-LP-mode RSMAM RCF appears feasible, whereas RSMAM RCFs with more LP modes are still limited primarily by oversized Aeff and undesirable macro-bending sensitivity. Finally, to provide better compatibility with weakly coupled MDM systems, we present improved designs for the 3- and 4-LP-mode RSMAM RCFs given in [M. Kasahara, J. Lightw. Technol. 32, 1337 (2014); Y. Jung, J. Lightw. Technol. 35, 1363 (2017)] by taking into account spatial information density and macro-bending sensitivity.
ABSTRACT
Extracellular matrix (ECM) proteins have been shown to play important roles in regulating multiple biological processes in an array of organ systems, including the cardiovascular system. Using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely, ischemic heart disease, cardiomyopathies, cerebrovascular accident, congenital heart disease, arrhythmias, and valve disease, aiming to uncover novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the 6 groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-Aware Semantic Online Analytical Processing was applied to semantically rank the association of proteins with each CVD and with all six CVDs, quantifying each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, with each protein represented as a six-dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins, whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all 6 CVDs. Reactome-based pathway analysis of this subset identified distinct ECM pathways, namely, insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters predominantly associated with a particular CVD; analyses of these proteins revealed unexpected insights into the key ECM-related molecular pathogenesis of each CVD, including virus assembly and release in arrhythmias.
NEW & NOTEWORTHY The present study is the first application of a text-mining algorithm to characterize the relationships of 709 extracellular matrix-related proteins with 6 categories of cardiovascular disease described in 1,099,254 abstracts. Our analysis revealed unexpected extracellular matrix functions, pathways, and molecular relationships implicated in the six cardiovascular diseases.
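Representing each protein as a six-dimensional vector (one association score per CVD category) makes the PCA and hierarchical-clustering steps described above straightforward. The sketch below uses PCA via SVD and a naive average-linkage agglomerative clustering on Euclidean distance; the linkage criterion, distance metric, and score values are assumptions, not the study's settings.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the column-centred data matrix.

    X: (n_proteins, n_diseases) association scores. Returns (scores, explained_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def average_linkage(X, n_clusters):
    """Naive agglomerative clustering with average linkage on Euclidean distance."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge b into a (b > a, so index a is unaffected)
        del clusters[b]
    return clusters


# Hypothetical 6-dimensional association vectors (one column per CVD) for 4 proteins.
X = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    [5.1, 5.0, 5.0, 5.0, 5.0, 5.0],
]
scores, explained = pca(X)
clusters = average_linkage(X, n_clusters=2)
```

The quadratic-time merge loop is fine for a few hundred proteins; for larger sets a library implementation of hierarchical clustering would be the practical choice.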
Subjects
Cardiovascular Diseases/metabolism , Data Mining/methods , Extracellular Matrix Proteins/metabolism , Extracellular Matrix/metabolism , Machine Learning , Automated Pattern Recognition/methods , Big Data , Biomarkers/metabolism , Factual Databases , Humans , Principal Component Analysis , Protein Interaction Maps
ABSTRACT
In the literature, two families of models have been proposed to address prediction problems, including classification and regression. Simple models, such as generalized linear models, have modest performance but strong interpretability on a set of simple features. The other family, including tree-based models, organizes numerical, categorical, and high-dimensional features into a comprehensive structure that carries rich interpretable information about the data. In this paper, we propose a novel Discriminative Pattern-based Prediction framework (DPPred) that accomplishes prediction tasks by combining the advantages of both families: effectiveness and interpretability. Specifically, DPPred adopts the concise discriminative patterns that lie on the prefix paths from the root to the leaf nodes in tree-based models. DPPred then selects a limited number of useful discriminative patterns by searching for the most effective pattern combination to fit generalized linear models. Extensive experiments show that in many scenarios, DPPred provides accuracy competitive with the state of the art as well as valuable interpretability for developers and experts. In particular, in a case study on a clinical application dataset, DPPred outperforms the baselines using only 40 concise discriminative patterns out of a potentially exponentially large set of patterns.
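The notion of a "prefix path" pattern can be illustrated by enumerating every root-to-node path of a small decision tree as a conjunction of conditions and turning each pattern into a 0/1 feature for a linear model. This sketch covers only pattern extraction and featurization; DPPred's search for the most effective pattern combination is not shown, and the tree encoding and function names are hypothetical.

```python
def prefix_patterns(node, path=()):
    """Yield every non-empty root-to-node prefix path of a binary decision tree.

    Internal nodes: {"feature", "threshold", "left", "right"}; leaves: {"leaf": value}.
    A pattern is a tuple of (feature, op, threshold) conditions joined by AND.
    """
    if "leaf" in node:
        return
    f, thr = node["feature"], node["threshold"]
    for cond, child in (((f, "<=", thr), node["left"]), ((f, ">", thr), node["right"])):
        pattern = path + (cond,)
        yield pattern
        yield from prefix_patterns(child, pattern)


def matches(sample, pattern):
    """True if the sample satisfies every condition in the pattern."""
    return all(sample[f] <= t if op == "<=" else sample[f] > t for f, op, t in pattern)


def to_binary_features(sample, patterns):
    """One 0/1 feature per pattern, usable as input to a generalized linear model."""
    return [int(matches(sample, p)) for p in patterns]


tree = {
    "feature": "x", "threshold": 5, "left": {"leaf": 0},
    "right": {"feature": "y", "threshold": 2, "left": {"leaf": 1}, "right": {"leaf": 0}},
}
patterns = list(prefix_patterns(tree))
features = to_binary_features({"x": 7, "y": 1}, patterns)
```

Each selected pattern reads as a human-interpretable rule (e.g. "x > 5 AND y <= 2"), so the linear model's coefficients stay interpretable even though the rules come from a tree.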
ABSTRACT
As one of the fundamental tasks in text analysis, phrase mining aims to extract quality phrases from a text corpus and has various downstream applications, including information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers and are therefore likely to perform unsatisfactorily on text corpora of new domains and genres without additional, expensive adaptation. None of the state-of-the-art models, even data-driven ones, is fully automated, because they require human experts to design rules or label phrases. In this paper, we propose a novel framework for automated phrase mining, AutoPhrase, which supports any language for which a general knowledge base (e.g., Wikipedia) is available, while benefiting from, but not requiring, a POS tagger. Compared with state-of-the-art methods, AutoPhrase has shown significant improvements in both effectiveness and efficiency on five real-world datasets across different domains and languages. In addition, AutoPhrase can be extended to model single-word quality phrases.
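The way a knowledge base can replace human labeling is sketched below: frequent n-grams become phrase candidates, candidates that match knowledge-base titles form a positive pool, and the remaining candidates form a noisy negative pool for training a quality classifier. Whitespace tokenization, the thresholds, and the function names are simplifying assumptions; the classifier and phrasal segmentation are not shown.

```python
from collections import Counter


def candidate_ngrams(docs, max_n=3, min_count=2):
    """Frequent n-grams (n <= max_n) as phrase candidates; whitespace tokenisation."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def distant_labels(candidates, kb_titles):
    """Split candidates into a KB-matched positive pool and a noisy negative pool."""
    kb = {title.lower() for title in kb_titles}
    positive = sorted(g for g in candidates if g in kb)
    noisy_negative = sorted(g for g in candidates if g not in kb)
    return positive, noisy_negative


docs = ["deep learning for phrase mining", "phrase mining with deep learning"]
cands = candidate_ngrams(docs)
pos, neg = distant_labels(cands, ["Deep learning", "Phrase mining"])
```

The negative pool is "noisy" because some genuine phrases are simply missing from the knowledge base; robust training has to tolerate those mislabeled negatives.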
ABSTRACT
Innovations in food packaging systems will help meet the evolving needs of the market, such as consumer preference for "healthy" and high-quality food products and reduction of the negative environmental impacts of food packaging. Emerging concepts of active and intelligent packaging technologies provide numerous innovative solutions for prolonging shelf-life and improving the quality and safety of food products. There are also new approaches to improving the passive characteristics of food packaging, such as mechanical strength, barrier performance, and thermal stability. The development of sustainable or green packaging has the potential to reduce the environmental impacts of food packaging through the use of edible or biodegradable materials, plant extracts, and nanomaterials. Active, intelligent, and green packaging technologies can work synergistically to yield a multipurpose food-packaging system with no negative interactions between components, and this aim can be seen as the ultimate future goal for food packaging technology. This article reviews the principles of food packaging and recent developments in different types of food packaging technologies. Global patents and future research trends are also discussed.
ABSTRACT
In real-world applications, objects of multiple types are interconnected, forming heterogeneous information networks. In such networks, we make the key observation that many interactions happen due to some event, and the objects in each event form a complete semantic unit. Taking advantage of this property, we propose a generic framework called HyperEdge-Based Embedding (Hebe) to learn object embeddings with events in heterogeneous information networks, where a hyperedge encompasses the objects participating in one event. The Hebe framework models the proximity among objects in each event with two methods: (1) predicting a target object given the other participating objects in the event, and (2) predicting whether the event can be observed given all the participating objects. Since each hyperedge encapsulates more information about a given event, Hebe is robust to data sparseness and noise. In addition, Hebe scales as the data size grows. Extensive experiments on large-scale real-world datasets show the efficacy and robustness of the proposed framework.
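Method (1) above, predicting a target object from the other objects in the same event, can be sketched as a softmax over candidate objects scored against the averaged embeddings of the context objects. Mean pooling and dot-product scoring are assumptions made for illustration; Hebe's actual scoring function, sampling scheme, and method (2) are not reproduced, and the embeddings and object names are hypothetical.

```python
import math


def predict_target_probs(embeddings, context_objects, candidates):
    """Softmax over candidates of dot(mean context embedding, candidate embedding).

    embeddings: dict object -> list[float]; context_objects: the other objects in
    the event (hyperedge); candidates: objects that could fill the target slot.
    """
    dim = len(embeddings[context_objects[0]])
    ctx = [
        sum(embeddings[o][k] for o in context_objects) / len(context_objects)
        for k in range(dim)
    ]
    scores = {c: sum(a * b for a, b in zip(ctx, embeddings[c])) for c in candidates}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


# Hypothetical 2-d embeddings for an event {user u1, location l1, target ?}.
emb = {"u1": [1.0, 0.0], "l1": [1.0, 0.0], "t1": [1.0, 0.0], "t2": [0.0, 1.0]}
probs = predict_target_probs(emb, ["u1", "l1"], ["t1", "t2"])
```

Training would adjust the embeddings to raise the probability of the object actually observed in each event, so that objects co-occurring in events end up close in the embedding space.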
ABSTRACT
BACKGROUND: Cancer subtype information is critically important for understanding tumor heterogeneity. Existing methods to identify cancer subtypes have primarily applied generic clustering algorithms (such as hierarchical clustering) to gene expression data. The network-level interaction among genes, which is key to understanding the molecular perturbations in cancer, has rarely been considered during the clustering process. The motivation of our work is to develop a method that effectively incorporates molecular interaction networks into the clustering process to improve cancer subtype identification. RESULTS: We have developed a new clustering algorithm for cancer subtype identification, called "network-assisted co-clustering for the identification of cancer subtypes" (NCIS). NCIS uses gene network information to simultaneously group samples and genes into biologically meaningful clusters. Prior to clustering, we assign weights to genes based on their impact in the network. Then a new weighted co-clustering algorithm based on semi-nonnegative matrix tri-factorization is applied. We evaluated the effectiveness of NCIS on simulated datasets as well as large-scale breast cancer and glioblastoma multiforme patient samples from The Cancer Genome Atlas (TCGA) project. Compared with consensus hierarchical clustering, NCIS better separated the patient samples into clinically distinct subtypes and achieved higher accuracy and better noise tolerance on the simulated datasets. CONCLUSIONS: The weighted co-clustering approach in NCIS provides a unique solution for incorporating gene network information into the clustering process. Our tool will be useful for comprehensively identifying cancer subtypes that would otherwise be obscured by cancer heterogeneity, using high-throughput, high-dimensional gene expression data.
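The gene-weighting step that precedes co-clustering can be illustrated by scoring each gene's importance in the interaction network and scaling its expression row accordingly. PageRank is used here as an assumed measure of "impact in the network" (the abstract does not name the weighting scheme), the semi-nonnegative matrix tri-factorization step is not shown, and the network and expression values are hypothetical.

```python
def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a graph given as dict node -> list of neighbours."""
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass held by nodes without out-links is redistributed uniformly.
        dangling = sum(pr[v] for v in nodes if not adj[v])
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for u in adj[v]:
                nxt[u] += damping * pr[v] / len(adj[v])
        delta = sum(abs(nxt[v] - pr[v]) for v in nodes)
        pr = nxt
        if delta < tol:
            break
    return pr


def weight_genes(expr, adj):
    """Scale each gene's expression row by n * PageRank, so the average weight is 1.

    expr: dict gene -> expression values across samples; every gene must be in adj.
    """
    pr = pagerank(adj)
    n = len(pr)
    return {g: [n * pr[g] * x for x in row] for g, row in expr.items()}


# Hypothetical star-shaped gene network: hub H interacts with A, B and C.
adj = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H"]}
pr = pagerank(adj)
weighted = weight_genes({"H": [1.0, 2.0], "A": [1.0, 2.0]}, adj)
```

Up-weighting hub genes makes their expression patterns count more when samples and genes are grouped, which is the intuition behind letting network structure guide clustering.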