Results 1 - 20 of 107
1.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32533167

ABSTRACT

The significance of pan-cancer categories has recently been recognized as widespread in cancer research. Pan-cancer categorizes a cancer based on its molecular pathology rather than an organ. The molecular similarities among multi-omics data found in different cancer types can play several roles in both biological processes and therapeutic developments. Therefore, an integrated analysis for various genomic data is frequently used to reveal novel genetic and molecular mechanisms. However, a variety of algorithms for multi-omics clustering have been proposed in different fields. The comparison of different computational clustering methods in pan-cancer analysis performance remains unclear. To increase the utilization of current integrative methods in pan-cancer analysis, we first provide an overview of five popular computational integrative tools: similarity network fusion, integrative clustering of multiple genomic data types (iCluster), cancer integration via multi-kernel learning (CIMLR), perturbation clustering for data integration and disease subtyping (PINS) and low-rank clustering (LRACluster). Then, a priori interactions in multi-omics data were incorporated to detect prominent molecular patterns in pan-cancer data sets. Finally, we present comparative assessments of these methods, with a discussion of key issues in applying these algorithms. We found that all five methods can identify distinct tumor compositions. The pan-cancer samples can be reclassified into several groups by different proportions. Interestingly, each method can classify the tumors into categories that are different from original cancer types or subtypes, especially for ovarian serous cystadenocarcinoma (OV) and breast invasive carcinoma (BRCA) tumors. In addition, all clusters of the five computational methods show notable prognostic values. Furthermore, both the 9 recurrent differential genes and the 15 common pathway characteristics were identified across all the methods. The results and discussion can help the community select appropriate integrative tools according to different research tasks or aims in pan-cancer analysis.


Subject(s)
Breast Neoplasms , Cystadenocarcinoma, Serous , Databases, Genetic , Gene Regulatory Networks , Genomics , Machine Learning , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Computational Biology , Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/metabolism , Female , Humans , Neoplasms , Ovarian Neoplasms/genetics , Ovarian Neoplasms/metabolism
2.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32591780

ABSTRACT

Accurately identifying the interactions between genomic factors and the response of cancer drugs plays important roles in drug discovery, drug repositioning and cancer treatment. A number of studies revealed that interactions between genes and drugs were 'many-genes-to-many-drugs' interactions, i.e. common modules, as opposed to 'one-gene-to-one-drug' interactions. Such modules fully explain the interactions between complex biological regulatory mechanisms and cancer drugs. However, strategies for effectively and robustly identifying the underlying common modules among pharmacogenomics data remain to be improved. In this paper, we aim to provide a detailed evaluation of three categories of state-of-the-art common module identification techniques from a machine learning perspective, including non-negative matrix factorization (NMF), partial least squares (PLS) and network analyses. We first evaluate the performance of six methods, namely SNMNMF, NetNMF, SNPLS, O2PLS, NSBM and HOGMMNC, using two series of simulated data sets with different noise levels and outlier ratios. Then, we conduct experiments using a real-world data set of 2091 genes and 101 drugs in 392 cancer cell lines and compare the real experimental results from the aspect of biological process term enrichment, gene-drug and drug-drug interactions. Finally, we present interesting findings from our evaluation study and discuss the advantages and drawbacks of each method. Supplementary information: Supplementary file is available at Briefings in Bioinformatics online.


Subject(s)
Pharmacogenetics , Algorithms , Antineoplastic Agents/pharmacology , Computational Biology/methods , Drug Discovery , Drug Repositioning , Gene Regulatory Networks , Humans , Machine Learning
3.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34021302

ABSTRACT

Genomic data alignment, a fundamental operation in sequencing, can be utilized to map reads into a reference sequence, query on a genomic database and perform genetic tests. However, with the reduction of sequencing cost and the accumulation of genome data, privacy-preserving genomic sequencing data alignment is becoming unprecedentedly important. In this paper, we present a comprehensive review of secure genomic data comparison schemes. We discuss the privacy threats, including adversaries and privacy attacks. The attacks can be categorized into inference, membership, identity tracing and completion attacks and have been applied to obtaining the genomic privacy information. We classify the state-of-the-art genomic privacy-preserving alignment methods into three different scenarios: large-scale reads mapping, encrypted genomic datasets querying and genetic testing to ease privacy threats. A comprehensive analysis of these approaches has been carried out to evaluate the computation and communication complexity as well as the privacy requirements. The survey provides the researchers with the current trends and the insights on the significance and challenges of privacy issues in genomic data alignment.


Subject(s)
Algorithms , Genome, Human , Genomics , Sequence Alignment , Humans
4.
PLoS Pathog ; 17(3): e1009328, 2021 03.
Article in English | MEDLINE | ID: mdl-33657135

ABSTRACT

A key step to the SARS-CoV-2 infection is the attachment of its Spike receptor-binding domain (S RBD) to the host receptor ACE2. Considerable research has been devoted to the development of neutralizing antibodies, including llama-derived single-chain nanobodies, to target the receptor-binding motif (RBM) and to block ACE2-RBD binding. Simple and effective strategies to increase potency are desirable for such studies when antibodies are only modestly effective. Here, we identify and characterize a high-affinity synthetic nanobody (sybody, SR31) as a fusion partner to improve the potency of RBM-antibodies. Crystallographic studies reveal that SR31 binds to RBD at a conserved and 'greasy' site distal to RBM. Although SR31 distorts RBD at the interface, it does not perturb the RBM conformation, hence displaying no neutralizing activities itself. However, fusing SR31 to two modestly neutralizing sybodies dramatically increases their affinity for RBD and neutralization activity against SARS-CoV-2 pseudovirus. Our work presents a tool protein and an efficient strategy to improve nanobody potency.


Subject(s)
Angiotensin-Converting Enzyme 2/immunology , Antibodies, Neutralizing/immunology , Antibodies, Viral/immunology , SARS-CoV-2/immunology , Single-Domain Antibodies/immunology , Antibodies, Neutralizing/chemistry , Antibodies, Neutralizing/genetics , Antibodies, Viral/chemistry , Antibodies, Viral/genetics , Antibody Affinity , Binding Sites , Crystallography, X-Ray , HEK293 Cells , Humans , Models, Molecular , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/immunology , Single-Domain Antibodies/chemistry , Single-Domain Antibodies/genetics
5.
Hum Brain Mapp ; 43(13): 3970-3986, 2022 09.
Article in English | MEDLINE | ID: mdl-35538672

ABSTRACT

Functional neural activities manifest geometric patterns, as evidenced by the evolving network topology of functional connectivities (FC) even in the resting state. In this work, we propose a novel manifold-based geometric neural network for functional brain networks (called "Geo-Net4Net" for short) to learn the intrinsic low-dimensional feature representations of resting-state brain networks on the Riemannian manifold. This tool allows us to answer the scientific question of how the spontaneous fluctuation of FC supports behavior and cognition. We deploy a set of positive maps and rectified linear unit (ReLU) layers to uncover the intrinsic low-dimensional feature representations of functional brain networks on the Riemannian manifold taking advantage of the symmetric positive-definite (SPD) form of the correlation matrices. Due to the lack of well-defined ground truth in the resting state, existing learning-based methods are limited to unsupervised methodologies. To go beyond this boundary, we propose to self-supervise the feature representation learning of resting-state functional networks by leveraging the task-based counterparts occurring before and after the underlying resting state. With this extra heuristic, our Geo-Net4Net allows us to establish a more reasonable understanding of resting-state FCs by capturing the geometric patterns (a.k.a. spectral/shape signature) associated with resting states on the Riemannian manifold. We have conducted extensive experiments on both simulated data and task-based functional magnetic resonance imaging (fMRI) data from the Human Connectome Project (HCP) database, where our Geo-Net4Net not only achieves more accurate change detection results than other state-of-the-art counterpart methods but also yields ubiquitous geometric patterns that manifest putative insights into brain function.


Subject(s)
Connectome , Deep Learning , Brain/diagnostic imaging , Cognition , Connectome/methods , Humans , Magnetic Resonance Imaging/methods
6.
Chembiochem ; 23(8): e202100534, 2022 04 20.
Article in English | MEDLINE | ID: mdl-34862721

ABSTRACT

Small open reading frames (sORFs) are an important class of genes with less than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence suggests that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.


Subject(s)
Peptides , Codon , Mass Spectrometry , Open Reading Frames , Peptides/chemistry
7.
BMC Med Inform Decis Mak ; 22(1): 190, 2022 07 23.
Article in English | MEDLINE | ID: mdl-35870923

ABSTRACT

BACKGROUND: Patient subgroups are important for easily understanding a disease and for providing precise yet personalized treatment through multiple omics dataset integration. Multiomics datasets are produced daily. Thus, the fusion of heterogeneous big data into intrinsic structures is an urgent problem. Novel mathematical methods are needed to process these data in a straightforward way. RESULTS: We developed a novel method for subgrouping patients with distinct survival rates via the integration of multiple omics datasets and by using principal component analysis to reduce the high data dimensionality. Then, we constructed similarity graphs for patients, merged the graphs in a subspace, and analyzed them on a Grassmann manifold. The proposed method could identify patient subgroups that had not been reported previously by selecting the most critical information during the merging at each level of the omics dataset. Our method was tested on empirical multiomics datasets from The Cancer Genome Atlas. CONCLUSION: Through the integration of microRNA, gene expression, and DNA methylation data, our method accurately identified patient subgroups and achieved superior performance compared with popular methods.
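
The pipeline outlined in this abstract (PCA per omics layer, one patient similarity graph per layer, merging in a shared subspace) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the Gaussian kernel with a median-based bandwidth, the normalized Laplacian, the merging rule (sum of Laplacians minus `alpha` times the sum of subspace projectors, a common way to average subspaces on a Grassmann manifold), and all function names and the random toy data are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples (rows of X) onto their top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def similarity_graph(Z):
    """Gaussian-kernel patient similarity; the median-based bandwidth is an assumption."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    sigma2 = np.median(sq[sq > 0])
    W = np.exp(-sq / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)  # no self-similarity edges
    return W

def normalized_laplacian(W):
    """Symmetric normalized graph Laplacian I - D^-1/2 W D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def merge_on_grassmann(laplacians, k, alpha=0.5):
    """Merge per-layer spectral subspaces (assumed rule, not taken from the abstract)."""
    L_mod = np.zeros_like(laplacians[0])
    for L in laplacians:
        _, vecs = np.linalg.eigh(L)      # eigenvalues come back in ascending order
        U = vecs[:, :k]                  # k smallest eigenvectors span the layer subspace
        L_mod += L - alpha * (U @ U.T)   # pull the merged subspace toward each layer
    _, vecs = np.linalg.eigh(L_mod)
    return vecs[:, :k]

# Hypothetical toy data: 40 patients, three omics layers of different width.
rng = np.random.default_rng(0)
n_patients = 40
omics = [rng.normal(size=(n_patients, p)) for p in (200, 150, 100)]
laplacians = [normalized_laplacian(similarity_graph(pca_reduce(X, 10))) for X in omics]
embedding = merge_on_grassmann(laplacians, k=3)
```

Each row of `embedding` is one patient in the merged subspace; a standard clustering step such as k-means on these rows would then yield candidate subgroups.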


Subject(s)
MicroRNAs , Neoplasms , DNA Methylation , Genome , Humans , Neoplasms/genetics , Survival Rate
8.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 39(4): 672-678, 2022 Aug 25.
Article in Chinese | MEDLINE | ID: mdl-36008330

ABSTRACT

This study aims to analyze the biomechanical stability of Magic screw in the treatment of acetabular posterior column fractures by finite element analysis. A three-dimensional finite element model of the pelvis was established based on the computed tomography (CT) and magnetic resonance imaging (MRI) data of a volunteer and its effectiveness was verified. Then, the posterior column fracture model of the acetabulum was generated. The biomechanical stability of the four internal fixation models was compared. A 500 N force was applied to the upper surface of the sacrum to simulate human gravity. The maximum implant stresses of the retrograde screw fixation, single-plate fixation, double-plate fixation and Magic screw fixation models in the standing and sitting positions were as follows: 114.10, 113.40 MPa; 58.93, 55.72 MPa; 58.76, 47.47 MPa; and 24.36, 27.50 MPa, respectively. The maximum stresses at the fracture end were as follows: 72.71, 70.51 MPa; 48.18, 22.80 MPa; 52.38, 27.14 MPa; and 34.05, 30.78 MPa, respectively. The fracture end displacement of the retrograde tension screw fixation model was the largest in both states, and the Magic screw had the smallest displacement variation in the standing state, but it was significantly higher than the two plate fixations in the sitting state. Magic screw can satisfy the biomechanical stability of posterior column fracture. Compared with traditional fixations, Magic screw has the advantages of more uniform stress distribution and less stress, and should be recommended.


Subject(s)
Fractures, Bone , Spinal Fractures , Biomechanical Phenomena , Bone Plates , Bone Screws , Finite Element Analysis , Fracture Fixation, Internal/methods , Fractures, Bone/diagnostic imaging , Fractures, Bone/surgery , Humans
9.
BMC Bioinformatics ; 22(1): 326, 2021 Jun 15.
Article in English | MEDLINE | ID: mdl-34130622

ABSTRACT

BACKGROUND: With the development of high-throughput sequencing technology, a huge amount of multi-omics data has been accumulated. Although there are many software tools for statistical analysis and visual development of omics data, these tools are not suitable for private data and non-technical users. Besides, most of these tools specialize in only one or perhaps a few data types, without combining clinical information. Moreover, users cannot flexibly choose data processing steps and models when using these tools. RESULTS: To help non-technical users understand and analyze private multi-omics data and ensure data security, we developed an interactive desktop tool for statistical analysis and visualization of omics and clinical data (IOAT for short). Our tool mainly targets CSV-format data and combines clinical data with high-dimensional multi-omics data. It also contains various operations, such as data preprocessing, feature selection, risk assessment, clustering, and survival analysis. By using this tool, users can safely and conveniently try a combination of various methods on their private multi-omics data to find a model suitable for their data, conduct risk assessment and determine their cancer subtypes. At the same time, the tool can also provide them with references to genes that are closely related to tumor staging, facilitating the development of precision oncology. We review IOAT's main features and demonstrate its analysis capabilities on a lung cancer dataset from TCGA. CONCLUSIONS: IOAT is a local desktop tool, which provides a set of multi-omics data integration solutions. It can quickly perform a complete analysis of cancer genome data for subtype discovery and biomarker identification without security issues or writing any code. Thus, our tool can enable cancer biologists and biomedicine researchers to analyze their data more easily and safely. IOAT can be downloaded for free from https://github.com/WlSunshine/IOAT-software .


Subject(s)
Neoplasms , Cluster Analysis , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Precision Medicine , Software
10.
Retina ; 41(5): 1110-1117, 2021 May 01.
Article in English | MEDLINE | ID: mdl-33031250

ABSTRACT

PURPOSE: To develop a deep learning (DL) model to detect morphologic patterns of diabetic macular edema (DME) based on optical coherence tomography (OCT) images. METHODS: In the training set, 12,365 OCT images were extracted from a public data set and an ophthalmic center. A total of 656 OCT images were extracted from another ophthalmic center for external validation. The presence or absence of three OCT patterns of DME, including diffused retinal thickening, cystoid macular edema, and serous retinal detachment, was labeled with 1 or 0, respectively. A DL model was trained to detect three OCT patterns of DME. The occlusion test was applied for the visualization of the DL model. RESULTS: Applying 5-fold cross-validation method in internal validation, the area under the receiver operating characteristic curve for the detection of three OCT patterns (i.e., diffused retinal thickening, cystoid macular edema, and serous retinal detachment) was 0.971, 0.974, and 0.994, respectively, with an accuracy of 93.0%, 95.1%, and 98.8%, respectively, a sensitivity of 93.5%, 94.5%, and 96.7%, respectively, and a specificity of 92.3%, 95.6%, and 99.3%, respectively. In external validation, the area under the receiver operating characteristic curve was 0.970, 0.997, and 0.997, respectively, with an accuracy of 90.2%, 95.4%, and 95.9%, respectively, a sensitivity of 80.1%, 93.4%, and 94.9%, respectively, and a specificity of 97.6%, 97.2%, and 96.5%, respectively. The occlusion test showed that the DL model could successfully identify the pathologic regions most critical for detection. CONCLUSION: Our DL model demonstrated high accuracy and transparency in the detection of OCT patterns of DME. These results emphasized the potential of artificial intelligence in assisting clinical decision-making processes in patients with DME.
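
The four figures reported per OCT pattern (area under the ROC curve, accuracy, sensitivity, specificity) can be reproduced for any binary label with a few lines. The sketch below is only an illustration of how these metrics are defined: the function name, the 0.5 decision threshold and the toy labels and scores are assumptions, not values from the study.

```python
import numpy as np

def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity and AUC for one binary label."""
    y_true = np.asarray(y_true, dtype=int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    # AUC as the probability that a random positive outscores a random
    # negative, with ties counted as one half.
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "auc": float(auc),
    }

# Hypothetical toy labels (1 = pattern present) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
metrics = binary_metrics(y_true, y_score)
```

The function assumes both classes are present; with three OCT patterns it would simply be called once per pattern.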


Subject(s)
Artificial Intelligence , Deep Learning , Diabetic Retinopathy/diagnosis , Macular Edema/diagnosis , Tomography, Optical Coherence/methods , Visual Acuity , Diabetic Retinopathy/complications , Diabetic Retinopathy/physiopathology , Follow-Up Studies , Humans , Macular Edema/etiology , Macular Edema/physiopathology , ROC Curve , Retrospective Studies
11.
Bioinformatics ; 35(4): 602-610, 2019 02 15.
Article in English | MEDLINE | ID: mdl-30052773

ABSTRACT

MOTIVATION: The emergence of large amounts of genomic, chemical, and pharmacological data provides new opportunities and challenges. Identifying gene-drug associations is not only crucial in providing a comprehensive understanding of the molecular mechanisms of drug action, but is also important in the development of effective treatments for patients. However, accurately determining the complex associations among pharmacogenomic data remains challenging. We propose a higher order graph matching with multiple network constraints (HOGMMNC) model to accurately identify gene-drug modules. The HOGMMNC model aims to capture the inherent structural relations within data drawn from multiple sources by hypergraph matching. The proposed technique seamlessly integrates prior constraints to enhance the accuracy and reliability of the identified relations. An effective numerical solution is combined with a novel sampling strategy to solve the problem efficiently. RESULTS: The superiority and effectiveness of our proposed method are demonstrated through a comparison with four state-of-the-art techniques using synthetic and empirical data. The experiments on synthetic data show that the proposed method clearly outperforms other methods, especially in the presence of noise and irrelevant samples. The HOGMMNC model identifies eighteen gene-drug modules in the empirical data. The modules are validated to have significant associations via pathway analysis. Significance: The modules identified by HOGMMNC provide new insights into the molecular mechanisms of drug action and provide patients with more effective treatments. Our proposed method can be applied to the study of other biological correlated module identification problems (e.g. miRNA-gene, gene-methylation, and gene-disease). AVAILABILITY AND IMPLEMENTATION: A matlab package of HOGMMNC is available at https://github.com/scutbioinformatics/HOGMMNC/. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Drug Interactions/genetics , Gene Regulatory Networks , Genomics , Humans , Reproducibility of Results
12.
Protein Expr Purif ; 164: 105463, 2019 12.
Article in English | MEDLINE | ID: mdl-31381990

ABSTRACT

Recombinant expression of human membrane proteins in large quantities remains a major challenge. Expression host is an important variable to screen for high-level production of membrane proteins. Using the green fluorescent protein (GFP) as a reporter, we screened the expression of a human multi-pass membrane protein called sterol Δ8-Δ7 isomerase in three different hosts: Escherichia coli, Saccharomyces cerevisiae, and Pichia pastoris. The expression of the His-tagged isomerase was exceptionally high in P. pastoris, reaching ~200 mg L⁻¹ in standard flasks, and ~1,000 mg L⁻¹ in condensed culture that mimics fermentation. The heterogeneously expressed isomerase could be extracted fully with dodecyl maltoside, and the solubilized protein in the form of GFP fusion showed a sharp and symmetric peak on fluorescence-detection size exclusion chromatography. Our work provides a useful source for the purification of the recombinant isomerase.


Subject(s)
Pichia/genetics , Steroid Isomerases/chemistry , Steroid Isomerases/genetics , Chromatography, Gel , Gene Expression , Humans , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Solubility
13.
Eur Radiol ; 29(10): 5590-5599, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30874880

ABSTRACT

OBJECTIVES: To explore and evaluate the feasibility of radiomics in stratifying nasopharyngeal carcinoma (NPC) into distinct survival subgroups through multi-modalities MRI. METHODS: A total of 658 patients (training cohort: 424; validation cohort: 234) with non-metastatic NPC were enrolled in the retrospective analysis. Each slice was considered as a sample and 4863 radiomics features on the tumor region were extracted from T1-weighted, T2-weighted, and contrast-enhanced T1-weighted MRI. Consensus clustering and manual aggregation were performed on the training cohort to generate a baseline model and classification reference used to train a support vector machine classifier. The risk of each patient was defined as the maximum risk among the slices. Each patient in the validation cohort was assigned to the risk model using the trained classifier. Harrell's concordance index (C-index) was used to measure the prognosis performance, and differences between subgroups were compared using the log-rank test. RESULTS: The training cohort was clustered into four groups with distinct survival patterns. Each patient was assigned to one of the four groups according to the estimated risk. Our method gave a performance (C-index = 0.827, p < .004 and C-index = 0.814, p < .002) better than the T-stage (C-index = 0.815, p = .002 and C-index = 0.803, p = .024), competitive to and more stable than the TNM staging system (C-index = 0.842, p = .003 and C-index = 0.765, p = .050) in the training cohort and the validation cohort. CONCLUSIONS: Through investigating a large one-institutional cohort, the quantitative multi-modalities MRI image phenotypes reveal distinct survival subtypes. KEY POINTS: • Radiomics phenotype of MRI revealed the subtype of nasopharyngeal carcinoma (NPC) patients with distinct survival patterns. • The slice-wise analysis method on MRI helps to stratify patients and provides superior prognostic performance over the TNM staging method. • Risk estimation using the highest risk among slices performed better than using the majority risk in prognosis.
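
Two steps in this abstract lend themselves to a short illustration: taking the maximum slice-level risk as the patient-level risk, and scoring the result with Harrell's C-index. The sketch below is a plain-Python illustration under stated assumptions: the function names and toy data are hypothetical, and pairs with tied survival times are simply skipped, a common simplification.

```python
from collections import defaultdict

def patient_risk_from_slices(slice_patient_ids, slice_risks):
    """Aggregate slice-level risks into one risk per patient by taking the maximum."""
    risk = defaultdict(lambda: float("-inf"))
    for pid, r in zip(slice_patient_ids, slice_risks):
        risk[pid] = max(risk[pid], r)
    return dict(risk)

def harrell_c_index(times, events, risks):
    """Share of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risks count as half
    return concordant / comparable

# Hypothetical toy data: six slices from four patients.
slice_ids = ["a", "a", "b", "b", "c", "d"]
slice_risks = [1.0, 3.0, 2.0, 3.5, 4.0, 1.0]
risk_by_patient = patient_risk_from_slices(slice_ids, slice_risks)

patients = ["a", "b", "c", "d"]
times = [20, 30, 10, 40]     # follow-up time per patient
events = [1, 0, 1, 1]        # 1 = event observed, 0 = censored
c_index = harrell_c_index(times, events, [risk_by_patient[p] for p in patients])
```

In this toy case five pairs are comparable and four of them are ordered correctly, so `c_index` is 0.8; a value of 0.5 would correspond to random ordering.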


Subject(s)
Nasopharyngeal Carcinoma/diagnostic imaging , Nasopharyngeal Neoplasms/diagnostic imaging , Adult , Cohort Studies , Feasibility Studies , Female , Humans , Image Interpretation, Computer-Assisted/methods , Kaplan-Meier Estimate , Magnetic Resonance Imaging/methods , Male , Middle Aged , Nasopharyngeal Carcinoma/pathology , Nasopharyngeal Neoplasms/pathology , Neoplasm Staging , Prognosis , Retrospective Studies , Support Vector Machine
15.
BMC Bioinformatics ; 17: 384, 2016 Sep 17.
Article in English | MEDLINE | ID: mdl-27639558

ABSTRACT

BACKGROUND: Variations in DNA copy number have an important contribution to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough samples for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersed portion of the sequencing data. A recent study has shown that the over-dispersed portion of the single-cell sequencing data could be well modelled by negative binomial distributions. RESULTS: We developed a read-depth based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals follow a negative binomial distribution. The problem of CNV detection was formulated as a quadratic optimization problem, and was solved by an efficient numerical solution based on the classical alternating direction minimization method. CONCLUSIONS: Extensive experiments to compare nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.


Subject(s)
DNA Copy Number Variations/genetics , High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods , Software , Binomial Distribution , Cluster Analysis , Computer Simulation , Humans , Sequence Analysis, DNA
16.
BMC Bioinformatics ; 16: 219, 2015 Jul 10.
Article in English | MEDLINE | ID: mdl-26159165

ABSTRACT

BACKGROUND: Classifying cancers by gene selection is among the most important and challenging procedures in biomedicine. A major challenge is to design an effective method that eliminates irrelevant, redundant, or noisy genes from the classification, while retaining all of the highly discriminative genes. RESULTS: We propose a gene selection method, called local hyperplane-based discriminant analysis (LHDA). LHDA adopts two central ideas. First, it uses a local approximation rather than global measurement; second, it embeds a recently reported classification model, the K-Local Hyperplane Distance Nearest Neighbor (HKNN) classifier, into its discriminator. Through classification accuracy-based iterations, LHDA obtains the feature weight vector and finally extracts the optimal feature subset. The performance of the proposed method is evaluated in extensive experiments on synthetic and real microarray benchmark datasets. Eight classical feature selection methods, four classification models and two popular embedded learning schemes, including k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), Support Vector Machine (SVM) and Random Forest are employed for comparisons. CONCLUSION: The proposed method yielded performance comparable or superior to that of seven state-of-the-art models. The strong performance demonstrates the superiority of combining feature weighting with model learning in a unified framework to achieve the two tasks simultaneously.


Subject(s)
Cluster Analysis , Discriminant Analysis , Machine Learning/standards , Neoplasms/classification , Neoplasms/genetics , Support Vector Machine , Gene Expression Profiling , Gene Regulatory Networks , Humans
17.
BMC Bioinformatics ; 15: 70, 2014 Mar 14.
Article in English | MEDLINE | ID: mdl-24625071

ABSTRACT

BACKGROUND: Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noises. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. RESULTS: We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). CONCLUSION: Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability across various classification algorithms.
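
For readers unfamiliar with the baseline that LHR builds on, the classic two-class RELIEF update can be sketched as follows. This is the textbook nearest-hit / nearest-miss rule, not the LHR algorithm itself (whose local hyperplane approximation is not specified in the abstract); the function name, the Manhattan distance and the toy data are assumptions made for illustration.

```python
import numpy as np

def relief_weights(X, y):
    """Classic two-class RELIEF: reward features that differ on the nearest
    miss, penalise features that differ on the nearest hit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Scale each feature to [0, 1] so that differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)   # Manhattan distance to every sample
        dist[i] = np.inf                        # never pick the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n

# Hypothetical toy data: feature 0 equals the class label, feature 1 is unrelated to it.
X = [[0, 0.0], [0, 1.0], [0, 0.5],
     [1, 0.0], [1, 1.0], [1, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

Here the informative feature receives weight 1.0 and the uninformative one -0.5. Because each update depends on a single nearest hit and miss, one noisy neighbour can swing the weights, which is the instability the abstract refers to.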


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Support Vector Machine , Algorithms , Bayes Theorem , Cluster Analysis , Databases, Genetic , Discriminant Analysis , Humans , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis
18.
BMC Cancer ; 14: 366, 2014 May 24.
Article in English | MEDLINE | ID: mdl-24885156

ABSTRACT

BACKGROUND: The apparent diffusion coefficient (ADC) is a highly diagnostic factor in discriminating malignant and benign breast masses in diffusion-weighted magnetic resonance imaging (DW-MRI). The combination of ADC and other pictorial characteristics has improved lesion type identification accuracy. The objective of this study was to reassess the findings on an independent patient group by changing the magnetic field from 1.5-Tesla to 3.0-Tesla. METHODS: This retrospective study consisted of a training group of 234 female patients, including 85 benign and 149 malignant lesions, imaged using 1.5-Tesla MRI, and a test group of 95 female patients, including 19 benign and 85 malignant lesions, imaged using 3.0-Tesla MRI. The lesion of interest was segmented from the raw image and four sets of measurements describing the morphology, kinetics, DW-MRI, and texture of the pictorial properties of each lesion were obtained. Each lesion was characterized by 28 features in total. Three classical machine-learning algorithms were used to build prediction models on the training group, which evaluated the prognostic performance of the multi-sided features in three scenarios. To reduce information redundancy, five highly diagnostic factors were selected to obtain a compact yet informative characterization of the lesion status. RESULTS: Three classification models were built on the training group of 1.5-Tesla patients and were tested on the independent 3.0-Tesla test group. The following results were found. i) Characterization of breast masses in a multi-sided way dramatically increased prediction performance. The usage of all features gave a higher performance in both sensitivity and specificity than any individual feature groups or their combinations. ii) ADC was a highly effective factor in improving the sensitivity in discriminating malignant from benign masses. iii) Five features, namely ADC, Sum Average, Entropy, Elongation, and Sum Variance, were selected to achieve the highest performance in diagnosis of the 3.0-Tesla patient group. CONCLUSIONS: The combination of ADC and other multi-sided characteristics can increase the capability of discriminating malignant and benign breast lesions, even under different imaging protocols. The selected compact feature subsets achieved a high diagnostic performance and thus are promising in clinical applications for discriminating lesion type and for personalized treatment planning.


Subject(s)
Breast Neoplasms/pathology , Contrast Media , Diffusion Magnetic Resonance Imaging , Gadolinium DTPA , Adolescent , Adult , Aged , Algorithms , Artificial Intelligence , Diagnosis, Differential , Female , Humans , Image Interpretation, Computer-Assisted , Middle Aged , Predictive Value of Tests , Prognosis , Retrospective Studies , Young Adult
19.
IEEE J Biomed Health Inform ; 28(2): 1134-1143, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37963003

ABSTRACT

Cancer is one of the most challenging health problems worldwide. Accurate cancer survival prediction is vital for clinical decision making. Many deep learning methods have been proposed to understand the association between patients' genomic features and survival time. In most cases, the gene expression matrix is fed directly to the deep learning model. However, this approach completely ignores the interactions between biomolecules, and the resulting models can only learn the expression levels of genes to predict patient survival. In essence, the interaction between biomolecules is the key to determining the direction and function of biological processes. Proteins are the building blocks and principal agents of life activities, and as such, their complex interaction network is potentially informative for deep learning methods. Therefore, a more reliable approach is to have the neural network learn both gene expression data and protein interaction networks. We propose a new computational approach, termed CRESCENT, which is a protein-protein interaction (PPI) prior knowledge graph-based convolutional neural network (GCN) to improve cancer survival prediction. CRESCENT relies on the gene expression networks rather than gene expression levels to predict patient survival. The performance of CRESCENT is evaluated on a large-scale pan-cancer dataset consisting of 5991 patients from 16 different types of cancers. Extensive benchmarking experiments demonstrate that our proposed method is competitive in terms of the evaluation metric of the time-dependent concordance index (Ctd) when compared with several existing state-of-the-art approaches. Experiments also show that incorporating the network structure between genomic features effectively improves cancer survival prediction.


Subject(s)
Neoplasms , Protein Interaction Maps , Humans , Protein Interaction Maps/genetics , Algorithms , Neural Networks, Computer , Genomics , Neoplasms/genetics
20.
J Bone Joint Surg Am ; 106(2): 129-137, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-37992198

ABSTRACT

BACKGROUND: Sacral dysmorphism is not uncommon and complicates S1 iliosacral screw placement partially because of the difficulty of determining the starting point accurately on the sacral lateral view. We propose a method of specifying the starting point. METHODS: The starting point for the S1 iliosacral screw into the dysmorphic sacrum was specifically set at a point where the ossification of the S1/S2 intervertebral disc (OSID) intersected the posterior vertebral cortical line (PVCL) on the sacral lateral view, followed by guidewire manipulation and screw placement on the pelvic outlet and inlet views. Computer-simulated virtual surgical procedures based on pelvic computed tomography (CT) data on 95 dysmorphic sacra were performed to determine whether the starting point was below the iliac cortical density (ICD) and in the S1 oblique osseous corridor and to evaluate the accuracy of screw placement (with 1 screw being used, in the left hemipelvis). Surgical procedures on 17 patients were performed to verify the visibility of the OSID and PVCL, to check the location of the starting point relative to the ICD, and to validate the screw placement safety as demonstrated with postoperative CT scans. RESULTS: In the virtual surgical procedures, the starting point was consistently below the ICD and in the oblique osseous corridor in all patients and all screws were Grade 1. In the clinical surgical procedures, the OSID and PVCL were consistently visible and the starting point was always below the ICD in all patients; overall, 21 S1 iliosacral screws were placed in these 17 patients without malpositioning or iatrogenic injury. CONCLUSIONS: On the lateral view of the dysmorphic sacrum, the OSID and PVCL are visible and intersect at a point that is consistently below the ICD and in the oblique osseous corridor, and thus they can be used to identify the starting point. LEVEL OF EVIDENCE: Therapeutic Level III. See Instructions for Authors for a complete description of levels of evidence.


Subject(s)
Fractures, Bone , Pelvic Bones , Humans , Sacrum/diagnostic imaging , Sacrum/surgery , Pelvic Bones/surgery , Ilium/diagnostic imaging , Ilium/surgery , Fracture Fixation, Internal/methods , Bone Screws , Fractures, Bone/surgery