Results 1 - 20 of 117
1.
Bioinformatics ; 40(9)2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39276157

ABSTRACT

MOTIVATION: Neoantigens, derived from somatic mutations in cancer cells, can elicit anti-tumor immune responses when presented to autologous T cells by human leukocyte antigen. Identifying immunogenic neoantigens is crucial for cancer immunotherapy development. However, the accuracy of current bioinformatic methods remains unsatisfactory. Surface and structural features of peptide-HLA class I (pHLA-I) complexes offer valuable insight into the immunogenicity of neoantigens. RESULTS: We present NeoaPred, a deep-learning framework for neoantigen prediction. NeoaPred accurately constructs pHLA-I complex structures, with 82.37% of the predicted structures showing an RMSD of < 1 Å. Using these structures, NeoaPred integrates differences in surface, structural, and atom group features between the mutant peptide and its wild-type counterpart to predict a foreignness score. This foreignness score is an effective factor for neoantigen prediction, achieving an AUROC (Area Under the Receiver Operating Characteristic Curve) of 0.81 and an AUPRC (Area Under the Precision-Recall Curve) of 0.54 in the test set, outperforming existing methods. AVAILABILITY AND IMPLEMENTATION: The source code is released under an Apache v2.0 license and is available at the GitHub repository (https://github.com/Dulab2020/NeoaPred).


Subject(s)
Antigens, Neoplasm; Deep Learning; Peptides; Humans; Antigens, Neoplasm/immunology; Antigens, Neoplasm/chemistry; Peptides/chemistry; Peptides/immunology; HLA Antigens/immunology; HLA Antigens/chemistry; Computational Biology/methods; Histocompatibility Antigens Class I/chemistry; Histocompatibility Antigens Class I/immunology; Software; Neoplasms/immunology
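
The structure-accuracy figure in the abstract above (share of predicted structures with RMSD < 1 Å) can be illustrated with a minimal sketch. It is not taken from the NeoaPred code base: the function names `rmsd` and `fraction_below` are hypothetical, and the sketch assumes the two coordinate sets are already superimposed and list matching atoms in the same order.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: both structures are already superimposed and the atoms
    are listed in the same order; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def fraction_below(values, threshold=1.0):
    """Share of RMSD values strictly below the threshold (in the same unit)."""
    return sum(1 for v in values if v < threshold) / len(values)


# Toy example: every atom is shifted by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, predicted))
print(fraction_below([0.5, 0.9, 1.2, 2.0]))
```

A real evaluation would first superimpose each predicted peptide onto its experimental counterpart; that step is omitted here.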
2.
Methods ; 231: 215-225, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39396747

ABSTRACT

The high dimensionality and noise of genomic data pose difficulties for traditional clustering methods. Existing multi-kernel clustering methods aim to improve the quality of the affinity matrix by learning a set of base kernels, thereby enhancing clustering performance. However, learning directly from the original base kernels makes it hard to handle errors and redundancies in high-dimensional data, and feasible multi-kernel fusion strategies are still lacking. To address these issues, we propose a Multi-Kernel Clustering method with Tensor fusion on Grassmann manifolds, called MKCTM. Specifically, we maximize the clustering consensus among base kernels by imposing tensor low-rank constraints to eliminate noise and redundancy. Unlike traditional kernel fusion approaches, our method fuses learned base kernels on the Grassmann manifold, resulting in a final consensus matrix for clustering. We integrate tensor learning and fusion processes into a unified optimization model and propose an effective iterative optimization algorithm for solving it. Experimental results on ten datasets, comparing against 12 popular baseline clustering methods, confirm the superiority of our approach. Our code is available at https://github.com/foureverfei/MKCTM.git.


Subject(s)
Algorithms; Genomics; Genomics/methods; Cluster Analysis; Humans; Software
3.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32533167

ABSTRACT

The significance of pan-cancer categories has recently been recognized as widespread in cancer research. Pan-cancer categorizes a cancer based on its molecular pathology rather than an organ. The molecular similarities among multi-omics data found in different cancer types can play several roles in both biological processes and therapeutic developments. Therefore, an integrated analysis for various genomic data is frequently used to reveal novel genetic and molecular mechanisms. However, a variety of algorithms for multi-omics clustering have been proposed in different fields. The comparison of different computational clustering methods in pan-cancer analysis performance remains unclear. To increase the utilization of current integrative methods in pan-cancer analysis, we first provide an overview of five popular computational integrative tools: similarity network fusion, integrative clustering of multiple genomic data types (iCluster), cancer integration via multi-kernel learning (CIMLR), perturbation clustering for data integration and disease subtyping (PINS) and low-rank clustering (LRACluster). Then, a priori interactions in multi-omics data were incorporated to detect prominent molecular patterns in pan-cancer data sets. Finally, we present comparative assessments of these methods, with discussion over key issues in applying these algorithms. We found that all five methods can identify distinct tumor compositions. The pan-cancer samples can be reclassified into several groups by different proportions. Interestingly, each method can classify the tumors into categories that are different from original cancer types or subtypes, especially for ovarian serous cystadenocarcinoma (OV) and breast invasive carcinoma (BRCA) tumors. In addition, all clusters of the five computational methods show notable prognostic values. Furthermore, both the 9 recurrent differential genes and the 15 common pathway characteristics were identified across all the methods. 
The results and discussion can help the community select appropriate integrative tools according to different research tasks or aims in pan-cancer analysis.


Subject(s)
Breast Neoplasms; Cystadenocarcinoma, Serous; Databases, Genetic; Gene Regulatory Networks; Genomics; Machine Learning; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Computational Biology; Cystadenocarcinoma, Serous/genetics; Cystadenocarcinoma, Serous/metabolism; Female; Humans; Neoplasms; Ovarian Neoplasms/genetics; Ovarian Neoplasms/metabolism
4.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32591780

ABSTRACT

Accurately identifying the interactions between genomic factors and the response of cancer drugs plays important roles in drug discovery, drug repositioning and cancer treatment. A number of studies revealed that interactions between genes and drugs were 'many-genes-to-many drugs' interactions, i.e. common modules, opposed to 'one-gene-to-one-drug' interactions. Such modules fully explain the interactions between complex biological regulatory mechanisms and cancer drugs. However, strategies for effectively and robustly identifying the underlying common modules among pharmacogenomics data remain to be improved. In this paper, we aim to provide a detailed evaluation of three categories of state-of-the-art common module identification techniques from a machine learning perspective, including non-negative matrix factorization (NMF), partial least squares (PLS) and network analyses. We first evaluate the performance of six methods, namely SNMNMF, NetNMF, SNPLS, O2PLS, NSBM and HOGMMNC, using two series of simulated data sets with different noise levels and outlier ratios. Then, we conduct experiments using a real world data set of 2091 genes and 101 drugs in 392 cancer cell lines and compare the real experimental results from the aspect of biological process term enrichment, gene-drug and drug-drug interactions. Finally, we present interesting findings from our evaluation study and discuss the advantages and drawbacks of each method. Supplementary information: Supplementary file is available at Briefings in Bioinformatics online.


Subject(s)
Pharmacogenetics; Algorithms; Antineoplastic Agents/pharmacology; Computational Biology/methods; Drug Discovery; Drug Repositioning; Gene Regulatory Networks; Humans; Machine Learning
5.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34021302

ABSTRACT

Genomic data alignment, a fundamental operation in sequencing, can be utilized to map reads into a reference sequence, query on a genomic database and perform genetic tests. However, with the reduction of sequencing cost and the accumulation of genome data, privacy-preserving genomic sequencing data alignment is becoming unprecedentedly important. In this paper, we present a comprehensive review of secure genomic data comparison schemes. We discuss the privacy threats, including adversaries and privacy attacks. The attacks can be categorized into inference, membership, identity tracing and completion attacks and have been applied to obtaining the genomic privacy information. We classify the state-of-the-art genomic privacy-preserving alignment methods into three different scenarios: large-scale reads mapping, encrypted genomic datasets querying and genetic testing to ease privacy threats. A comprehensive analysis of these approaches has been carried out to evaluate the computation and communication complexity as well as the privacy requirements. The survey provides the researchers with the current trends and the insights on the significance and challenges of privacy issues in genomic data alignment.


Subject(s)
Algorithms; Genome, Human; Genomics; Sequence Alignment; Humans
6.
PLoS Pathog ; 17(3): e1009328, 2021 03.
Article in English | MEDLINE | ID: mdl-33657135

ABSTRACT

A key step to the SARS-CoV-2 infection is the attachment of its Spike receptor-binding domain (S RBD) to the host receptor ACE2. Considerable research has been devoted to the development of neutralizing antibodies, including llama-derived single-chain nanobodies, to target the receptor-binding motif (RBM) and to block ACE2-RBD binding. Simple and effective strategies to increase potency are desirable for such studies when antibodies are only modestly effective. Here, we identify and characterize a high-affinity synthetic nanobody (sybody, SR31) as a fusion partner to improve the potency of RBM-antibodies. Crystallographic studies reveal that SR31 binds to RBD at a conserved and 'greasy' site distal to RBM. Although SR31 distorts RBD at the interface, it does not perturb the RBM conformation, hence displaying no neutralizing activities itself. However, fusing SR31 to two modestly neutralizing sybodies dramatically increases their affinity for RBD and neutralization activity against SARS-CoV-2 pseudovirus. Our work presents a tool protein and an efficient strategy to improve nanobody potency.


Subject(s)
Angiotensin-Converting Enzyme 2/immunology; Antibodies, Neutralizing/immunology; Antibodies, Viral/immunology; SARS-CoV-2/immunology; Single-Domain Antibodies/immunology; Antibodies, Neutralizing/chemistry; Antibodies, Neutralizing/genetics; Antibodies, Viral/chemistry; Antibodies, Viral/genetics; Antibody Affinity; Binding Sites; Crystallography, X-Ray; HEK293 Cells; Humans; Models, Molecular; Recombinant Fusion Proteins/chemistry; Recombinant Fusion Proteins/genetics; Recombinant Fusion Proteins/immunology; Single-Domain Antibodies/chemistry; Single-Domain Antibodies/genetics
7.
Hum Brain Mapp ; 43(13): 3970-3986, 2022 09.
Article in English | MEDLINE | ID: mdl-35538672

ABSTRACT

Functional neural activities manifest geometric patterns, as evidenced by the evolving network topology of functional connectivities (FC) even in the resting state. In this work, we propose a novel manifold-based geometric neural network for functional brain networks (called "Geo-Net4Net" for short) to learn the intrinsic low-dimensional feature representations of resting-state brain networks on the Riemannian manifold. This tool allows us to answer the scientific question of how the spontaneous fluctuation of FC supports behavior and cognition. We deploy a set of positive maps and rectified linear unit (ReLU) layers to uncover the intrinsic low-dimensional feature representations of functional brain networks on the Riemannian manifold, taking advantage of the symmetric positive-definite (SPD) form of the correlation matrices. Due to the lack of well-defined ground truth in the resting state, existing learning-based methods are limited to unsupervised methodologies. To go beyond this boundary, we propose to self-supervise the feature representation learning of resting-state functional networks by leveraging the task-based counterparts occurring before and after the underlying resting state. With this extra heuristic, our Geo-Net4Net allows us to establish a more reasonable understanding of resting-state FCs by capturing the geometric patterns (also known as spectral/shape signatures) associated with resting states on the Riemannian manifold. We have conducted extensive experiments on both simulated data and task-based functional magnetic resonance imaging (fMRI) data from the Human Connectome Project (HCP) database, where our Geo-Net4Net not only achieves more accurate change detection results than other state-of-the-art counterpart methods but also yields ubiquitous geometric patterns that manifest putative insights into brain function.


Subject(s)
Connectome; Deep Learning; Brain/diagnostic imaging; Cognition; Connectome/methods; Humans; Magnetic Resonance Imaging/methods
8.
Chembiochem ; 23(8): e202100534, 2022 04 20.
Article in English | MEDLINE | ID: mdl-34862721

ABSTRACT

Small open reading frames (sORFs) are an important class of genes with less than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence suggests that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.


Subject(s)
Peptides; Codon; Mass Spectrometry; Open Reading Frames; Peptides/chemistry
9.
BMC Med Inform Decis Mak ; 22(1): 190, 2022 07 23.
Article in English | MEDLINE | ID: mdl-35870923

ABSTRACT

BACKGROUND: Patient subgroups are important for easily understanding a disease and for providing precise yet personalized treatment through multiple omics dataset integration. Multiomics datasets are produced daily. Thus, the fusion of heterogeneous big data into intrinsic structures is an urgent problem. Novel mathematical methods are needed to process these data in a straightforward way. RESULTS: We developed a novel method for subgrouping patients with distinct survival rates via the integration of multiple omics datasets and by using principal component analysis to reduce the high data dimensionality. Then, we constructed similarity graphs for patients, merged the graphs in a subspace, and analyzed them on a Grassmann manifold. The proposed method could identify patient subgroups that had not been reported previously by selecting the most critical information during the merging at each level of the omics dataset. Our method was tested on empirical multiomics datasets from The Cancer Genome Atlas. CONCLUSION: Through the integration of microRNA, gene expression, and DNA methylation data, our method accurately identified patient subgroups and achieved superior performance compared with popular methods.


Subject(s)
MicroRNAs; Neoplasms; DNA Methylation; Genome; Humans; Neoplasms/genetics; Survival Rate
10.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 39(4): 672-678, 2022 Aug 25.
Article in Chinese | MEDLINE | ID: mdl-36008330

ABSTRACT

This study aims to analyze the biomechanical stability of Magic screw in the treatment of acetabular posterior column fractures by finite element analysis. A three-dimensional finite element model of the pelvis was established based on the computed tomography (CT) and magnetic resonance imaging (MRI) data of a volunteer and its effectiveness was verified. Then, the posterior column fracture model of the acetabulum was generated. The biomechanical stability of the four internal fixation models was compared. The 500 N force was applied to the upper surface of the sacrum to simulate human gravity. The maximum implant stresses of retrograde screw fixation, single-plate fixation, double-plate fixation and Magic screw fixation model in standing and sitting position were as follows: 114.10, 113.40 MPa; 58.93, 55.72 MPa; 58.76, 47.47 MPa; and 24.36, 27.50 MPa, respectively. The maximum stresses at the fracture end were as follows: 72.71, 70.51 MPa; 48.18, 22.80 MPa; 52.38, 27.14 MPa; and 34.05, 30.78 MPa, respectively. The fracture end displacement of the retrograde tension screw fixation model was the largest in both states, and the Magic screw had the smallest displacement variation in the standing state, but it was significantly higher than the two plate fixations in the sitting state. Magic screw can satisfy the biomechanical stability of posterior column fracture. Compared with traditional fixations, Magic screw has the advantages of more uniform stress distribution and less stress, and should be recommended.


Subject(s)
Fractures, Bone; Spinal Fractures; Biomechanical Phenomena; Bone Plates; Bone Screws; Finite Element Analysis; Fracture Fixation, Internal/methods; Fractures, Bone/diagnostic imaging; Fractures, Bone/surgery; Humans
11.
BMC Bioinformatics ; 22(1): 326, 2021 Jun 15.
Article in English | MEDLINE | ID: mdl-34130622

ABSTRACT

BACKGROUND: With the development of high-throughput sequencing technology, a huge amount of multi-omics data has been accumulated. Although there are many software tools for statistical analysis and visual development of omics data, these tools are not suitable for private data and non-technical users. Besides, most of these tools specialize in only one or perhaps a few data types, without combining clinical information. Moreover, users cannot flexibly choose data processing steps and models when using these tools. RESULTS: To help non-technical users understand and analyze private multi-omics data while ensuring data security, we developed an interactive desktop tool for statistical analysis and visualization of omics and clinical data (IOAT for short). Our tool mainly targets CSV-format data and combines clinical data with high-dimensional multi-omics data. It also provides various operations, such as data preprocessing, feature selection, risk assessment, clustering, and survival analysis. By using this tool, users can safely and conveniently try a combination of various methods on their private multi-omics data to find a model suitable for their data, conduct risk assessment and determine their cancer subtypes. At the same time, the tool can also provide them with references to genes that are closely related to tumor staging, facilitating the development of precision oncology. We review IOAT's main features and demonstrate its analysis capabilities on a lung dataset from TCGA. CONCLUSIONS: IOAT is a local desktop tool, which provides a set of multi-omics data integration solutions. It can quickly perform a complete analysis of cancer genome data for subtype discovery and biomarker identification without security issues or the need to write any code. Thus, our tool can enable cancer biologists and biomedicine researchers to analyze their data more easily and safely. IOAT can be downloaded for free from https://github.com/WlSunshine/IOAT-software .


Subject(s)
Neoplasms; Cluster Analysis; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Precision Medicine; Software
12.
Retina ; 41(5): 1110-1117, 2021 May 01.
Article in English | MEDLINE | ID: mdl-33031250

ABSTRACT

PURPOSE: To develop a deep learning (DL) model to detect morphologic patterns of diabetic macular edema (DME) based on optical coherence tomography (OCT) images. METHODS: In the training set, 12,365 OCT images were extracted from a public data set and an ophthalmic center. A total of 656 OCT images were extracted from another ophthalmic center for external validation. The presence or absence of three OCT patterns of DME, including diffused retinal thickening, cystoid macular edema, and serous retinal detachment, was labeled with 1 or 0, respectively. A DL model was trained to detect three OCT patterns of DME. The occlusion test was applied for the visualization of the DL model. RESULTS: Applying 5-fold cross-validation method in internal validation, the area under the receiver operating characteristic curve for the detection of three OCT patterns (i.e., diffused retinal thickening, cystoid macular edema, and serous retinal detachment) was 0.971, 0.974, and 0.994, respectively, with an accuracy of 93.0%, 95.1%, and 98.8%, respectively, a sensitivity of 93.5%, 94.5%, and 96.7%, respectively, and a specificity of 92.3%, 95.6%, and 99.3%, respectively. In external validation, the area under the receiver operating characteristic curve was 0.970, 0.997, and 0.997, respectively, with an accuracy of 90.2%, 95.4%, and 95.9%, respectively, a sensitivity of 80.1%, 93.4%, and 94.9%, respectively, and a specificity of 97.6%, 97.2%, and 96.5%, respectively. The occlusion test showed that the DL model could successfully identify the pathologic regions most critical for detection. CONCLUSION: Our DL model demonstrated high accuracy and transparency in the detection of OCT patterns of DME. These results emphasized the potential of artificial intelligence in assisting clinical decision-making processes in patients with DME.


Subject(s)
Artificial Intelligence; Deep Learning; Diabetic Retinopathy/diagnosis; Macular Edema/diagnosis; Tomography, Optical Coherence/methods; Visual Acuity; Diabetic Retinopathy/complications; Diabetic Retinopathy/physiopathology; Follow-Up Studies; Humans; Macular Edema/etiology; Macular Edema/physiopathology; ROC Curve; Retrospective Studies
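
The accuracy, sensitivity and specificity reported in the abstract above follow from the 1/0 labeling of each OCT pattern. The sketch below shows the arithmetic for a single pattern; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 1/0 labels of one pattern."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pattern present, detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # pattern absent, not detected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed pattern
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Toy labels for one pattern (e.g. cystoid macular edema) on eight images.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(truth, preds))
```

With three patterns labeled independently, the same function would be applied once per pattern, which is how three sets of figures arise.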
13.
Bioinformatics ; 35(4): 602-610, 2019 02 15.
Article in English | MEDLINE | ID: mdl-30052773

ABSTRACT

MOTIVATION: The emergence of large amounts of genomic, chemical, and pharmacological data provides new opportunities and challenges. Identifying gene-drug associations is not only crucial in providing a comprehensive understanding of the molecular mechanisms of drug action, but is also important in the development of effective treatments for patients. However, accurately determining the complex associations among pharmacogenomic data remains challenging. We propose a higher order graph matching with multiple network constraints (HOGMMNC) model to accurately identify gene-drug modules. The HOGMMNC model aims to capture the inherent structural relations within data drawn from multiple sources by hypergraph matching. The proposed technique seamlessly integrates prior constraints to enhance the accuracy and reliability of the identified relations. An effective numerical solution is combined with a novel sampling strategy to solve the problem efficiently. RESULTS: The superiority and effectiveness of our proposed method are demonstrated through a comparison with four state-of-the-art techniques using synthetic and empirical data. The experiments on synthetic data show that the proposed method clearly outperforms other methods, especially in the presence of noise and irrelevant samples. The HOGMMNC model identifies eighteen gene-drug modules in the empirical data. The modules are validated to have significant associations via pathway analysis. Significance: The modules identified by HOGMMNC provide new insights into the molecular mechanisms of drug action and provide patients with more effective treatments. Our proposed method can be applied to the study of other biological correlated module identification problems (e.g. miRNA-gene, gene-methylation, and gene-disease). AVAILABILITY AND IMPLEMENTATION: A matlab package of HOGMMNC is available at https://github.com/scutbioinformatics/HOGMMNC/. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Drug Interactions/genetics; Gene Regulatory Networks; Genomics; Humans; Reproducibility of Results
14.
Protein Expr Purif ; 164: 105463, 2019 12.
Article in English | MEDLINE | ID: mdl-31381990

ABSTRACT

Recombinant expression of human membrane proteins in large quantities remains a major challenge. Expression host is an important variable to screen for high-level production of membrane proteins. Using the green fluorescent protein (GFP) as a reporter, we screened the expression of a human multi-pass membrane protein called sterol Δ8-Δ7 isomerase in three different hosts: Escherichia coli, Saccharomyces cerevisiae, and Pichia pastoris. The expression of the His-tagged isomerase was exceptionally high in P. pastoris, reaching ~200 mg L⁻¹ in standard flasks, and ~1,000 mg L⁻¹ in condensed culture that mimics fermentation. The heterogeneously expressed isomerase could be extracted fully with dodecyl maltoside, and the solubilized protein in the form of GFP fusion showed a sharp and symmetric peak on fluorescence-detection size exclusion chromatography. Our work provides a useful source for the purification of the recombinant isomerase.


Subject(s)
Pichia/genetics; Steroid Isomerases/chemistry; Steroid Isomerases/genetics; Chromatography, Gel; Gene Expression; Humans; Recombinant Proteins/chemistry; Recombinant Proteins/genetics; Solubility
15.
Eur Radiol ; 29(10): 5590-5599, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30874880

ABSTRACT

OBJECTIVES: To explore and evaluate the feasibility of radiomics in stratifying nasopharyngeal carcinoma (NPC) into distinct survival subgroups through multi-modalities MRI. METHODS: A total of 658 patients (training cohort: 424; validation cohort: 234) with non-metastatic NPC were enrolled in the retrospective analysis. Each slice was considered as a sample and 4863 radiomics features on the tumor region were extracted from T1-weighted, T2-weighted, and contrast-enhanced T1-weighted MRI. Consensus clustering and manual aggregation were performed on the training cohort to generate a baseline model and classification reference used to train a support vector machine classifier. The risk of each patient was defined as the maximum risk among the slices. Each patient in the validation cohort was assigned to the risk model using the trained classifier. Harrell's concordance index (C-index) was used to measure the prognosis performance, and differences between subgroups were compared using the log-rank test. RESULTS: The training cohort was clustered into four groups with distinct survival patterns. Each patient was assigned to one of the four groups according to the estimated risk. Our method gave a performance (C-index = 0.827, p < .004 and C-index = 0.814, p < .002) better than the T-stage (C-index = 0.815, p = .002 and C-index = 0.803, p = .024), competitive to and more stable than the TNM staging system (C-index = 0.842, p = .003 and C-index = 0.765, p = .050) in the training cohort and the validation cohort. CONCLUSIONS: Through investigating a large one-institutional cohort, the quantitative multi-modalities MRI image phenotypes reveal distinct survival subtypes. KEY POINTS: • Radiomics phenotype of MRI revealed the subtype of nasopharyngeal carcinoma (NPC) patients with distinct survival patterns. • The slice-wise analysis method on MRI helps to stratify patients and provides superior prognostic performance over the TNM staging method. 
• Risk estimation using the highest risk among slices performed better than using the majority risk in prognosis.


Subject(s)
Nasopharyngeal Carcinoma/diagnostic imaging; Nasopharyngeal Neoplasms/diagnostic imaging; Adult; Cohort Studies; Feasibility Studies; Female; Humans; Image Interpretation, Computer-Assisted/methods; Kaplan-Meier Estimate; Magnetic Resonance Imaging/methods; Male; Middle Aged; Nasopharyngeal Carcinoma/pathology; Nasopharyngeal Neoplasms/pathology; Neoplasm Staging; Prognosis; Retrospective Studies; Support Vector Machine
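
Two steps in the abstract above lend themselves to a short illustration: taking a patient's risk as the maximum over that patient's slices, and scoring a risk ranking with Harrell's C-index. This is a sketch under stated assumptions rather than the authors' pipeline: the function names and toy data are hypothetical, pairs with tied survival times are ignored, and tied risks count as half-concordant.

```python
def patient_risk(slice_risks):
    """Patient-level risk defined as the highest risk among the patient's slices."""
    return max(slice_risks)


def harrell_c_index(times, events, risks):
    """Harrell's concordance index.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] is truthy) and a strictly shorter time than patient j.
    The pair is concordant when patient i also has the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: per-slice risks for four patients, follow-up times, event flags.
slices = [[0.4, 0.9], [0.7, 0.2], [0.4], [0.1, 0.05]]
risks = [patient_risk(s) for s in slices]
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the C-index values reported in the abstract.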
17.
BMC Bioinformatics ; 17: 384, 2016 Sep 17.
Article in English | MEDLINE | ID: mdl-27639558

ABSTRACT

BACKGROUND: Variations in DNA copy number make an important contribution to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough samples for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersed portion of the sequencing data. A recent study has shown that the over-dispersed portion of single-cell sequencing data can be modelled well by negative binomial distributions. RESULTS: We developed a read-depth based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals follow a negative binomial distribution. The problem of CNV detection was formulated as a quadratic optimization problem and was solved by an efficient numerical solution based on the classical alternating direction minimization method. CONCLUSIONS: Extensive experiments to compare nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.


Subject(s)
DNA Copy Number Variations/genetics; High-Throughput Nucleotide Sequencing/methods; Single-Cell Analysis/methods; Software; Binomial Distribution; Cluster Analysis; Computer Simulation; Humans; Sequence Analysis, DNA
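
The three ingredients named in the abstract above (a negative binomial model for read counts, a sparsity constraint and a smoothness constraint) can be written down generically. The sketch below is not nbCNV's actual formulation, which the abstract describes as a quadratic problem solved by alternating direction minimization. The mean/dispersion parameterization, the L1 form of both penalties, the log-link `baseline * exp(x)` and all function names are assumptions made for illustration.

```python
import math


def nb_log_pmf(y, mu, r):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion r (variance = mu + mu**2 / r)."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def sparsity_penalty(x):
    """L1 norm: most bins are expected to show no copy number change."""
    return sum(abs(v) for v in x)


def smoothness_penalty(x):
    """Total variation: neighbouring bins are expected to share the same state."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def penalized_objective(counts, baseline, r, x, lam_sparse, lam_smooth):
    """Negative log-likelihood plus both penalties.

    Assumption: bin i has expected depth baseline * exp(x[i]), so x[i] = 0
    means "no change". This link is hypothetical, chosen for illustration.
    """
    nll = -sum(nb_log_pmf(y, baseline * math.exp(v), r) for y, v in zip(counts, x))
    return nll + lam_sparse * sparsity_penalty(x) + lam_smooth * smoothness_penalty(x)


# Toy read depths with an apparent gain in the middle two bins.
counts = [10, 9, 21, 19, 11]
flat = [0.0, 0.0, 0.0, 0.0, 0.0]
gain = [0.0, 0.0, 0.7, 0.7, 0.0]
print(penalized_objective(counts, 10.0, 5.0, flat, 1.0, 1.0))
print(penalized_objective(counts, 10.0, 5.0, gain, 1.0, 1.0))
```

Comparing the two printed values shows the trade-off an optimizer would face: the `gain` signal fits the counts better but pays for being non-zero and for its two change points.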
18.
BMC Bioinformatics ; 16: 219, 2015 Jul 10.
Article in English | MEDLINE | ID: mdl-26159165

ABSTRACT

BACKGROUND: Classifying cancers by gene selection is among the most important and challenging procedures in biomedicine. A major challenge is to design an effective method that eliminates irrelevant, redundant, or noisy genes from the classification, while retaining all of the highly discriminative genes. RESULTS: We propose a gene selection method, called local hyperplane-based discriminant analysis (LHDA). LHDA adopts two central ideas. First, it uses a local approximation rather than global measurement; second, it embeds a recently reported classification model, the K-Local Hyperplane Distance Nearest Neighbor (HKNN) classifier, into its discriminator. Through classification accuracy-based iterations, LHDA obtains the feature weight vector and finally extracts the optimal feature subset. The performance of the proposed method is evaluated in extensive experiments on synthetic and real microarray benchmark datasets. Eight classical feature selection methods, four classification models and two popular embedded learning schemes, including k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), Support Vector Machine (SVM) and Random Forest, are employed for comparisons. CONCLUSION: The proposed method yielded performance comparable or superior to that of seven state-of-the-art models. This performance demonstrates the advantage of combining feature weighting with model learning in a unified framework to achieve the two tasks simultaneously.


Subject(s)
Cluster Analysis , Discriminant Analysis , Machine Learning/standards , Neoplasms/classification , Neoplasms/genetics , Support Vector Machine , Gene Expression Profiling , Gene Regulatory Networks , Humans
19.
BMC Bioinformatics ; 15: 70, 2014 Mar 14.
Article in English | MEDLINE | ID: mdl-24625071

ABSTRACT

BACKGROUND: Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF-based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. RESULTS: We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than the global measurement typically used in existing methods. The weights obtained by our method are very robust to degradation by noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). CONCLUSION: Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability across various classification algorithms.
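
For readers unfamiliar with the RELIEF baseline mentioned in the abstract above, the following is a minimal sketch of the classic RELIEF weight update (nearest hit versus nearest miss). It is not LHR, which replaces this global nearest-neighbor measurement with a local approximation; the function name, the L1 distance, and the toy data are assumptions made for illustration.

```python
def relief_weights(X, y):
    """Classic RELIEF feature weights for a small labelled dataset.

    X is a list of feature vectors, y the matching class labels.
    A feature gains weight when it separates a sample from its nearest
    miss (other class) more than from its nearest hit (same class).
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def l1(a, b):
        # Manhattan distance; an illustrative choice of metric.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # no valid neighbor pair for this sample
        hit = min(hits, key=lambda j: l1(X[i], X[j]))
        miss = min(misses, key=lambda j: l1(X[i], X[j]))
        for f in range(d):
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
print(weights)
```

On this toy dataset the informative feature receives a positive weight and the noisy one a negative weight. Because each update depends on a single nearest hit and miss, a few noisy samples can change the weights noticeably, which is the instability the abstract refers to.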


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Support Vector Machine , Algorithms , Bayes Theorem , Cluster Analysis , Databases, Genetic , Discriminant Analysis , Humans , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis
20.
BMC Cancer ; 14: 366, 2014 May 24.
Article in English | MEDLINE | ID: mdl-24885156

ABSTRACT

BACKGROUND: The apparent diffusion coefficient (ADC) is a highly diagnostic factor in discriminating malignant from benign breast masses in diffusion-weighted magnetic resonance imaging (DW-MRI). The combination of ADC and other pictorial characteristics has improved the accuracy of lesion type identification. The objective of this study was to reassess these findings on an independent patient group by changing the magnetic field from 1.5-Tesla to 3.0-Tesla. METHODS: This retrospective study consisted of a training group of 234 female patients, including 85 benign and 149 malignant lesions, imaged using 1.5-Tesla MRI, and a test group of 95 female patients, including 19 benign and 85 malignant lesions, imaged using 3.0-Tesla MRI. The lesion of interest was segmented from the raw image, and four sets of measurements describing the morphology, kinetics, DW-MRI, and texture of the pictorial properties of each lesion were obtained. Each lesion was characterized by 28 features in total. Three classical machine-learning algorithms were used to build prediction models on the training group, which evaluated the prognostic performance of the multi-sided features in three scenarios. To reduce information redundancy, five highly diagnostic factors were selected to obtain a compact yet informative characterization of the lesion status. RESULTS: Three classification models were built on the 1.5-Tesla training group and were tested on the independent 3.0-Tesla test group. The following results were found. i) Characterization of breast masses in a multi-sided way dramatically increased prediction performance. Using all features gave higher sensitivity and specificity than any individual feature group or combination of groups. ii) ADC was a highly effective factor in improving the sensitivity in discriminating malignant from benign masses. iii) Five features, namely ADC, Sum Average, Entropy, Elongation, and Sum Variance, were selected to achieve the highest performance in diagnosis of the 3.0-Tesla patient group. CONCLUSIONS: The combination of ADC and other multi-sided characteristics can increase the capability of discriminating malignant from benign breast lesions, even under different imaging protocols. The selected compact feature subsets achieved a high diagnostic performance and thus are promising in clinical applications for discriminating lesion type and for personalized treatment planning.


Subject(s)
Breast Neoplasms/pathology , Contrast Media , Diffusion Magnetic Resonance Imaging , Gadolinium DTPA , Adolescent , Adult , Aged , Algorithms , Artificial Intelligence , Diagnosis, Differential , Female , Humans , Image Interpretation, Computer-Assisted , Middle Aged , Predictive Value of Tests , Prognosis , Retrospective Studies , Young Adult