1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631399

ABSTRACT

Owing to its promising capacity to improve drug efficacy, polypharmacology has emerged as a new theme in drug discovery for complex diseases. In the discovery of novel multi-target drugs (MTDs), in silico strategies have become essential because of their high throughput and low cost. However, current research mostly targets typical, closely related target pairs. Because of the intricate pathogenic networks of complex diseases, many distantly related targets have been found to play crucial roles in synergistic treatment. An innovative method for developing drugs that simultaneously target distantly related target pairs is therefore of the utmost importance. At the same time, reducing the false discovery rate in MTD design remains a daunting technical challenge. In this research, effective small-molecule clustering of the positive dataset, together with a strategy for generating a putative negative dataset, was adopted during model construction. A comprehensive assessment on 10 target pairs with hierarchical similarity levels showed that the proposed strategy successfully reduced the false discovery rate. Models constructed with far fewer inhibitor molecules achieved considerable yields and showed better control of false hits than before. To further evaluate generalization ability, an in-depth high-throughput virtual screening assessment on the ChEMBL database was conducted. The novel strategy hierarchically improved the enrichment factor for each target pair in line with its similarity level, especially for distantly related or unrelated target pairs.
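The enrichment factor (EF) used to judge virtual screening compares the hit rate among the top-ranked fraction of a library with the hit rate of the library as a whole. A minimal Python sketch of the standard definition, assuming a list of predicted scores and binary activity labels; the function name and the 1% default cutoff are illustrative choices, not taken from the study:

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF at the top `fraction` of a ranked screening library.

    scores: predicted activity scores (higher = more likely active)
    labels: 1 for a known active, 0 for an inactive/decoy
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # size of the top slice, at least one compound
    n_top = max(1, int(round(n_total * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    # hit rate in the top slice divided by the overall hit rate
    return (hits_top / n_top) / (n_actives / n_total)
```

An EF of 1 means actives are ranked no better than random; larger values indicate stronger enrichment.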


Subjects
Drug Discovery, Polypharmacology, Drug Discovery/methods, High-Throughput Screening Assays
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36403090

ABSTRACT

Label-free quantification (LFQ) has emerged as an exceptional technique in proteomics owing to its broad proteome coverage, wide dynamic range and enhanced analytical reproducibility. Because in-depth quantification is extremely difficult, LFQ chains incorporating a variety of transformation, pretreatment and imputation methods must be constructed. However, identifying a well-performing chain remains challenging, because performance depends strongly on the data studied and the number of possible chains is large. In this study, an R package, EVALFQ, was therefore constructed to enable performance evaluation of more than 3000 LFQ chains. This package is unique in (a) automatically evaluating performance using multiple criteria, (b) exploring quantification accuracy based on spiked proteins and (c) discovering well-performing chains by comprehensive assessment. Because it assesses performance from multiple perspectives and scans more than 3000 chains, this package is expected to attract broad interest in the field of proteomic quantification. The package is available at https://github.com/idrblab/EVALFQ.
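The idea of an LFQ "chain" is that one method is picked from each processing step and the choices are composed, so the number of chains is the product of the options per step. A minimal Python sketch, assuming the steps run in the order the abstract lists them (transformation, pretreatment, imputation) and using two hypothetical methods per step; it does not reproduce EVALFQ, which is an R package with its own method set:

```python
import itertools
import math
from statistics import mean

# Matrices are lists of rows (proteins) x columns (samples); None marks a missing value.

def identity(m):
    return [row[:] for row in m]

def log2_transform(m):
    return [[math.log2(v) if v is not None else None for v in row] for row in m]

def mean_center(m):
    # subtract each protein's mean over its observed (non-missing) values
    out = []
    for row in m:
        observed = [v for v in row if v is not None]
        mu = mean(observed) if observed else 0.0
        out.append([v - mu if v is not None else None for v in row])
    return out

def impute_zero(m):
    return [[0.0 if v is None else v for v in row] for row in m]

def impute_row_min(m):
    # replace missing values with the smallest observed value of that protein
    out = []
    for row in m:
        observed = [v for v in row if v is not None]
        fill = min(observed) if observed else 0.0
        out.append([fill if v is None else v for v in row])
    return out

# hypothetical method menus for each step
TRANSFORMS = {"none": identity, "log2": log2_transform}
PRETREATMENTS = {"none": identity, "center": mean_center}
IMPUTATIONS = {"zero": impute_zero, "rowmin": impute_row_min}

def enumerate_chains():
    """Every (transformation, pretreatment, imputation) combination."""
    return list(itertools.product(TRANSFORMS, PRETREATMENTS, IMPUTATIONS))

def run_chain(m, chain):
    t, p, i = chain
    return IMPUTATIONS[i](PRETREATMENTS[p](TRANSFORMS[t](m)))
```

With 2 options per step this yields 2 x 2 x 2 = 8 chains; each chain's output would then be scored against evaluation criteria.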


Subjects
Proteome, Proteomics, Proteome/metabolism, Proteomics/methods, Reproducibility of Results
3.
Nucleic Acids Res ; 51(D1): D1333-D1344, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36134713

ABSTRACT

As the most prevalent internal modification in eukaryotic RNAs, N6-methyladenosine (m6A) has been found to play an essential role in cellular proliferation, metabolic homeostasis, embryonic development and other processes. With rapidly growing research interest in m6A, its crucial roles in regulating disease development and drug response are attracting increasing attention. A database offering such valuable data on m6A-centered regulation is therefore greatly needed; however, no such database is yet available. Herein, a new database named 'M6AREG' was developed to (i) systematically cover, for the first time, data on the effects of m6A-centered regulation on both disease development and drug response, (ii) explicitly describe the molecular mechanism underlying each type of regulation and (iii) fully reference the collected data by cross-linking to existing databases. Because the accumulated data are valuable to researchers in diverse disciplines (such as pathology and pathophysiology, clinical laboratory diagnostics, medicinal biochemistry and drug design), M6AREG is expected to have many implications for future m6A-based regulation studies. It is currently accessible to all users at: https://idrblab.org/m6areg/.


Subjects
Adenosine, Drug Design, Female, Pregnancy, Humans, Cell Proliferation, Data Collection, Databases, Factual
4.
Nucleic Acids Res ; 51(21): e110, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37889083

ABSTRACT

RNAs play essential roles in diverse physiological and pathological processes by interacting with other molecules (RNA/protein/compound), and various computational methods are available for identifying these interactions. However, the encoding features provided by existing methods are limited, and existing tools do not offer an effective way to integrate the interacting partners. In this study, a task-specific encoding algorithm for RNAs and RNA-associated interactions was therefore developed. The algorithm is unique in (a) realizing comprehensive RNA feature encoding by introducing a large number of novel features and (b) enabling task-specific integration of interacting partners using convolutional autoencoder-directed feature embedding. Compared with existing methods/tools, the algorithm demonstrated superior performance in diverse benchmark studies. The algorithm and its source code are readily accessible to all users at: https://idrblab.org/corain/ and https://github.com/idrblab/corain/.
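To make "RNA feature encoding" concrete, a widely used generic sequence feature is k-mer composition: the normalized frequency of every length-k nucleotide word. The Python sketch below is an illustration of that common encoding only, not one of the novel features or the autoencoder embedding described in the abstract; treating T as U is an assumption made for convenience:

```python
from itertools import product

def kmer_composition(seq, k=2, alphabet="ACGU"):
    """Normalized k-mer frequency vector for an RNA sequence.

    The vector has len(alphabet) ** k entries, in lexicographic k-mer order.
    Windows containing characters outside the alphabet are skipped.
    """
    seq = seq.upper().replace("T", "U")  # assumption: accept DNA-style input
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]
```

For k = 2 this gives a 16-dimensional vector per sequence; concatenating several k values is a simple way to build a richer fixed-length encoding.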


Subjects
Computational Biology, RNA, RNA/genetics, Computational Biology/methods, Algorithms, Software
5.
J Chem Inf Model ; 64(7): 2720-2732, 2024 04 08.
Article in English | MEDLINE | ID: mdl-38373720

ABSTRACT

In the context of precision medicine, multiomics data integration provides a comprehensive understanding of underlying biological processes and is critical for disease diagnosis and biomarker discovery. A commonly used integration method, valued for its simplicity and ease of implementation, is early integration through concatenation of multiple dimensionally reduced omics matrices. However, this approach is seriously limited by information loss and a lack of latent feature interaction. Herein, a novel multiomics early integration framework (MOINER) based on information enhancement and image representation learning is presented to address these challenges. MOINER employs the self-attention mechanism to capture the intrinsic correlations among omics features, which enables it to significantly outperform existing state-of-the-art methods for multiomics data integration. Moreover, visualizing the attention embedding and identifying potential biomarkers offer interpretable insights into the prediction results. All source code and models for MOINER are freely available at https://github.com/idrblab/MOINER.
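The self-attention mechanism referred to above lets every feature weigh every other feature when building its new representation. A minimal pure-Python sketch of standard scaled dot-product self-attention, assuming each omics feature is already embedded as a row vector and that the query/key/value projection matrices are given; MOINER's actual architecture and learned weights are not reproduced here:

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention.

    x: n_features x d embeddings; wq, wk, wv: d x d_k projection matrices.
    Returns (output, attention_weights); each weight row sums to 1.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wq[0])
    weights = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights.append(softmax(scores))
    return matmul(weights, v), weights
```

The returned weight matrix is the kind of "attention embedding" that can be visualized to see which features attend to which.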


Subjects
Learning, Multiomics, Software
6.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32510556

ABSTRACT

Metaproteomics suffers from issues of dimensionality and sparsity. Data reduction methods can maximally identify the relevant subset of significant differential features and reduce data redundancy, and feature selection (FS) methods are applied to obtain this significant differential subset. A variety of FS methods have been developed for metaproteomic studies. However, because FS performance depends heavily on the data characteristics of a given study, a suitable FS method must be chosen carefully to obtain reproducible differential proteins. Moreover, it is critical to evaluate the performance of each FS method against comprehensive criteria, because a single criterion is not sufficient to reflect the overall performance of an FS method. We therefore developed an online tool named MetaFS, which provides 13 types of FS methods and evaluates them comprehensively using four widely accepted, independent criteria. Furthermore, the function and reliability of MetaFS were systematically tested and validated via two case studies. In sum, MetaFS could be a valuable tool for identifying the FS method that performs best overall when selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.


Subjects
Databases, Protein, Microbiota, Proteomics, Software, Biomarkers/metabolism, Humans
7.
J Chem Inf Model ; 63(5): 1626-1636, 2023 03 13.
Article in English | MEDLINE | ID: mdl-36802582

ABSTRACT

Drug-drug interactions (DDIs) are a major concern in clinical practice and have been recognized as one of the key threats to public health. To address this critical threat, many studies have been conducted to clarify the mechanism underlying each DDI, and on that basis alternative therapeutic strategies have been successfully proposed. Moreover, artificial intelligence-based models for predicting DDIs, especially multilabel classification models, are highly dependent on a reliable DDI data set with clear mechanistic information. These successes highlight the urgent need for a platform providing mechanistic clarification for a large number of existing DDIs. However, no such platform is available yet. In this study, a platform entitled "MecDDI" was therefore introduced to systematically clarify the mechanisms underlying existing DDIs. This platform is unique in (a) clarifying the mechanisms underlying over 178,000 DDIs with explicit descriptions and graphic illustrations and (b) providing a systematic classification of all collected DDIs based on the clarified mechanisms. Given the long-lasting threat of DDIs to public health, MecDDI could offer medical scientists a clear explanation of DDI mechanisms, support healthcare professionals in identifying alternative therapeutics, and prepare data for algorithm scientists to predict new DDIs. MecDDI is expected to become an indispensable complement to the available pharmaceutical platforms and is freely accessible at: https://idrblab.org/mecddi/.


Subjects
Algorithms, Artificial Intelligence, Humans, Drug Interactions
8.
Nucleic Acids Res ; 49(D1): D1233-D1243, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33045737

ABSTRACT

Drug-metabolizing enzymes (DMEs) are critical determinants of drug safety and efficacy, and the interactome of DMEs has attracted extensive attention. There are three major interaction types in this interactome: microbiome-DME interaction (MICBIO), xenobiotics-DME interaction (XEOTIC) and host protein-DME interaction (HOSPPI). The interaction data of each type are essential for drug metabolism, and collectively considering multiple types has implications for the future practice of precision medicine. However, no database has been designed to systematically provide data on all types of DME interactions. Here, a database of the Interactome of Drug-Metabolizing Enzymes (INTEDE) was therefore constructed to offer these interaction data. First, 1047 unique DMEs (448 host and 599 microbial) were confirmed, for the first time, using their metabolizing drugs. Second, for these newly confirmed DMEs, all types of their interactions (3359 MICBIOs between 225 microbial species and 185 DMEs; 47 778 XEOTICs between 4150 xenobiotics and 501 DMEs; 7849 HOSPPIs between 565 human proteins and 566 DMEs) were comprehensively collected and provided, enabling crosstalk analysis among multiple types. Because of the large amount of accumulated data, INTEDE makes it possible to generalize key features for revealing disease etiology and optimizing clinical treatment. INTEDE is freely accessible at: https://idrblab.org/intede/.


Subjects
Databases, Factual, Drugs, Investigational/metabolism, Enzymes/metabolism, Inactivation, Metabolic/genetics, Prescription Drugs/metabolism, Protein Processing, Post-Translational, Xenobiotics/metabolism, Bacteria/enzymology, DNA Methylation, Enzymes/classification, Fungi/enzymology, Histones/genetics, Histones/metabolism, Humans, Internet, Metabolic Clearance Rate, Microbiota/genetics, RNA, Long Noncoding/genetics, RNA, Long Noncoding/metabolism, Software
9.
Brief Bioinform ; 21(4): 1437-1447, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31504150

ABSTRACT

High-accuracy functional annotation of protein sequences has become one of the most important issues in modern biomedical studies, and computational approaches that significantly accelerate analysis and enhance accuracy are greatly desired. Although a variety of methods have been developed to improve protein annotation accuracy, their ability to control false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performance was systematically compared with that of traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better than other de novo methods in both prediction stability and annotation accuracy. Moreover, an in-depth assessment revealed an improved capacity to control the false discovery rate compared with traditional methods. Overall, this study not only provided a comprehensive analysis of the performance of the newly proposed strategy but also provided a tool for researchers in the field of protein function annotation.


Subjects
Deep Learning, Proteins/chemistry, Algorithms, Neural Networks, Computer
10.
Brief Bioinform ; 21(5): 1825-1836, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31860715

ABSTRACT

The type IV bacterial secretion system (SS) is reported to be one of the most ubiquitous SSs in nature and can induce serious conditions by secreting type IV SS effectors (T4SEs) into host cells. Recent studies mainly focus on annotating new T4SEs from the huge amount of sequencing data, and various computational tools have therefore been developed to accelerate T4SE annotation. However, these tools are reported to be heavily dependent on the selected methods, and their annotation performance needs to be further enhanced. Herein, a convolutional neural network (CNN) technique was used to annotate T4SEs by integrating multiple protein encoding strategies. First, the annotation accuracies of nine encoding strategies integrated with CNN were assessed and compared with those of popular T4SE annotation tools on an independent benchmark. Second, the false discovery rates of various models were systematically evaluated by (1) scanning the genome of Legionella pneumophila subsp. ATCC 33152 and (2) predicting real-world non-T4SEs validated in published experiments. Based on these analyses, three encoding strategies, (a) position-specific scoring matrix (PSSM), (b) protein secondary structure & solvent accessibility (PSSSA) and (c) one-hot encoding scheme (Onehot), were identified as well-performing when integrated with CNN. Finally, a novel strategy that collectively considers the three well-performing models (CNN-PSSM, CNN-PSSSA and CNN-Onehot) was proposed, and a new tool (CNN-T4SE, https://idrblab.org/cnnt4se/) was constructed to facilitate T4SE annotation. Overall, this study comprehensively analyzed the performance of a collection of encoding strategies integrated with CNN, which could facilitate the suppression of T4SS in infection and limit the spread of antimicrobial resistance.
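Of the three well-performing encodings, one-hot is the simplest to illustrate: each residue becomes a 20-dimensional indicator vector, giving a sequence-length x 20 matrix that a CNN can convolve over. A minimal Python sketch, assuming the 20 standard amino acids in alphabetical one-letter order, all-zero rows for non-standard residues, and optional truncation or zero-padding to a fixed length; the exact scheme used by CNN-T4SE may differ:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed ordering)

def one_hot_encode(seq, max_len=None):
    """One-hot matrix (len x 20) for a protein sequence.

    Non-standard residues (e.g. X) become all-zero rows. If max_len is set,
    the sequence is truncated or zero-padded to exactly max_len rows so that
    all inputs to a CNN share one shape.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    seq = seq.upper()
    if max_len is not None:
        seq = seq[:max_len]
    matrix = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        matrix.append(row)
    if max_len is not None:
        # pad with zero rows up to the fixed length
        matrix.extend([[0] * len(AMINO_ACIDS) for _ in range(max_len - len(matrix))])
    return matrix
```

Fixed-length padding is a design choice for batching; variable-length inputs with global pooling would be an alternative.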


Subjects
Neural Networks, Computer, Type IV Secretion Systems, Algorithms, Position-Specific Scoring Matrices
11.
Brief Bioinform ; 21(4): 1378-1390, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31197323

ABSTRACT

Microbial communities (MCs) have a great impact on mediating complex disease indications, biogeochemical cycling and agricultural productivity, which makes metaproteomics a powerful technique for quantifying the diverse and dynamic composition of proteins or peptides. The key role of biostatistical strategies in MC studies is reported to be underestimated; in particular, the appropriate application of feature selection methods (FSMs) is largely ignored. Although extensive efforts have been devoted to assessing the performance of FSMs, previous studies focused only on their classification accuracy without considering their ability to correctly and comprehensively identify spiked proteins. In this study, the performance of 14 FSMs was comprehensively assessed on two key criteria (sample classification and spiked protein discovery) using a variety of metaproteomics benchmarks. First, the classification accuracies of the 14 FSMs were evaluated. Then, their abilities to identify proteins at different spiked concentrations were assessed. Finally, seven FSMs (FC, LMEB, OPLS-DA, PLS-DA, SAM, SVM-RFE and T-Test) were identified as performing consistently well or better under both criteria, with PLS-DA consistently superior. In summary, this study served as a comprehensive analysis of the performance of current FSMs and could provide a valuable guideline for researchers in metaproteomics.
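The spiked-protein criterion asks how many of the proteins deliberately added to a benchmark sample an FSM actually ranks near the top, and how this changes with spike concentration. A minimal Python sketch, assuming an FSM yields a ranked protein list and that recovery is measured as the fraction of spiked proteins found within the top N; the metric definition and names are hypothetical, not the study's exact measure:

```python
def recovery_by_concentration(ranked_proteins, spiked_by_conc, top_n):
    """Fraction of spiked proteins recovered in the top_n, per spike level.

    ranked_proteins: protein IDs ordered by an FSM, most significant first
    spiked_by_conc: {concentration_label: [spiked protein IDs]}
    """
    selected = set(ranked_proteins[:top_n])
    return {
        conc: len(selected & set(proteins)) / len(proteins)
        for conc, proteins in spiked_by_conc.items()
    }
```

Comparing these per-level recoveries across FSMs complements classification accuracy, since a method can separate samples well while missing low-concentration spikes.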


Subjects
Proteomics/methods, Biomarkers/metabolism, Cluster Analysis, Proteins/metabolism
12.
J Chem Inf Model ; 62(23): 5875-5895, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36378082

ABSTRACT

Spatial proteomics is an interdisciplinary field that investigates the localization and dynamics of proteins, and it has gained extensive attention in recent years, especially subcellular proteomics. Substantial evidence indicates that the subcellular localization of proteins is associated with various cellular processes and disease progression. Mass spectrometry (MS)-based and imaging-based experimental approaches have been developed to acquire large-scale spatial proteomic data. To allow reliable analysis of increasingly complex spatial proteomics data, machine learning (ML) methods have been widely used in both MS-based and imaging-based spatial proteomic data analysis pipelines. Here, we comprehensively survey the applications of ML in spatial proteomics from the following aspects: (1) data resources for the spatial proteome are introduced; (2) the roles of different ML algorithms in data analysis pipelines are elaborated; (3) successful applications of spatial proteomics and several analytical tools integrating ML methods are presented; (4) challenges in modern ML-based spatial proteomics studies are discussed. This review provides guidelines for researchers seeking to apply ML methods to analyze spatial proteomic data and can facilitate insightful understanding of cell biology as well as future research in the medical and drug discovery communities.


Subjects
Proteome, Proteomics, Proteomics/methods, Proteome/metabolism, Mass Spectrometry/methods, Machine Learning, Algorithms
13.
Mol Cell Proteomics ; 18(8): 1683-1699, 2019 08.
Article in English | MEDLINE | ID: mdl-31097671

ABSTRACT

Label-free proteome quantification (LFQ) is a multistep workflow, collectively defined by quantification tools and subsequent data manipulation methods, that has been extensively applied in current biomedical, agricultural, and environmental studies. Despite recent advances, in-depth and high-quality quantification remains extremely challenging and requires the optimization of LFQs by comparatively evaluating their performance. However, evaluation results using different criteria (precision, accuracy, and robustness) vary greatly, and the huge number of potential LFQs has become one of the bottlenecks in comprehensively optimizing proteome quantification. In this study, a novel strategy was therefore proposed that enables the discovery of LFQs with simultaneously enhanced performance from thousands of workflows (integrating 18 quantification tools with 3,128 manipulation chains). First, the feasibility of simultaneously improving the precision, accuracy, and robustness of LFQ was systematically assessed by collectively optimizing its multistep manipulation chains. Second, based on a variety of benchmark datasets acquired by various quantification measurements in different acquisition modes, this novel strategy successfully identified a number of manipulation chains that simultaneously improved performance across multiple criteria. Finally, to further enhance proteome quantification and discover LFQs with optimal performance, an online tool (https://idrblab.org/anpela/) enabling collective performance assessment (from multiple perspectives) of the entire LFQ workflow was developed. This study confirmed the feasibility of achieving simultaneous improvement in precision, accuracy, and robustness. The novel strategy proposed and validated in this study, together with the online tool, might provide useful guidance for research fields requiring the mass-spectrometry-based LFQ technique.


Subjects
Proteomics/methods, Proteome, Software, Workflow
14.
Sensors (Basel) ; 20(24)2020 Dec 11.
Article in English | MEDLINE | ID: mdl-33322548

ABSTRACT

Robot control based on visual information perception is a hot topic in the industrial robot domain and enables robots to do more in complex environments. However, the complex visual backgrounds of industrial environments make it very difficult to recognize the target image, especially when a target is small or far from the sensor. Target recognition is therefore the first problem to be addressed in a visual servo system. This paper considers common complex constraints in industrial environments and proposes a You Only Look Once Version 2 Region of Interest (YOLO-v2-ROI) neural network image processing algorithm based on machine learning. The proposed algorithm combines the rapid detection of YOLO (You Only Look Once) with the effective identification of the ROI (Region of Interest) pooling structure, allowing it to quickly locate and identify different objects in different fields of view. This method can also enable the robot vision system to recognize and classify a target object automatically, improve robot vision system efficiency, avoid blind movement, and reduce the calculation load. The proposed algorithm is verified by experiments. The experimental results show that the learning algorithm constructed in this paper achieves real-time image detection speed and demonstrates strong adaptability and recognition ability when processing images with complex backgrounds, such as different backgrounds, lighting, or perspectives. In addition, the algorithm can effectively identify and locate visual targets, which improves the environmental adaptability of a visual servo system.

15.
Int J Mol Sci ; 20(1)2019 Jan 03.
Article in English | MEDLINE | ID: mdl-30609812

ABSTRACT

Pituitary adenoma (PA) is prevalent in the general population. Because of its severe complications and aggressive infiltration into the surrounding brain structures, effective management of PA is required. To date, no drug has been approved for treating non-functional PA, and the removal of cancerous cells from the pituitary is still under experimental investigation. Owing to its superior specificity and safety profile, immunotherapy stands as one of the most promising strategies for PA refractory to standard treatment, and various studies have been carried out to discover immune-related gene markers as target candidates. However, the lists of gene markers identified in different studies are reported to be highly inconsistent because of the very limited number of samples analyzed in each study. It is thus essential to substantially enlarge the sample size and comprehensively assess the robustness of the identified immune-related gene markers. Herein, a novel strategy of direct data integration (DDI) was proposed to combine available PA microarray datasets, which significantly enlarged the sample size. First, the robustness of the gene markers identified by the DDI strategy was found to be substantially enhanced compared with that of previous studies. Then, DDI of all reported PA-related microarray datasets was conducted to achieve a comprehensive identification of PA gene markers, and 66 immune-related genes were discovered as target candidates for PA immunotherapy. Finally, based on analysis of the human protein-protein interaction network, several promising target candidates (GAL, LMO4, STAT3, PD-L1, TGFB and TGFBR3) were proposed for PA immunotherapy. The proposed strategy, together with the immune-related markers identified in this study, provides useful guidance for the development of novel immunotherapy for PA.


Subjects
Adenoma/therapy, Biomarkers, Tumor/genetics, Immunotherapy, Pituitary Neoplasms/therapy, Adaptor Proteins, Signal Transducing/genetics, Adaptor Proteins, Signal Transducing/metabolism, Adenoma/metabolism, Area Under Curve, Biomarkers, Tumor/metabolism, Down-Regulation, Galanin/genetics, Galanin/metabolism, Gene Expression Regulation, Neoplastic, Humans, LIM Domain Proteins/genetics, LIM Domain Proteins/metabolism, Pituitary Neoplasms/metabolism, Protein Interaction Maps/genetics, ROC Curve, STAT3 Transcription Factor/genetics, STAT3 Transcription Factor/metabolism, Up-Regulation
16.
Front Oncol ; 14: 1366546, 2024.
Article in English | MEDLINE | ID: mdl-38803530

ABSTRACT

Myofibroblastic sarcoma is a malignancy composed mainly of myofibroblasts, with a very low incidence. In this study, we report a case of low-grade myofibroblastic sarcoma (LGMS) in the breast. After the diagnosis of LGMS, the patient underwent a mastectomy. The patient showed no relapse or progression during the 3-month follow-up after the operation. LGMS in the breast is extremely rare, and limited experience with its diagnosis and treatment poses challenges for clinicians. This report therefore summarizes the preoperative diagnosis, treatment, and prognosis of breast LGMS through a literature review.

17.
Curr Drug Targets ; 21(1): 34-54, 2020.
Article in English | MEDLINE | ID: mdl-31433754

ABSTRACT

BACKGROUND: Due to its prevalence and negative impacts on both the economy and society, diabetes mellitus (DM) has emerged as a worldwide concern. In light of this, label-free quantification (LFQ) proteomics and diabetic marker selection methods have been applied to elucidate the underlying mechanisms associated with insulin resistance, explore novel protein biomarkers, and discover innovative therapeutic protein targets. OBJECTIVE: The purpose of this manuscript is to review and analyze recent computational advances in label-free quantification and diabetic marker selection in diabetes proteomics. METHODS: The Web of Science database, PubMed database and Google Scholar were searched for label-free quantification, computational advances, feature selection and diabetes proteomics. RESULTS: In this study, we systematically review the computational advances in label-free quantification and diabetic marker selection methods that have been applied to understand DM pathological mechanisms. First, popular quantification measurements and proteomic quantification software tools that have been applied in diabetes studies are comprehensively discussed. Second, a number of popular manipulation methods, including transformation, pretreatment (centering, scaling, and normalization) and missing value imputation, together with a variety of popular feature selection techniques applied to diabetes proteomic data, are reviewed with an objective evaluation of their advantages and disadvantages. Finally, guidelines for the efficient use of computation-based LFQ technology and feature selection methods in diabetes proteomics are proposed. CONCLUSION: In summary, this review provides guidelines for researchers who will engage in proteomic biomarker discovery; by properly applying these proteomic computational advances, more reliable therapeutic targets will be found in the field of diabetes mellitus.


Subjects
Biomarkers/analysis, Diabetes Mellitus/metabolism, Proteomics/methods, Algorithms, Computational Biology/methods, Databases, Bibliographic, Diabetes Mellitus/etiology, Guidelines as Topic, Humans, Proteome/analysis, Software
18.
Comput Struct Biotechnol J ; 18: 2012-2025, 2020.
Article in English | MEDLINE | ID: mdl-32802273

ABSTRACT

Cancer proteomics has become a powerful technique for characterizing the protein markers driving malignant transformation, tracing proteome variation triggered by therapeutics, and discovering novel targets and drugs for the treatment of oncologic diseases. To facilitate cancer diagnosis/prognosis and accelerate drug target discovery, a variety of methods for tumor marker identification and sample classification have been developed and successfully applied in cancer proteomic studies. This review article describes the most recent advances in these approaches together with their current applications in cancer-related studies. First, a number of popular feature selection methods are reviewed with an objective evaluation of their advantages and disadvantages. Second, these methods are grouped into three major classes based on their underlying algorithms. Finally, a variety of sample separation algorithms are discussed. This review provides a comprehensive overview of advances in tumor marker identification and patient/sample/tissue separation, which could guide research in cancer proteomics.

19.
J Mol Biol ; 432(11): 3411-3421, 2020 05 15.
Article in English | MEDLINE | ID: mdl-32044343

ABSTRACT

Comparative biological studies typically require many samples to ensure full representation of the given problem. A frequently encountered question is how many samples are sufficient for a particular study. This question is traditionally assessed using statistical power, but power alone may not guarantee the full and reproducible discovery of features that truly discriminate between biological groups. Two new types of statistical criteria have thus been introduced to assess sample sufficiency from different perspectives by considering diagnostic accuracy and robustness. Because these criteria are complementary, a comprehensive evaluation based on all of them is necessary for a more accurate assessment. However, no such tool is available yet. Herein, an online tool, SSizer (https://idrblab.org/ssizer/), was developed and validated to assess the sample sufficiency of a user-input biological dataset, adopting three statistical criteria to achieve a comprehensive and collective assessment. A sample simulation based on the user-input dataset was performed to expand the data and then determine the sample size required by the particular study. In sum, SSizer is unique in its ability to comprehensively evaluate whether the sample size is sufficient and to determine the required number of samples for the user-input dataset, which therefore facilitates comparative and OMIC-based biological studies.
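The traditional statistical-power criterion mentioned above can be illustrated with the textbook normal-approximation formula for comparing two group means: n per group = 2 (z(1 - alpha/2) + z(power))^2 / d^2, where d is the standardized effect size (Cohen's d). A minimal Python sketch of that one criterion only; it does not implement SSizer's accuracy- and robustness-based criteria or its sample simulation:

```python
import math
from statistics import NormalDist

def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Per-group n for a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

def achieved_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power for a given per-group n (ignores the opposite tail)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n_per_group / 2) - z_alpha)
```

For a medium effect (d = 0.5), alpha = 0.05 and 80% power, this gives 63 samples per group; an exact t-test calculation gives a slightly larger number.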


Subjects
Computational Biology/statistics & numerical data, Genomics/statistics & numerical data, Sample Size, Software, Animals, Computer Simulation, Humans, Internet
20.
CNS Neurosci Ther ; 25(9): 1054-1063, 2019 09.
Article in English | MEDLINE | ID: mdl-31350824

ABSTRACT

AIMS: As one of the most fundamental questions in modern science, "what causes schizophrenia (SZ)" remains a profound mystery owing to the absence of objective gene markers. The reproducibility of gene signatures identified by independent studies has been found to be extremely low because of the limitations of available feature selection methods and the lack of measures for validating signature robustness. These irreproducible results have significantly limited our understanding of the etiology of SZ. METHODS: In this study, a new feature selection strategy was developed, and a comprehensive analysis was then conducted to ensure reliable signature discovery. In particular, the new strategy (a) combined multiple randomized sampling with consensus scoring and (b) assessed gene ranking consistency among different datasets, and a comprehensive analysis of nine independent studies was conducted. RESULTS: Based on a first-ever evaluation of methods' reproducibility cross-validated by nine independent studies, the newly developed strategy was found to be superior to traditional ones. As a result, 33 genes were consistently identified by the new strategy as differentially expressed across multiple datasets, which might facilitate our understanding of the mechanism underlying the etiology of SZ. CONCLUSION: A new strategy capable of enhancing the reproducibility of feature selection in current SZ research was successfully constructed and validated. The candidate genes identified in this study should be considered to have great potential for revealing the etiology of SZ.
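The combination of multiple randomized sampling with consensus scoring can be sketched as follows: repeatedly draw random subsets of case and control samples, rank genes on each subset, and score each gene by how often it lands in the top k. A minimal Python sketch, assuming the absolute difference in group means as the per-gene statistic and an 80% subsampling fraction; both are illustrative assumptions, and the study's actual scoring and its cross-dataset ranking-consistency step are not reproduced:

```python
import random
from statistics import mean

def consensus_scores(case, control, top_k=2, n_iter=100, frac=0.8, seed=0):
    """Fraction of random subsamples in which each gene ranks in the top_k.

    case, control: lists of samples; each sample is a list of expression values
    (same gene order in every sample). Assumed statistic: |mean(case) - mean(control)|.
    """
    rng = random.Random(seed)
    n_genes = len(case[0])
    counts = [0] * n_genes
    k_case = max(1, int(len(case) * frac))
    k_ctrl = max(1, int(len(control) * frac))
    for _ in range(n_iter):
        sub_case = rng.sample(case, k_case)
        sub_ctrl = rng.sample(control, k_ctrl)
        stat = [abs(mean(s[g] for s in sub_case) - mean(s[g] for s in sub_ctrl))
                for g in range(n_genes)]
        top = sorted(range(n_genes), key=lambda g: stat[g], reverse=True)[:top_k]
        for g in top:
            counts[g] += 1
    return [c / n_iter for c in counts]
```

Genes with consensus scores near 1 are selected regardless of which samples happen to be included, which is the kind of stability the strategy aims for.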


Subjects
Artificial Intelligence/standards, Databases, Genetic/standards, Gene Expression Profiling/methods, Gene Expression Profiling/standards, Schizophrenia/genetics, Humans, Random Allocation, Reproducibility of Results, Schizophrenia/diagnosis