Results 1 - 20 of 180
1.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37935617

ABSTRACT

Single-cell clustering is a critical step in biological downstream analysis. The clustering performance could be effectively improved by extracting cell-type-specific genes. The state-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED on various scRNA-seq datasets including large datasets followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source codes for FEED are freely available at https://github.com/genemine/FEED.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression
2.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35212712

ABSTRACT

Although sifting functional genes has been discussed for years, traditional selection methods tend to be ineffective in capturing potential specific genes. First, typical methods focus on finding features (genes) relevant to class while irrelevant to each other. However, the features that can offer rich discriminative information are more likely to be the complementary ones. Next, almost all existing methods assess feature relations in pairs, yielding an inaccurate local estimation and lacking a global exploration. In this paper, we introduce multi-variable Area Under the receiver operating characteristic Curve (AUC) to globally evaluate the complementarity among features by employing Area Above the receiver operating characteristic Curve (AAC). Due to AAC, the class-relevant information newly provided by a candidate feature and that preserved by the selected features can be achieved beyond pairwise computation. Furthermore, we propose an AAC-based feature selection algorithm, named Multi-variable AUC-based Combined Features Complementarity, to screen discriminative complementary feature combinations. Extensive experiments on public datasets demonstrate the effectiveness of the proposed approach. Besides, we provide a gene set about prostate cancer and discuss its potential biological significance from the machine learning aspect and based on the existing biomedical findings of some individual genes.


Subjects
Algorithms , Machine Learning , Area Under Curve , ROC Curve
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34874995

ABSTRACT

The growing availability of data in medical fields could help improve the performance of machine learning methods. However, using multi-institutional healthcare datasets is challenging due to privacy and security concerns. Therefore, privacy-preserving machine learning methods are required. We use a federated learning approach to train a shared global model on a central server that does not hold private data, while all clients keep their sensitive data in their own institutions. The scattered training data are connected to improve model performance while preserving data privacy. However, in the federated training procedure, data errors or noise can reduce learning performance. Therefore, we introduce self-paced learning, which can effectively select high-confidence samples and drop highly noisy samples to improve the performance of the trained model and reduce the risk of data privacy leakage. We propose federated self-paced learning (FedSPL), which combines the advantages of federated learning and self-paced learning. The proposed FedSPL model was evaluated on gene expression data distributed across different institutions where privacy concerns must be considered. The results demonstrate that the proposed FedSPL model is secure, i.e. it does not expose the original records to other parties, and the computational overhead during training is acceptable. Compared with learning methods based on the local data of all parties, the proposed model can significantly improve the predicted F1-score by approximately 4.3%. We believe that the proposed method has the potential to benefit clinicians in gene selection and disease prognosis.


Subjects
Machine Learning , Privacy , Humans , Research Design
4.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35037023

ABSTRACT

Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Since single-cell data are susceptible to technical noise, the quality of genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis. Therefore, robust gene selection has gained considerable attention in recent years. We introduce sc-REnF [robust entropy based feature (gene) selection method], aiming to leverage the advantages of Rényi and Tsallis entropies in gene selection for single-cell clustering. Experiments demonstrate that, with a tuned parameter (q), Rényi and Tsallis entropies select genes that improve the clustering results significantly over the other competing methods. sc-REnF can capture relevancy and redundancy among the features of noisy data extremely well due to its robust objective function. Moreover, the selected features/genes are able to identify unknown cells with high accuracy. Finally, sc-REnF yields good clustering performance in small-sample, large-feature scRNA-seq data. Availability: sc-REnF is available at https://github.com/Snehalikalall/sc-REnF.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Cluster Analysis , Entropy , Gene Expression Profiling/methods , RNA-Seq , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Exome Sequencing
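The two entropies named in the sc-REnF abstract above have standard textbook definitions, sketched here for a discrete probability vector. This is only an illustration of the formulas, not the sc-REnF objective function; the function names and the example vectors are assumptions made for this sketch.

```python
import math


def renyi_entropy(probs, q):
    """Renyi entropy of order q (q > 0, q != 1), in nats.

    H_q(p) = log(sum_i p_i**q) / (1 - q)
    """
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    return math.log(sum(p ** q for p in probs if p > 0)) / (1.0 - q)


def tsallis_entropy(probs, q):
    """Tsallis entropy of order q (q != 1).

    S_q(p) = (1 - sum_i p_i**q) / (q - 1)
    """
    if q == 1:
        raise ValueError("q must be different from 1")
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


# Hypothetical expression-state distributions for two genes.
uniform = [0.25, 0.25, 0.25, 0.25]   # maximally spread out
skewed = [0.7, 0.1, 0.1, 0.1]        # concentrated on one state

for name, dist in (("uniform", uniform), ("skewed", skewed)):
    print(name, renyi_entropy(dist, 2), tsallis_entropy(dist, 2))
```

Both measures are largest for the uniform vector and shrink as the distribution concentrates; the order q controls how strongly high-probability states dominate the sum.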
5.
Genet Med ; 26(5): 101077, 2024 05.
Article in English | MEDLINE | ID: mdl-38275146

ABSTRACT

PURPOSE: Gene selection for genomic newborn screening (gNBS) underpins the validity, acceptability, and ethical application of this technology. Existing gNBS gene lists are highly variable despite being based on shared principles of gene-disease validity, treatability, and age of onset. This study aimed to curate a gNBS gene list that builds upon existing efforts and provide a core consensus list of gene-disease pairs assessed by multiple expert groups worldwide. METHODS: Our multidisciplinary expert team curated a gene list using an open platform and multiple existing curated resources. We included severe treatable disorders with age of disease onset <5 years with established gene-disease associations and reliable variant detection. We compared the final list with published lists from 5 other gNBS projects to determine consensus genes and to identify areas of discrepancy. RESULTS: We reviewed 1279 genes and 604 met our inclusion criteria. Metabolic conditions comprised the largest group (25%), followed by immunodeficiencies (21%) and endocrine disorders (15%). We identified 55 consensus genes included by all 6 gNBS research projects. Common reasons for discrepancy included variable definitions of treatability and strength of gene-disease association. CONCLUSION: We have identified a consensus gene list for gNBS that can be used as a basis for systematic harmonization efforts internationally.


Subjects
Genetic Testing , Genomics , Neonatal Screening , Humans , Neonatal Screening/methods , Newborn Infant , Genetic Testing/methods , Genetic Testing/standards , Genomics/methods , Consensus
6.
BMC Biol ; 21(1): 82, 2023 04 13.
Article in English | MEDLINE | ID: mdl-37055766

ABSTRACT

BACKGROUND: Spiders comprise a hyperdiverse lineage of predators with venom systems, yet the origin of functionally novel spider venom glands remains unclear. Previous studies have hypothesized that spider venom glands originated from salivary glands or evolved from silk-producing glands present in early chelicerates. However, there is insufficient molecular evidence to indicate similarity among them. Here, we provide comparative analyses of genome and transcriptome data from various lineages of spiders and other arthropods to advance our understanding of spider venom gland evolution. RESULTS: We generated a chromosome-level genome assembly of a model spider species, the common house spider (Parasteatoda tepidariorum). Module preservation, GO semantic similarity, and differentially upregulated gene similarity analyses demonstrated a lower similarity in gene expressions between the venom glands and salivary glands compared to the silk glands, which questions the validity of the salivary gland origin hypothesis but unexpectedly prefers to support the ancestral silk gland origin hypothesis. The conserved core network in the venom and silk glands was mainly correlated with transcription regulation, protein modification, transport, and signal transduction pathways. At the genetic level, we found that many genes in the venom gland-specific transcription modules show positive selection and upregulated expressions, suggesting that genetic variation plays an important role in the evolution of venom glands. CONCLUSIONS: This research implies the unique origin and evolutionary path of spider venom glands and provides a basis for understanding the diverse molecular characteristics of venom systems.


Subjects
Arthropods , Spider Venoms , Animals , Transcriptome , Spider Venoms/genetics , Molecular Evolution , Genomics , Arthropods/genetics , Salivary Glands/metabolism , Silk/genetics , Silk/metabolism , Phylogeny
7.
BMC Bioinformatics ; 24(1): 140, 2023 Apr 11.
Article in English | MEDLINE | ID: mdl-37041456

ABSTRACT

BACKGROUND: Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicability of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations, such as overfitting on a specific dataset, ignoring feature selection in the preprocessing step, and losing performance on large datasets. To tackle these limitations, in this study we introduced a machine learning framework consisting of two main steps. First, our previously suggested optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with prior methods. RESULTS: The results demonstrated that the Trader algorithm could select a near-optimal subset of features at a significance level of p-value < 0.01 relative to the compared algorithms. Additionally, on the large-size datasets, the proposed machine learning framework improved on prior studies by ~10% in terms of the mean values associated with fivefold cross-validation of accuracy, precision, recall, specificity, and F-measure. CONCLUSION: Based on the obtained results, it can be concluded that a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers in designing practical diagnosis health care systems and offering effective treatment plans.


Subjects
Algorithms , Machine Learning
8.
BMC Bioinformatics ; 24(1): 139, 2023 Apr 08.
Article in English | MEDLINE | ID: mdl-37031189

ABSTRACT

BACKGROUND: Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is "large p and small n": the data contain a small number of subjects but a large number of genes. This may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification. RESULTS: This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA hybridizes the manifold learning algorithm Isomap with the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies-Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected. CONCLUSIONS: The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.


Subjects
Algorithms , Microarray Analysis , Neoplasms , Humans , Gene Expression Profiling/methods , Genetic Techniques , Microarray Analysis/methods , Neoplasms/classification , Neoplasms/genetics , Probability
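The Iso-GA abstract above uses the Davies-Bouldin index to score candidate solutions without training a classifier. A minimal standard-library sketch of that index follows, assuming Euclidean distance and clusters given as lists of coordinate tuples; how Iso-GA builds its clusters from the Isomap embedding is not described in the abstract and is not modeled here.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (lower is better).

    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own centroid.
    scatter = [
        sum(math.dist(p, cen) for p in c) / len(c)
        for c, cen in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical 2-D embeddings of samples from two classes.
well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
overlapping = [[(0, 0), (0, 2)], [(1, 0), (1, 2)]]

print(davies_bouldin(well_separated))
print(davies_bouldin(overlapping))
```

A gene subset whose embedding yields compact, well-separated class clusters gets a lower index, which is why the measure can serve as a classifier-independent fitness signal.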
9.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-34020547

ABSTRACT

Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among different cancers. In this study, we proposed a fusion feature selection framework based on an ensemble method, named Fisher score and Gradient Boosting Decision Tree (FS-GBDT), to select robust and decisive feature genes in high-dimensional gene expression datasets. Joint analysis of 11 human cancer types was conducted to explore the key feature gene subset of cancer. To verify the efficacy of FS-GBDT, we compared it with four other common feature selection algorithms using a Support Vector Machine (SVM) classifier. The algorithm achieved the highest indicators, outperforming the other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and this subset was classified into several functional modules. Functional modules can be used as markers of disease in place of single genes, which are difficult to find reproducibly in gene chip applications, and to study the core mechanisms of cancer.


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Neoplastic Gene Expression Regulation , Neoplasms/genetics , Support Vector Machine , Cluster Analysis , Decision Trees , Gene Expression Profiling/classification , Gene Ontology , Humans , Neoplasms/pathology , Reproducibility of Results
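The FS-GBDT abstract above combines a Fisher score with gradient-boosted trees. The sketch below covers only the Fisher score part, in its common form (between-class scatter divided by within-class scatter for a single gene); how FS-GBDT fuses this with GBDT importances is not stated in the abstract. The function name and the toy expression values are assumptions.

```python
from collections import defaultdict


def fisher_score(values, labels):
    """Fisher score of one feature: between-class over within-class scatter.

    values: expression of one gene across samples
    labels: class label of each sample
    """
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)

    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for g in groups.values():
        mean_g = sum(g) / len(g)
        between += len(g) * (mean_g - overall_mean) ** 2
        # Sum of squared deviations equals n_k times the class variance.
        within += sum((v - mean_g) ** 2 for v in g)

    return between / within if within > 0 else float("inf")


labels = [0, 0, 0, 1, 1, 1]
informative_gene = [1, 2, 3, 7, 8, 9]   # classes clearly separated
noisy_gene = [1, 8, 3, 7, 2, 9]         # classes interleaved

print(fisher_score(informative_gene, labels))
print(fisher_score(noisy_gene, labels))
```

Ranking genes by this score and keeping the top ones is a typical filter step before a model-based selector is applied.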
10.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33415328

ABSTRACT

Triple-negative breast cancer (TNBC) has been a challenging breast cancer subtype for oncological therapy. Normally, it can be classified into different molecular subtypes. Accurate and stable classification of the six subtypes is essential for personalized treatment of TNBC. In this study, we proposed a new framework to distinguish the six subtypes of TNBC, and this is one of a handful of studies that completed the classification based on mRNA and long noncoding RNA expression data. In particular, we developed a gene selection approach named DGGA, which takes correlation information between genes into account when measuring gene importance and then effectively removes redundant genes. A gene scoring approach that combines GeneRank scores with gene importance generated by a deep neural network (DNN), taking inter-subtype discrimination and inner-gene correlations into account, was devised to improve gene selection performance. More importantly, we embedded a gene connectivity matrix in the DNN for sparse learning, which additionally takes weight changes during training into account when measuring the relative importance of each gene. Finally, a Genetic Algorithm was used to simulate the natural evolutionary process to search for the optimal gene subset for TNBC subtype classification. We validated the proposed method through cross-validation, and the results demonstrate that it can use fewer genes to obtain more accurate classification results. The implementation of the proposed method is available at https://github.com/RanSuLab/TNBC.


Subjects
Neoplasm Proteins/genetics , Computer Neural Networks , Long Noncoding RNA/genetics , Messenger RNA/genetics , Triple Negative Breast Neoplasms/classification , Triple Negative Breast Neoplasms/genetics , Algorithms , Antineoplastic Agents/therapeutic use , Female , Neoplastic Gene Expression Regulation , Gene Regulatory Networks , Humans , Neoplasm Proteins/metabolism , Precision Medicine , Long Noncoding RNA/metabolism , Messenger RNA/metabolism , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/pathology
11.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32761115

ABSTRACT

Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview on the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.


Subjects
Algorithms , Computational Biology , Nucleic Acid Databases , Gene Expression Profiling , Knowledge Bases , Biomarkers/metabolism , Humans
12.
J Biomed Inform ; 147: 104510, 2023 11.
Article in English | MEDLINE | ID: mdl-37797704

ABSTRACT

Single-cell RNA sequencing experiments produce data useful to identify different cell types, including uncharacterized and rare ones. This enables us to study the specific functional roles of these cells in different microenvironments and contexts. After identifying a (novel) cell type of interest, it is essential to build succinct marker panels, composed of a few genes referring to cell surface proteins and clusters of differentiation molecules, able to discriminate the desired cells from the other cell populations. In this work, we propose a fully-automatic framework called MAGNETO, which can help construct optimal marker panels starting from a single-cell gene expression matrix and a cell type identity for each cell. MAGNETO builds effective marker panels solving a tailored bi-objective optimization problem, where the first objective regards the identification of the genes able to isolate a specific cell type, while the second conflicting objective concerns the minimization of the total number of genes included in the panel. Our results on three public datasets show that MAGNETO can identify marker panels that identify the cell populations of interest better than state-of-the-art approaches. Finally, by fine-tuning MAGNETO, our results demonstrate that it is possible to obtain marker panels with different specificity levels.


Subjects
Single-Cell Analysis , Transcriptome , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cell Differentiation
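The MAGNETO abstract above describes a bi-objective problem: isolate the target cell type as well as possible while keeping the marker panel small. The sketch below shows only what "conflicting objectives" means in practice, by filtering a list of candidate panels down to its Pareto front. The optimizer MAGNETO actually uses is not stated in the abstract; the `(score, n_genes)` tuples and their values are hypothetical.

```python
def pareto_front(candidates):
    """Return the non-dominated (score, n_genes) pairs.

    A candidate is dominated if another one has a score at least as high
    and a panel at least as small, and is strictly better on one of the two.
    """
    front = []
    for i, (score_i, size_i) in enumerate(candidates):
        dominated = any(
            score_j >= score_i
            and size_j <= size_i
            and (score_j > score_i or size_j < size_i)
            for j, (score_j, size_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((score_i, size_i))
    return front


# Hypothetical panels: (separation score, number of genes in the panel).
panels = [(0.90, 5), (0.85, 3), (0.80, 4), (0.95, 8), (0.90, 6)]
print(pareto_front(panels))  # -> [(0.9, 5), (0.85, 3), (0.95, 8)]
```

Each surviving panel is a different trade-off between specificity and size, which matches the abstract's remark that panels with different specificity levels can be obtained.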
13.
BMC Bioinformatics ; 23(1): 175, 2022 May 12.
Article in English | MEDLINE | ID: mdl-35549644

ABSTRACT

BACKGROUND: Lung cancer is one of the cancers with the highest mortality rate in China. With the rapid development of high-throughput sequencing technology and the growing application of deep learning methods, deep neural networks based on gene expression have become an active research direction in lung cancer diagnosis, providing an effective route to early diagnosis. Thus, building a deep neural network model is of great significance for the early diagnosis of lung cancer. However, the main challenges in mining gene expression datasets are the curse of dimensionality and imbalanced data. Existing methods cannot address the problems of high dimensionality and imbalanced data, because the number of measured variables (genes) overwhelms the small number of samples, which results in poor performance in the early diagnosis of lung cancer. METHOD: Given that gene expression datasets are small, high-dimensional and imbalanced, this paper proposes a gene selection method based on KL divergence, which selects genes with higher KL divergence as model features. A deep neural network model is then built using Focal Loss as the loss function, and k-fold cross-validation with k = 5 is used to verify and select the best model. RESULT: The proposed deep learning model based on KL divergence gene selection has an AUC of 0.99 on the validation set. The generalization performance of the model is high. CONCLUSION: The proposed deep neural network model based on KL divergence gene selection is shown to be an accurate and effective method for lung cancer prediction.


Subjects
Deep Learning , Lung Neoplasms , China , Gene Expression , Humans , Lung Neoplasms/genetics , Computer Neural Networks
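The lung cancer abstract above relies on two formulas, KL divergence for ranking genes and focal loss for training on imbalanced classes. Both are sketched below in their standard forms. The abstract does not say how per-gene distributions are estimated, so the binned histograms here are an assumption, as are the function names and the default `gamma` and `alpha` values.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins.

    eps guards against division by zero when q has an empty bin.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def focal_loss(prob_pos, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    prob_pos: predicted probability of the positive class
    label: 1 for positive, 0 for negative
    """
    p_t = prob_pos if label == 1 else 1.0 - prob_pos
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical binned expression of one gene in tumor vs. normal samples.
tumor_hist = [0.9, 0.1]
normal_hist = [0.5, 0.5]
print(kl_divergence(tumor_hist, normal_hist))

# A confident correct prediction is down-weighted relative to a hard one.
print(focal_loss(0.9, 1), focal_loss(0.6, 1))
```

Genes whose class-conditional distributions diverge most would be kept as features, and the `(1 - p_t) ** gamma` factor shrinks the loss contribution of easy samples so that the minority class is not swamped.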
14.
BMC Bioinformatics ; 22(Suppl 12): 436, 2022 Jan 20.
Article in English | MEDLINE | ID: mdl-35057728

ABSTRACT

BACKGROUND: Clustering and feature selection play major roles in many communities. As a matrix factorization, Low-Rank Representation (LRR) has attracted much attention in clustering and feature selection, but its performance sometimes degrades when the data samples are insufficient or contain a lot of noise. RESULTS: To address this drawback, a novel LRR model named TGLRR is proposed by integrating the truncated nuclear norm with graph-Laplacian. Unlike the nuclear norm, which minimizes all singular values, the truncated nuclear norm minimizes only the smallest singular values, which avoids the harmful shrinkage of the leading singular values. Finally, an efficient algorithm based on Linearized Alternating Direction with Adaptive Penalty is applied to solve the optimization problem. CONCLUSIONS: The results show that the TGLRR method exceeds the existing state-of-the-art methods in terms of tumor clustering and gene selection on integrated gene expression data.


Subjects
Neoplasms , Algorithms , Cluster Analysis , Humans , Neoplasms/genetics
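For reference, the truncated nuclear norm contrasted with the nuclear norm in the TGLRR abstract above is usually written as follows. The symbols X, r and σ_i are notation assumed here, not taken from the abstract, and the full TGLRR objective (with its graph-Laplacian term) is not reproduced.

```latex
\|X\|_{*} = \sum_{i=1}^{\min(m,n)} \sigma_i(X),
\qquad
\|X\|_{r} = \sum_{i=r+1}^{\min(m,n)} \sigma_i(X)
          = \|X\|_{*} - \sum_{i=1}^{r} \sigma_i(X),
\qquad X \in \mathbb{R}^{m \times n},
```

where σ_1(X) ≥ σ_2(X) ≥ ... are the singular values of X in decreasing order. The r leading singular values are left unpenalized, so only the tail is shrunk.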
15.
BMC Genomics ; 23(1): 782, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36451086

ABSTRACT

BACKGROUND: The identification of gene regulatory networks (GRNs) facilitates the understanding of the underlying molecular mechanism of various biological processes and complex diseases. With the availability of single-cell RNA sequencing data, it is essential to infer GRNs from single-cell expression. Although some GRN methods originally developed for bulk expression data can be applicable to single-cell data and several single-cell specific GRN algorithms were developed, recent benchmarking studies have emphasized the need of developing more accurate and robust GRN modeling methods that are compatible for single-cell expression data. RESULTS: We present SRGS, SPLS (sparse partial least squares)-based recursive gene selection, to infer GRNs from bulk or single-cell expression data. SRGS recursively selects and scores the genes which may have regulations on the considered target gene based on SPLS. When dealing with gene expression data with dropouts, we randomly scramble samples, set some values in the expression matrix to zeroes, and generate multiple copies of data through multiple iterations to make SRGS more robust. We test SRGS on different kinds of expression data, including simulated bulk data, simulated single-cell data without and with dropouts, and experimental single-cell data, and also compared with the existing GRN methods, including the ones originally developed for bulk data, the ones developed specifically for single-cell data, and even the ones recommended by recent benchmarking studies. CONCLUSIONS: It has been shown that SRGS is competitive with the existing GRN methods and effective in the gene regulatory network inference from bulk or single-cell gene expression data. SRGS is available at: https://github.com/JGuan-lab/SRGS .


Subjects
Algorithms , Gene Regulatory Networks , Least-Squares Analysis , Benchmarking , Exome Sequencing
16.
Hum Genomics ; 15(1): 66, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34753514

ABSTRACT

BACKGROUND: Nowadays we are observing an explosion of gene expression data with phenotypes. It enables us to accurately identify genes responsible for certain medical condition as well as classify them for drug target. Like any other phenotype data in medical domain, gene expression data with phenotypes also suffer from being a very underdetermined system. In a very large set of features but a very small sample size domain (e.g. DNA microarray, RNA-seq data, GWAS data, etc.), it is often reported that several contrasting feature subsets may yield near equally optimal results. This phenomenon is known as instability. Considering these facts, we have developed a robust and stable supervised gene selection algorithm to select a set of robust and stable genes having a better prediction ability from the gene expression datasets with phenotypes. Stability and robustness is ensured by class and instance level perturbations, respectively. RESULTS: We have performed rigorous experimental evaluations using 10 real gene expression microarray datasets with phenotypes. They reveal that our algorithm outperforms the state-of-the-art algorithms with respect to stability and classification accuracy. We have also performed biological enrichment analysis based on gene ontology-biological processes (GO-BP) terms, disease ontology (DO) terms, and biological pathways. CONCLUSIONS: It is indisputable from the results of the performance evaluations that our proposed method is indeed an effective and efficient supervised gene selection algorithm.


Subjects
Algorithms , Machine Learning , Oligonucleotide Array Sequence Analysis/methods , Phenotype
17.
Stat Med ; 41(4): 665-680, 2022 02 20.
Article in English | MEDLINE | ID: mdl-34773277

ABSTRACT

The medium-throughput mRNA abundance platform NanoString nCounter has gained great popularity in the past decade, due to its high sensitivity and technical reproducibility as well as remarkable applicability to ubiquitous formalin fixed paraffin embedded (FFPE) tissue samples. Based on RCRnorm developed for normalizing NanoString nCounter data and Bayesian LASSO for variable selection, we propose a fully integrated Bayesian method, called RCRdiff, to detect differentially expressed (DE) genes between different groups of tissue samples (eg, normal and cancer). Unlike existing methods that often require normalization performed beforehand, RCRdiff directly handles raw read counts and jointly models the behaviors of different types of internal controls along with DE and non-DE gene patterns. Doing so would avoid efficiency loss caused by ignoring estimation uncertainty from the normalization step in a sequential approach and thus can offer more reliable statistical inference. We also propose clustering-based strategies for DE gene selection, which do not require any external dataset and are free of any arbitrary cutoff. Empirical evidence of the attractiveness of RCRdiff is demonstrated via extensive simulation and data examples.


Subjects
Gene Expression Profiling , Bayes Theorem , Gene Expression Profiling/methods , Humans , Reproducibility of Results
18.
Dev Psychopathol ; 34(1): 295-306, 2022 02.
Article in English | MEDLINE | ID: mdl-32880244

ABSTRACT

Some Gene × Environment interaction (G×E) research has focused upon single candidate genes, whereas other related work has targeted multiple genes (e.g., polygenic scores). Each approach has informed efforts to identify individuals who are either especially vulnerable to the negative effects of contextual adversity (diathesis stress) or especially susceptible to both positive and negative contextual conditions (differential susceptibility). A critical step in all such molecular G×E research is the selection of genetic variants thought to moderate environmental influences, a subject that has not received a great deal of attention in critiques of G×E research (beyond the observation of small effects of individual genes). Here we conceptually distinguish three phases of G×E work based on the selection of genes presumed to moderate environmental effects and the theoretical basis of such decisions: (a) single candidate genes, (b) composited (multiple) candidate genes, and (c) GWAS-derived polygenic scores. This illustrative, not exhaustive, review makes it clear that implicit or explicit theoretical assumptions inform gene selection in ways that have not been clearly articulated or fully appreciated.


Subjects
Gene-Environment Interaction , Multifactorial Inheritance , Disease Susceptibility , Humans
19.
Int J Mol Sci ; 23(2)2022 Jan 10.
Article in English | MEDLINE | ID: mdl-35054922

ABSTRACT

The development and tissue-dependent color formation of the horticultural plant results in various color pattern flowers. Anthocyanins and carotenoids contribute to the red and yellow colors, respectively. In this study, quantitative real-time polymerase chain reaction (qRT-PCR) is used to analyze the expression profiles of anthocyanin and carotenoids biosynthesis genes in Cymbidium lowianum (Rchb.f.) Rchb.f. Appropriate reference gene selection and validation are required before normalization of gene expression in qRT-PCR analysis. Thus, we firstly selected 12 candidate reference genes from transcriptome data, and used geNorm and Normfinder to evaluate their expression stability in lip (divided into abaxial and adaxial), petal, and sepal of the bud and flower of C. lowianum. Our results show that the two most stable reference genes in different tissues of C. lowianum bud and flower are EF1δ and 60S, the most unstable reference gene is 26S. The expression profiles of the CHS and BCH genes were similar to FPKM value profiles after normalization to the two most stable reference genes, EF1δ and 60S, with the upregulated CHS and BCH expression in flower stage, indicating that the ABP and CBP were activated across the stages of flower development. However, when the most unstable reference gene, 26S, was used to normalize the qRT-PCR data, the expression profiles of CHS and BCH differed from FPKM value profiles, indicating the necessity of selecting stable reference genes. Moreover, CHS and BCH expression was highest in the abaxial lip and adaxial lip, respectively, indicating that the ABP and CBP were activated in abaxial and adaxial lip, respectively, resulting in a presence of red or yellow segments in abaxial and adaxial lip. This study is the first to provide reference genes in C. lowianum, and also provide useful information for studies that aim to understand the molecular mechanisms of flower color formation in C. lowianum.


Subjects
Flowers/genetics , Gene Expression Regulation, Plant , Genetic Association Studies , Orchidaceae/genetics , Pigmentation/genetics , Quantitative Trait, Heritable , Gene Expression Profiling , Genes, Plant , Genomics/methods , Real-Time Polymerase Chain Reaction , Transcriptome
20.
Anal Biochem ; 627: 114242, 2021 Aug 15.
Article in English | MEDLINE | ID: mdl-33974890

ABSTRACT

This paper introduces a new hybrid approach (DBH) for solving the gene selection problem that combines the strengths of two existing metaheuristics: the binary dragonfly algorithm (BDF) and the binary black hole algorithm (BBHA). The hybridization aims to identify a small, stable set of discriminative genes without sacrificing classification accuracy, whereas most current methods struggle to extract disease-related information from a vast number of redundant genes. The proposed approach first applies the minimum redundancy maximum relevancy (MRMR) filter method to reduce the dimensionality of the feature space and then uses the hybrid DBH algorithm to determine a smaller set of significant genes. The approach was evaluated on eight benchmark gene expression datasets and compared against recent state-of-the-art techniques to demonstrate its efficiency. The comparative study shows that the proposed approach achieves a significant improvement over existing methods in terms of classification accuracy and the number of selected genes. Moreover, the performance of the method was examined on real RNA-Seq coronavirus-related gene expression data from asthmatic patients, selecting the most significant genes in order to improve the discriminative accuracy of angiotensin-converting enzyme 2 (ACE2). ACE2, as a coronavirus receptor, is a biomarker that helps to distinguish infected from uninfected patients in order to identify subgroups at risk for COVID-19. The results indicate that the MRMR-DBH approach is a very promising framework for finding a new combination of the most discriminative genes with high classification accuracy.
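The MRMR pre-filtering step mentioned in this abstract can be sketched as a greedy selection that, at each round, picks the gene with the highest relevance to the class labels minus its average redundancy with the genes already chosen. This is only a minimal sketch of the general MRMR idea (difference form, mutual information on discretized values); it does not reproduce the DBH metaheuristic or the authors' implementation, and the gene names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr(features, labels, k):
    """Greedily select up to k feature names by relevance minus mean redundancy.

    features maps a gene name to its discretized expression values,
    one value per sample, aligned with labels.
    """
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected = []
    candidates = set(features)

    def score(gene):
        if not selected:
            return relevance[gene]
        redundancy = sum(
            mutual_information(features[gene], features[s]) for s in selected
        ) / len(selected)
        return relevance[gene] - redundancy

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical discretized expression (0 = low, 1 = high) for eight samples.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 1],
    "gene_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # fully redundant with gene_a
    "gene_b": [0, 0, 1, 1, 0, 0, 1, 1],       # complementary to gene_a
}
chosen = mrmr(features, labels, k=2)
```

Here the redundant copy is skipped in favor of the complementary gene, which is the behavior a redundancy-aware filter is meant to provide before a wrapper search such as DBH narrows the set further.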


Subjects
Algorithms , COVID-19/diagnosis , COVID-19/genetics , Sequence Analysis, RNA/methods , Support Vector Machine , Angiotensin-Converting Enzyme 2 , Gene Expression Profiling , Humans , Microarray Analysis , Neoplasms/diagnosis , Neoplasms/genetics