Results 1 - 20 of 29
1.
Sci Rep ; 14(1): 16766, 2024 07 21.
Article in English | MEDLINE | ID: mdl-39034310

ABSTRACT

The tumor microenvironment (TME) plays a pivotal role in the onset, progression, and treatment response of cancer. Among the various components of the TME, cancer-associated fibroblasts (CAFs) are key regulators of both immune and non-immune cellular functions. Leveraging single-cell RNA sequencing (scRNA) data, we have uncovered previously hidden and promising roles within this specific CAF subgroup, paving the way for its clinical application. However, several critical questions persist, primarily stemming from the heterogeneous nature of CAFs and the use of different fibroblast markers in various sample analyses, causing confusion and hindrance in their clinical implementation. In this groundbreaking study, we have systematically screened multiple databases to identify the most robust marker for distinguishing CAFs in lung cancer, with a particular focus on their potential use in early diagnosis, staging, and treatment response evaluation. Our investigation revealed that COL1A1, COL1A2, FAP, and PDGFRA are effective markers for characterizing CAF subgroups in most lung adenocarcinoma datasets. Through comprehensive analysis of treatment responses, we determined that COL1A1 stands out as the most effective indicator among all CAF markers. COL1A1 not only deciphers the TME signatures related to CAFs but also demonstrates a highly sensitive and specific correlation with treatment responses and multiple survival outcomes. For the first time, we have unveiled the distinct roles played by clusters of CAF markers in differentiating various TME groups. Our findings confirm the sensitive and unique contributions of CAFs to the responses of multiple lung cancer therapies. These insights significantly enhance our understanding of TME functions and drive the translational application of extensive scRNA sequence results. COL1A1 emerges as the most sensitive and specific marker for defining CAF subgroups in scRNA analysis. 
The CAF ratios represented by COL1A1 can potentially serve as a reliable predictor of treatment responses in clinical practice, thus providing valuable insights into the influential roles of TME components. This research marks a crucial step forward in revolutionizing our approach to cancer diagnosis and treatment.


Subject(s)
Biomarkers, Tumor; Cancer-Associated Fibroblasts; Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Tumor Microenvironment; Humans; Lung Neoplasms/mortality; Lung Neoplasms/pathology; Lung Neoplasms/genetics; Lung Neoplasms/diagnosis; Lung Neoplasms/therapy; Carcinoma, Non-Small-Cell Lung/mortality; Carcinoma, Non-Small-Cell Lung/pathology; Carcinoma, Non-Small-Cell Lung/genetics; Carcinoma, Non-Small-Cell Lung/diagnosis; Cancer-Associated Fibroblasts/metabolism; Cancer-Associated Fibroblasts/pathology; Biomarkers, Tumor/metabolism; Prognosis; Gene Expression Regulation, Neoplastic
2.
Biology (Basel) ; 12(10), 2023 Sep 25.
Article in English | MEDLINE | ID: mdl-37886990

ABSTRACT

Microsatellites are polymorphic and cost-effective. Optimizing reduced microsatellite panels using heuristic algorithms eases budget constraints in genetic diversity and population genetic assessments. Microsatellite marker efficiency is strongly associated with its polymorphism and is quantified as the polymorphic information content (PIC). Nevertheless, marker selection cannot rely solely on PIC. In this study, the ant colony optimization (ACO) algorithm, a widely recognized optimization method, was adopted to create an enhanced selection scheme for refining microsatellite marker panels, called the PIC-ACO selection scheme. The algorithm was fine-tuned and validated using extensive datasets of chicken (Gallus gallus) and Chinese gorals (Naemorhedus griseus) from our previous studies. In contrast to basic optimization algorithms that stochastically initialize potential outputs, our selection algorithm utilizes the PIC values of markers to prime the ACO process. This increases the global solution discovery speed while reducing the likelihood of becoming trapped in local solutions. This process facilitated the acquisition of a cost-efficient and optimized microsatellite marker panel for studying genetic diversity and population genetic datasets. The established microsatellite efficiency metrics such as PIC, allele richness, and heterozygosity were correlated with the actual effectiveness of the microsatellite marker panel. This approach could substantially reduce budgetary barriers to population genetic assessments, breeding, and conservation programs.
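
The PIC value that the abstract uses to prime the search can be computed from allele frequencies alone. The sketch below uses the standard PIC formula, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, and then normalizes the per-marker values into starting weights. The locus names and frequencies are made up for illustration, and `initial_weights` is a hypothetical helper: the abstract does not say how PIC values are mapped into the ACO process, so this is only one plausible way to seed it.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content of one locus from its allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


def initial_weights(panel):
    """Hypothetical seeding step: normalize per-marker PIC into starting weights."""
    scores = {name: pic(freqs) for name, freqs in panel.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Illustrative allele frequencies, not data from the study.
panel = {
    "locus_A": [0.5, 0.5],
    "locus_B": [0.25, 0.25, 0.25, 0.25],
    "locus_C": [0.9, 0.1],
}
weights = initial_weights(panel)
```

With these made-up frequencies the four-allele locus receives the largest starting weight, which matches the intuition that more evenly distributed alleles are more informative.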

3.
J Sci Food Agric ; 103(9): 4704-4718, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36924039

ABSTRACT

BACKGROUND: This study investigated the geographical origin classification of green coffee beans from continental to country and regional levels. An innovative approach combined stable isotope and trace element analyses with non-linear machine learning data analysis to improve coffee origin classification and marker selection. Specialty green coffee beans sourced from three continents, eight countries, and 22 regions were analyzed by measuring five isotope ratios (δ¹³C, δ¹⁵N, δ¹⁸O, δ²H, and δ³⁴S) and 41 trace elements. Partial least squares discriminant analysis (PLS-DA) was applied to the integrated dataset for origin classification. RESULTS: Origins were predicted well at the country level and showed promise at the regional level, with discriminating marker selection at all levels. However, PLS-DA predicted origin poorly at the continental and Central American regional levels. Non-linear machine learning techniques improved predictions and enabled the identification of a higher number of origin markers, and those that were identified were more relevant. The best predictive accuracy was found using ensemble decision trees, random forest and extreme gradient boost, with accuracies of up to 0.94 and 0.89 for continental and Central American regional models, respectively. CONCLUSION: The potential for advanced machine learning models to improve origin classification and the identification of relevant origin markers was demonstrated. The decision-tree-based models were superior with their embedded variable identification features and visual interpretation. © 2023 The Authors. Journal of The Science of Food and Agriculture published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.


Subject(s)
Machine Learning; Isotopes/chemistry; Trace Elements/chemistry; Nonlinear Dynamics; Coffee/chemistry
4.
Genes (Basel) ; 13(11), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36421838

ABSTRACT

An assessment of the genetic diversity and structure of a population is essential for designing recovery plans for threatened species. Italy hosts two brown bear populations, Ursus arctos marsicanus (Uam), endemic to the Apennines of central Italy, and Ursus arctos arctos (Uaa), in the Italian Alps. Both populations are endangered and occasionally involved in human-wildlife conflict; thus, detailed management plans have been in place for several decades, including genetic monitoring. Here, we propose a simple cost-effective microsatellite-based protocol for the management of populations with low genetic variation. We sampled 22 Uam and 22 Uaa individuals and analyzed a total of 32 microsatellite loci in order to evaluate their applicability in individual identification. Based on genetic variability estimates, we compared data from four different STR marker sets, to evaluate the optimal settings in long-term monitoring projects. Allelic richness and gene diversity were the highest for the Uaa population, whereas depleted genetic variability was noted for the Uam population, which should be regarded as a conservation priority. Our results identified the most effective STR sets for the estimation of genetic diversity and individual discrimination in Uam (9 loci, PIC 0.45; PID 2.0 × 10⁻⁵), and Uaa (12 loci, PIC 0.64; PID 6.9 × 10⁻¹¹) populations, which can easily be utilized by smaller laboratories to support local governments in regular population monitoring. The method we proposed to select the most variable markers could be adopted for the genetic characterization of other small and isolated populations.


Subject(s)
Ursidae; Animals; Alleles; Italy; Microsatellite Repeats/genetics; Ursidae/genetics
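
The PID figures quoted in this abstract are probabilities that two individuals share the same multilocus genotype. A minimal sketch of the commonly used unrelated-individuals estimator follows: per locus, PID = sum(p_i^4) + sum over i<j of (2 p_i p_j)^2, multiplied across loci under an independence assumption. Whether the authors used this estimator or a sibling-corrected variant is not stated in the abstract, and the allele frequencies below are invented for illustration.

```python
from itertools import combinations
from math import prod


def pid_locus(freqs):
    """Probability that two unrelated individuals match at one locus."""
    same_homozygote = sum(p ** 4 for p in freqs)
    same_heterozygote = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return same_homozygote + same_heterozygote


def pid_panel(loci):
    """Multilocus PID, assuming the loci are independent."""
    return prod(pid_locus(freqs) for freqs in loci)


# Three illustrative biallelic loci with equal allele frequencies.
example_panel = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
example_pid = pid_panel(example_panel)
```

Because the per-locus values multiply, adding loci drives PID down quickly, which is why a panel of 9 to 12 well-chosen markers can reach the small values reported above.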
5.
BMC Bioinformatics ; 23(1): 316, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35927623

ABSTRACT

BACKGROUND: ImputAccur is a software tool to measure genotype-imputation accuracy. Imputation of untyped markers is a standard approach in genome-wide association studies to close the gap between directly genotyped and other known DNA variants. However, high accuracy for imputed genotypes is fundamental. Several accuracy measures have been proposed, but unfortunately, they are implemented on different platforms, which is impractical. RESULTS: With ImputAccur, the accuracy measures info, Iam-hiQ and r²-based indices can be derived from standard output files of imputation software. Sample/probe and marker filtering is possible. This allows e.g. accurate marker filtering ahead of data analysis. CONCLUSIONS: The source code (Python version 3.9.4), a standalone executable file, and example data for ImputAccur are freely available at https://gitlab.gwdg.de/kolja.thormann1/imputationquality.git .


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genotype; Software
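
To make the idea of an r²-based accuracy index concrete, the sketch below computes the squared Pearson correlation between known genotypes (coded 0, 1, 2) and imputed allele dosages for a single marker. This is not ImputAccur's code and does not reproduce its info or Iam-hiQ measures; the function name and the example values are assumptions chosen for illustration.

```python
from statistics import mean


def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    if len(true_genotypes) != len(imputed_dosages):
        raise ValueError("both inputs must cover the same samples")
    mean_true = mean(true_genotypes)
    mean_imputed = mean(imputed_dosages)
    sxy = sum((x - mean_true) * (y - mean_imputed)
              for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mean_true) ** 2 for x in true_genotypes)
    syy = sum((y - mean_imputed) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        # Undefined for a marker without variation in either vector.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Illustrative values for four samples at one marker.
truth = [0, 1, 2, 1]
dosages = [0.1, 0.9, 1.6, 1.2]
r2 = squared_correlation(truth, dosages)
```

A marker filter would then keep only markers whose value exceeds a chosen threshold; the threshold itself is a study-specific decision that the abstract does not specify.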
6.
J Dairy Sci ; 104(6): 6897-6908, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33685702

ABSTRACT

The addition of cattle health and immunity traits to genomic selection indices holds promise to increase individual animal longevity and productivity, and decrease economic losses from disease. However, highly variable genomic loci that contain multiple immune-related genes were poorly assembled in the first iterations of the cattle reference genome assembly and underrepresented during the development of most commercial genotyping platforms. As a consequence, there is a paucity of genetic markers within these loci that may track haplotypes related to disease susceptibility. By using hierarchical assembly of bacterial artificial chromosome inserts spanning 3 of these immune-related gene regions, we were able to assemble multiple full-length haplotypes of the major histocompatibility complex, the leukocyte receptor complex, and the natural killer cell complex. Using these new assemblies and the recently released ARS-UCD1.2 reference, we aligned whole-genome shotgun reads from 125 sequenced Holstein bulls to discover candidate variants for genetic marker development. We selected 124 SNPs, using heuristic and statistical models to develop a custom genotyping panel. In a proof-of-principle study, we used this custom panel to genotype 1,797 Holstein cows exposed to bovine tuberculosis (bTB) that were the subject of a previous GWAS study using the Illumina BovineHD array. Although we did not identify any significant association of bTB phenotypes with these new genetic markers, 2 markers exhibited substantial effects on bTB phenotypic prediction. The models and parameters trained in this study serve as a guide for future marker discovery surveys particularly in previously unassembled regions of the cattle genome.


Subject(s)
Antigen-Antibody Complex; Genome; Animals; Cattle/genetics; Female; Genome-Wide Association Study/veterinary; Genomics; Genotype; Male; Polymorphism, Single Nucleotide/genetics
7.
J Dairy Sci ; 104(4): 4478-4485, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33612229

ABSTRACT

Marker sets used in US dairy genomic predictions were previously expanded by including high-density (HD) or sequence markers with the largest effects for Holstein breed only. Other non-Holstein breeds lacked enough HD genotyped animals to be used as a reference population at that time, and thus were not included in the genomic prediction. Recently, numbers of non-Holstein breeds genotyped using HD panels reached an acceptable level for imputation and marker selection, allowing HD genomic prediction and HD marker selection for Holstein plus 4 other breeds. Genotypes for 351,461 Holsteins, 347,570 Jerseys, 42,346 Brown Swiss, 9,364 Ayrshires (including Red dairy cattle), and 4,599 Guernseys were imputed to the HD marker list that included 643,059 SNP. The separate HD reference populations included Illumina BovineHD (San Diego, CA) genotypes for 4,012 Holsteins, 407 Jerseys, 181 Brown Swiss, 527 Ayrshires, and 147 Guernseys. The 643,059 variants included the HD SNP and all 79,254 (80K) genetic markers and QTL used in routine national genomic evaluations. Before imputation, approximately 91 to 97% of genotypes were unknown for each breed; after imputation, 1.1% of Holstein, 3.2% of Jersey, 6.7% of Brown Swiss, 4.8% of Ayrshire, and 4.2% of Guernsey alleles remained unknown due to lower density haplotypes that had no matching HD haplotype. The higher remaining missing rates in non-Holstein breeds are mainly due to fewer HD genotyped animals in the imputation reference populations. Allele effects for up to 39 traits were estimated separately within each breed using phenotypic reference populations that included up to 6,157 Jersey males and 110,130 Jersey females. Correlations of HD with 80K genomic predictions for young animals averaged 0.986, 0.989, 0.985, 0.992, and 0.978 for Jersey, Ayrshire, Brown Swiss, Guernsey, and Holstein breeds, respectively. 
Correlations were highest for yield traits (about 0.991) and lowest for foot angle and rear legs-side view (0.981 and 0.982, respectively). Some HD effects were more than twice as large as the largest 80K SNP effect, and HD markers had larger effects than nearby 80K markers for many breed-trait combinations. Previous studies selected and included markers with large effects for Holstein traits; the newly selected HD markers should also improve non-Holstein and crossbred genomic predictions and were added to official US genomic predictions in April 2020.


Subject(s)
Genomics; Polymorphism, Single Nucleotide; Animals; Cattle/genetics; Female; Genotype; Guernsey; Male; Phenotype; Polymorphism, Single Nucleotide/genetics
8.
BMC Bioinformatics ; 21(1): 477, 2020 Oct 23.
Article in English | MEDLINE | ID: mdl-33097004

ABSTRACT

BACKGROUND: High throughput microfluidic protocols in single cell RNA sequencing (scRNA-seq) collect mRNA counts from up to one million individual cells in a single experiment; this enables high resolution studies of rare cell types and cell development pathways. Determining small sets of genetic markers that can identify specific cell populations is thus one of the major objectives of computational analysis of mRNA counts data. Many tools have been developed for marker selection on single cell data; most of them, however, are based on complex statistical models and handle the multi-class case in an ad-hoc manner. RESULTS: We introduce RANKCORR, a fast method with strong mathematical underpinnings that performs multi-class marker selection in an informed manner. RANKCORR proceeds by ranking the mRNA counts data before linearly separating the ranked data using a small number of genes. The step of ranking is intuitively natural for scRNA-seq data and provides a non-parametric method for analyzing count data. In addition, we present several performance measures for evaluating the quality of a set of markers when there is no known ground truth. Using these metrics, we compare the performance of RANKCORR to a variety of other marker selection methods on an assortment of experimental and synthetic data sets that range in size from several thousand to one million cells. CONCLUSIONS: According to the metrics introduced in this work, RANKCORR is consistently one of the most optimal marker selection methods on scRNA-seq data. Most methods show similar overall performance, however; thus, the speed of the algorithm is the most important consideration for large data sets (and comparing the markers selected by several methods can be fruitful). RANKCORR is fast enough to easily handle the largest data sets and, as such, it is a useful tool to add into computational pipelines when dealing with high throughput scRNA-seq data.
RANKCORR software is available for download at https://github.com/ahsv/RankCorr with extensive documentation.


Subject(s)
Databases, Genetic; High-Throughput Nucleotide Sequencing; Single-Cell Analysis; Algorithms; Animals; Base Sequence; Bone Marrow Cells/metabolism; Cluster Analysis; Computer Simulation; Gene Expression Profiling; Genetic Markers; Humans; Mice; ROC Curve; Software
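
The rank-then-separate idea described in this abstract can be illustrated with a much simpler stand-in. The sketch below is not the RANKCORR algorithm and does not use its software. It only shows the ranking step, with tied counts sharing an average rank, followed by a naive per-gene score: the difference in mean rank between one cluster and all other cells. The gene names, counts, and cluster labels are invented.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def marker_scores(counts, labels, target):
    """Score each gene by the mean-rank gap between `target` cells and the rest."""
    n_cells = len(labels)
    n_target = sum(1 for label in labels if label == target)
    if n_target == 0 or n_target == n_cells:
        raise ValueError("target cluster must be a proper, non-empty subset")
    scores = {}
    for gene, row in counts.items():
        ranks = average_ranks(row)
        inside = sum(r for r, label in zip(ranks, labels) if label == target)
        outside = sum(ranks) - inside
        gap = inside / n_target - outside / (n_cells - n_target)
        scores[gene] = gap / n_cells  # scale the gap by the number of cells
    return scores


# Invented counts: rows are genes, columns are six cells in two clusters.
counts = {
    "gene_a": [9, 7, 8, 0, 1, 0],
    "gene_b": [3, 3, 2, 3, 2, 3],
    "gene_c": [0, 1, 0, 5, 6, 4],
}
labels = ["T", "T", "T", "B", "B", "B"]
scores = marker_scores(counts, labels, target="T")
best_marker = max(scores, key=scores.get)
```

Working on ranks rather than raw counts makes the score insensitive to the heavy-tailed, zero-inflated scale of count data, which is the non-parametric property the abstract points to.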
9.
Bayesian Anal ; 15(1): 79-102, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32802246

ABSTRACT

Selecting informative nodes over large-scale networks becomes increasingly important in many research areas. Most existing methods focus on the local network structure and incur heavy computational costs for the large-scale problem. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior, which adopts the graph Laplacian matrix to characterize the conditional dependence between neighboring markers accounting for the global network structure. Under mild conditions, we show the proposed model enjoys the posterior consistency with a diverging number of edges and nodes in the network. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation, which is scalable to large-scale networks. We illustrate the superiorities of the proposed method compared with existing alternatives via extensive simulation studies and an analysis of the breast cancer gene expression dataset in the Cancer Genome Atlas (TCGA).
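
The graph Laplacian matrix at the center of the TGLG prior encodes which markers are neighbors in the network. As a minimal sketch, the code below builds the unnormalized Laplacian L = D - A for an undirected, unweighted graph. The abstract does not say whether the prior uses this form or a normalized or regularized variant, so the choice here is an assumption, and the three-node example network is invented.

```python
def graph_laplacian(n_nodes, edges):
    """Unnormalized Laplacian L = D - A of an undirected, unweighted graph.

    `edges` lists each undirected edge once as a pair of node indices.
    """
    laplacian = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        if i == j:
            continue  # self-loops do not contribute to the Laplacian
        laplacian[i][j] -= 1
        laplacian[j][i] -= 1
        laplacian[i][i] += 1  # degree of node i
        laplacian[j][j] += 1  # degree of node j
    return laplacian


# Invented three-marker network in which marker 1 is linked to markers 0 and 2.
laplacian = graph_laplacian(3, [(0, 1), (1, 2)])
```

Each row sums to zero, and the diagonal holds node degrees, so highly connected markers are tied more strongly to their neighbors in any prior built on this matrix.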

10.
Comput Struct Biotechnol J ; 18: 2012-2025, 2020.
Article in English | MEDLINE | ID: mdl-32802273

ABSTRACT

Cancer proteomics has become a powerful technique for characterizing the protein markers driving malignant transformation, tracing proteome variation triggered by therapeutics, and discovering novel targets and drugs for the treatment of oncologic diseases. To facilitate cancer diagnosis/prognosis and accelerate drug target discovery, a variety of methods for tumor marker identification and sample classification have been developed and successfully applied to cancer proteomic studies. This review article describes the most recent advances in those various approaches together with their current applications in cancer-related studies. Firstly, a number of popular feature selection methods are overviewed with objective evaluation of their advantages and disadvantages. Secondly, these methods are grouped into three major classes based on their underlying algorithms. Finally, a variety of sample separation algorithms are discussed. This review provides a comprehensive overview of the advances in tumor marker identification and patient/sample/tissue separation, which could serve as guidance for researchers in cancer proteomics.

11.
Anim Cells Syst (Seoul) ; 24(6): 321-328, 2020 Dec 24.
Article in English | MEDLINE | ID: mdl-33456716

ABSTRACT

Despite the various existing studies about nonsynonymous single nucleotide polymorphisms (nsSNPs), genome-wide studies based on nsSNPs are rare. NsSNPs alter amino acid sequences, affect protein structure and function, and have deleterious effects. By predicting the deleterious effect of nsSNPs, we determined the total risk score per individual. Additionally, the machine learning technique was utilized to find an optimal nsSNP subset that best explains the complete nsSNP effect. A total of 16,100 nsSNPs were selected as the best representatives among 89,519 regressed nsSNPs. In the gene ontology analysis encompassing the 16,100 nsSNPs, DNA metabolic process, chemokine- and immune-related, and reproduction were the most enriched terms. We expect that our risk score prediction and nsSNP marker selection will contribute to future development of extant genome-wide association studies and breeding science more broadly.

12.
Mol Biol Evol ; 37(3): 904-922, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31710677

ABSTRACT

Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.


Subject(s)
Computational Biology/methods; Reptiles/genetics; Whole Genome Sequencing/methods; Animals; Bayes Theorem; Evolution, Molecular; Exons; Genetic Loci; Phylogeny; Reptiles/classification; Sequence Alignment
13.
Mol Neurodegener ; 14(1): 41, 2019 11 14.
Article in English | MEDLINE | ID: mdl-31727120

ABSTRACT

The adoption of CRISPR-Cas9 technology for functional genetic screens has been a transformative advance. Due to its modular nature, this technology can be customized to address a myriad of questions. To date, pooled, genome-scale studies have uncovered genes responsible for survival, proliferation, drug resistance, viral susceptibility, and many other functions. The technology has even been applied to the functional interrogation of the non-coding genome. However, applications of this technology to neurological diseases remain scarce. This shortfall motivated the assembly of a review that will hopefully help researchers moving in this direction find their footing. The emphasis here will be on design considerations and concepts underlying this methodology. We will highlight groundbreaking studies in the CRISPR-Cas9 functional genetics field and discuss strengths and limitations of this technology for neurological disease applications. Finally, we will provide practical guidance on navigating the many choices that need to be made when implementing a CRISPR-Cas9 functional genetic screen for the study of neurological diseases.


Subject(s)
CRISPR-Cas Systems/genetics; Clustered Regularly Interspaced Short Palindromic Repeats/genetics; Gene Editing; Neurodegenerative Diseases/genetics; Animals; Disease Models, Animal; Genetic Testing/methods; Humans
14.
Proteomics Clin Appl ; 13(2): e1800091, 2019 03.
Article in English | MEDLINE | ID: mdl-30680934

ABSTRACT

There is a need for accurate, robust, non-invasive methods to provide early diagnosis of graft lesions after kidney transplantation. A multitude of proteomic biomarkers for the major kidney allograft disease phenotypes defined by the BANFF classification criteria have been described in literature. None of these biomarkers have been established in the clinic. A key reason for this is the lack of clinical validation which is difficult, as even the gold standard of diagnosis, kidney biopsy, is often ambiguous. The semantic clustering by ReviGO on top of transcriptomic pathway analysis is evaluated to connect histological and transcriptomic kidney allograft disease characteristics with proteomic biomarker qualification. By using public data generated in microarray studies of kidney allograft tissue, biological processes and key molecules specifically associated with the different kidney allograft disease phenotypes are identified. Semantic clustering holds the promise to guide adaptation of proteomic marker panels to molecular pathology. This can support the development of noninvasive tests (e.g. in urine, by capillary electrophoresis mass spectrometry) that simultaneously detect diverse kidney allograft phenotypes with high accuracy and sensitivity.


Subject(s)
Kidney Diseases/etiology; Kidney Diseases/metabolism; Kidney Transplantation/adverse effects; Phenotype; Proteomics; Biomarkers/metabolism; Humans; Kidney Diseases/pathology; Transplantation, Homologous/adverse effects
15.
Brief Bioinform ; 20(2): 585-597, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29672679

ABSTRACT

Disease diagnosis using cell-free DNA (cfDNA) has been an active research field recently. Most existing approaches perform diagnosis based on the detection of sequence variants on cfDNA; thus, their applications are limited to diseases associated with high mutation rate such as cancer. Recent developments start to exploit the epigenetic information on cfDNA, which could have substantially wider applications. In this work, we provide thorough reviews and discussions on the statistical method developments and data analysis strategies for using cfDNA epigenetic profiles, in particular DNA methylation, to construct disease diagnostic models. We focus on two important aspects: marker selection and prediction model construction, under different scenarios. We perform simulations and real data analysis to compare different approaches, and provide recommendations for data analysis.


Subject(s)
Cell-Free System; DNA Methylation; Epigenomics; Humans
16.
Biomed Eng Online ; 17(Suppl 2): 152, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30396341

ABSTRACT

BACKGROUND: Screening with CA-125 is the most common test for detecting ovarian cancer. However, CA-125 levels vary with conditions other than ovarian cancer, which has led to misdiagnosis of ovarian cancer. METHODS: In this paper, we explore 16 serum biomarkers to find an alternative biomarker combination that reduces misdiagnosis. For the experiments, we use serum samples comprising 101 cancer and 92 healthy samples. We perform two major tasks: marker selection and classification. For optimal marker selection, we use a genetic algorithm, random forest, t-test, and logistic regression. For classification, we compare linear discriminant analysis, K-nearest neighbor, and logistic regression. RESULTS: The final results show that logistic regression gives high performance for both tasks, and that HE4-ELISA, PDGF-AA, Prolactin, and TTR form the best biomarker combination for detecting ovarian cancer. CONCLUSIONS: We find that the combination containing TTR and Prolactin gives high performance for cancer detection. Early detection of ovarian cancer can reduce its high mortality rate. Finding a combination of multiple biomarkers for diagnostic tests with high sensitivity and specificity is very important.


Subject(s)
Biomarkers, Tumor/blood; Ovarian Neoplasms/blood; Ovarian Neoplasms/diagnosis; Case-Control Studies; Computational Biology; Female; Humans; Machine Learning; Mass Screening
17.
Anal Chim Acta ; 1029: 50-57, 2018 Oct 31.
Article in English | MEDLINE | ID: mdl-29907290

ABSTRACT

Data analysis represents a key challenge for untargeted metabolomics studies, and it commonly requires extensive processing of thousands of metabolite peaks included in raw high-resolution MS data. Although a number of software packages have been developed to facilitate untargeted data processing, they have not been comprehensively scrutinized for their capability in feature detection, quantification and marker selection using a well-defined benchmark sample set. In this study, we acquired a benchmark dataset from standard mixtures consisting of 1100 compounds with specified concentration ratios, including 130 compounds with significant variation of concentrations. The five software packages evaluated here (MS-Dial, MZmine 2, XCMS, MarkerView, and Compound Discoverer) showed similar performance in detection of true features derived from compounds in the mixtures. However, significant differences between untargeted metabolomics software were observed in relative quantification of true features in the benchmark dataset. MZmine 2 outperformed the other software in terms of quantification accuracy, and it reported the most true discriminating markers together with the fewest false markers. Furthermore, we assessed selection of discriminating markers by different software using both the benchmark dataset and a real-case metabolomics dataset to propose combined use of two software packages for increasing confidence of biomarker identification. Our findings from comprehensive evaluation of untargeted metabolomics software would help guide future improvements of these widely used bioinformatics tools and enable users to properly interpret their metabolomics results.


Subject(s)
Metabolomics/methods; Software; Benchmarking; Biomarkers/metabolism; Piper nigrum/metabolism
18.
Front Plant Sci ; 8: 1182, 2017.
Article in English | MEDLINE | ID: mdl-28729875

ABSTRACT

Molecular plant breeding with the aid of molecular markers has played an important role in modern plant breeding over the last two decades. Many marker-based predictions for quantitative traits have been made to enhance parental selection, but the trait prediction accuracy remains generally low, even with the aid of dense, genome-wide SNP markers. To search for more accurate trait-specific prediction with informative SNP markers, we conducted a literature review on the prediction issues in molecular plant breeding and on the applicability of an RNA-Seq technique for developing function-associated specific trait (FAST) SNP markers. To understand whether and how FAST SNP markers could enhance trait prediction, we also carried out theoretical reasoning on the effectiveness of these markers in a trait-specific prediction, and verified the reasoning through computer simulation. In the end, the search yielded an alternative to regular genomic selection with FAST SNP markers that could be explored to achieve more accurate trait-specific prediction. Continuous search for better alternatives is encouraged to enhance marker-based predictions for an individual quantitative trait in molecular plant breeding.

19.
Front Plant Sci ; 8: 986, 2017.
Article in English | MEDLINE | ID: mdl-28638401

ABSTRACT

Hybrid rice has contributed significantly to world food security. Breeding elite, high-yield, strongly resistant, broad-spectrum restorer lines is an important strategy for hybrid rice in commercial breeding programs. Here, we developed three elite brown planthopper (BPH)-resistant wide-spectrum restorer lines by pyramiding the big-panicle gene Gn8.1, the BPH-resistance genes Bph6 and Bph9, and the fertility restorer genes Rf3, Rf4, Rf5, and Rf6 through molecular marker-assisted selection. Resistance analysis revealed that the newly developed restorer lines showed stronger BPH resistance than either of the single-gene donor parents, Luoyang-6 and Luoyang-9. Moreover, the three new restorer lines had broad-spectrum recovery capabilities for Honglian CMS, Wild abortive CMS and two-line GMS sterile lines, and higher grain yields than the recurrent parent 9311 under natural field conditions. Importantly, the hybrid crosses also showed good performance for grain yield and BPH resistance. Thus, the development of elite BPH-resistant wide-spectrum restorer lines has a promising future for breeding broad-spectrum BPH-resistant high-yield varieties.

20.
Oncotarget ; 8(24): 38802-38810, 2017 Jun 13.
Article in English | MEDLINE | ID: mdl-28415579

ABSTRACT

Bladder cancer is one of the most common urinary tract carcinomas in the world. Urine metabolomics is a promising approach for bladder cancer detection and marker discovery since urine is in direct contact with bladder epithelial cells; metabolites released from bladder cancer cells may be enriched in urine samples. In this study, we applied ultra-performance liquid chromatography time-of-flight mass spectrometry to obtain metabolite profiles of 87 samples from bladder cancer patients and 65 samples from hernia patients. An OPLS-DA classification revealed that bladder cancer samples can be discriminated from hernia samples based on the profiles. A marker discovery pipeline selected six putative markers from the metabolomic profiles. An LLE clustering demonstrated the discriminative power of the chosen marker candidates. Two of the six markers were identified as imidazoleacetic acid, whose relation to bladder cancer has a certain degree of supporting evidence. A machine learning model, a decision tree, was built based on the metabolomic profiles and the six marker candidates. The decision tree obtained an accuracy of 76.60%, a sensitivity of 71.88%, and a specificity of 86.67% on an independent test.


Subject(s)
Biomarkers, Tumor/analysis; Metabolome; Metabolomics/methods; Urinary Bladder Neoplasms/diagnosis; Aged; Case-Control Studies; Chromatography, Liquid; Female; Follow-Up Studies; Humans; Male; Mass Spectrometry; Middle Aged; Prognosis; Urinary Bladder Neoplasms/metabolism
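
The accuracy, sensitivity, and specificity reported for the decision tree all derive from a 2x2 confusion matrix. The sketch below shows the arithmetic. The abstract does not give the underlying counts; the values used here (23 true positives, 9 false negatives, 13 true negatives, 2 false positives) are hypothetical and were chosen only because they reproduce the three reported percentages, so they should not be read as the study's actual test set.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of cancer samples detected
        "specificity": tn / (tn + fp),  # share of control samples cleared
    }


# Hypothetical counts consistent with the reported percentages (an assumption).
metrics = diagnostic_metrics(tp=23, fn=9, tn=13, fp=2)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

Sensitivity and specificity are computed on different denominators, which is why a model can clear most controls while still missing more than a quarter of cancer samples.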