Results 1 - 20 of 29
1.
J Sci Food Agric ; 103(9): 4704-4718, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36924039

ABSTRACT

BACKGROUND: This study investigated the geographical origin classification of green coffee beans at the continental, country, and regional levels. An innovative approach combined stable isotope and trace element analyses with non-linear machine learning data analysis to improve coffee origin classification and marker selection. Specialty green coffee beans sourced from three continents, eight countries, and 22 regions were analyzed by measuring five isotope ratios (δ¹³C, δ¹⁵N, δ¹⁸O, δ²H, and δ³⁴S) and 41 trace elements. Partial least squares discriminant analysis (PLS-DA) was applied to the integrated dataset for origin classification. RESULTS: PLS-DA predicted origins well at the country level and showed promise at the regional level, with discriminating markers selected at all levels. However, PLS-DA predicted origin poorly at the continental and Central American regional levels. Non-linear machine learning techniques improved predictions and identified a larger number of origin markers, which were also more relevant. The best predictive accuracy was obtained with the ensemble decision-tree methods random forest and extreme gradient boosting, with accuracies of up to 0.94 and 0.89 for the continental and Central American regional models, respectively. CONCLUSION: The potential of advanced machine learning models to improve origin classification and the identification of relevant origin markers was demonstrated. The decision-tree-based models were superior owing to their embedded variable identification features and visual interpretation. © 2023 The Authors. Journal of The Science of Food and Agriculture published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.


Subjects
Machine Learning , Isotopes/chemistry , Trace Elements/chemistry , Nonlinear Dynamics , Coffee/chemistry
2.
BMC Bioinformatics ; 23(1): 316, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35927623

ABSTRACT

BACKGROUND: ImputAccur is a software tool for measuring genotype-imputation accuracy. Imputation of untyped markers is a standard approach in genome-wide association studies to close the gap between directly genotyped and other known DNA variants. However, high accuracy of imputed genotypes is essential. Several accuracy measures have been proposed, but they are implemented on different platforms, which is impractical. RESULTS: With ImputAccur, the accuracy measures info, Iam-hiQ, and r2-based indices can be derived from the standard output files of imputation software. Sample/probe and marker filtering is possible, which allows, for example, accurate marker filtering ahead of data analysis. CONCLUSIONS: The source code (Python version 3.9.4), a standalone executable file, and example data for ImputAccur are freely available at https://gitlab.gwdg.de/kolja.thormann1/imputationquality.git .
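As a rough illustration of an r2-based accuracy index, the sketch below computes the empirical squared Pearson correlation between true genotypes (0/1/2 allele counts) and imputed allele dosages for one marker, then filters markers by a threshold. It assumes true genotypes are available (for example, from directly typed markers that were masked before imputation); imputation software typically reports an r2 estimated from the dosages alone, so this is not ImputAccur's code. The function names and the 0.8 threshold are hypothetical.

```python
from typing import Dict, List, Sequence


def dosage_r2(true_genotypes: Sequence[float], dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes (0, 1, 2) and
    imputed allele dosages for one marker; NaN if either side is constant."""
    n = len(true_genotypes)
    if n != len(dosages) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_g = sum(true_genotypes) / n
    mean_d = sum(dosages) / n
    cov = sum((g - mean_g) * (d - mean_d) for g, d in zip(true_genotypes, dosages))
    var_g = sum((g - mean_g) ** 2 for g in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_g == 0 or var_d == 0:
        return float("nan")  # r2 is undefined for a monomorphic marker
    return cov * cov / (var_g * var_d)


def filter_markers(r2_by_marker: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Keep markers whose r2 reaches a (hypothetical) threshold; NaN never passes."""
    return [marker for marker, r2 in r2_by_marker.items() if r2 >= threshold]


if __name__ == "__main__":
    truth = [0, 1, 2, 1, 0]
    dosage = [0.1, 0.9, 1.8, 1.2, 0.2]
    print(round(dosage_r2(truth, dosage), 3))
```

Because r2 is a squared correlation, it ignores a constant shift or rescaling of the dosages, which is one reason it is usually read alongside other indices such as info.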


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Genotype , Software
3.
Mol Biol Evol ; 37(3): 904-922, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31710677

ABSTRACT

Marker selection has emerged as an important component of phylogenomic study design due to rising concerns about the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate, among other factors. The most commonly used reduced-representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long (>1,500 bp), while avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes (10 snakes and 7 lizards). The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC with comparable methods, discuss important aspects of each, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.


Subjects
Computational Biology/methods , Reptiles/genetics , Whole Genome Sequencing/methods , Animals , Bayes Theorem , Molecular Evolution , Exons , Genetic Loci , Phylogeny , Reptiles/classification , Sequence Alignment
4.
Brief Bioinform ; 20(2): 585-597, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29672679

ABSTRACT

Disease diagnosis using cell-free DNA (cfDNA) has recently been an active research field. Most existing approaches perform diagnosis based on the detection of sequence variants in cfDNA; thus, their applications are limited to diseases associated with high mutation rates, such as cancer. Recent developments have begun to exploit the epigenetic information in cfDNA, which could have substantially wider applications. In this work, we provide a thorough review and discussion of the statistical method developments and data analysis strategies for using cfDNA epigenetic profiles, in particular DNA methylation, to construct disease diagnostic models. We focus on two important aspects, marker selection and prediction model construction, under different scenarios. We perform simulations and real data analysis to compare different approaches, and provide recommendations for data analysis.


Subjects
Cell-Free System , DNA Methylation , Epigenomics , Humans
5.
J Dairy Sci ; 104(4): 4478-4485, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33612229

ABSTRACT

Marker sets used in US dairy genomic predictions were previously expanded by including high-density (HD) or sequence markers with the largest effects, for the Holstein breed only. Non-Holstein breeds lacked enough HD-genotyped animals to serve as a reference population at that time and thus were not included in the genomic prediction. Recently, the numbers of non-Holstein animals genotyped using HD panels reached an acceptable level for imputation and marker selection, allowing HD genomic prediction and HD marker selection for Holstein plus 4 other breeds. Genotypes for 351,461 Holsteins, 347,570 Jerseys, 42,346 Brown Swiss, 9,364 Ayrshires (including Red dairy cattle), and 4,599 Guernseys were imputed to the HD marker list that included 643,059 SNP. The separate HD reference populations included Illumina BovineHD (San Diego, CA) genotypes for 4,012 Holsteins, 407 Jerseys, 181 Brown Swiss, 527 Ayrshires, and 147 Guernseys. The 643,059 variants included the HD SNP and all 79,254 (80K) genetic markers and QTL used in routine national genomic evaluations. Before imputation, approximately 91 to 97% of genotypes were unknown for each breed; after imputation, 1.1% of Holstein, 3.2% of Jersey, 6.7% of Brown Swiss, 4.8% of Ayrshire, and 4.2% of Guernsey alleles remained unknown because lower-density haplotypes had no matching HD haplotype. The higher remaining missing rates in non-Holstein breeds are mainly due to fewer HD-genotyped animals in the imputation reference populations. Allele effects for up to 39 traits were estimated separately within each breed using phenotypic reference populations that included up to 6,157 Jersey males and 110,130 Jersey females. Correlations of HD with 80K genomic predictions for young animals averaged 0.986, 0.989, 0.985, 0.992, and 0.978 for Jersey, Ayrshire, Brown Swiss, Guernsey, and Holstein breeds, respectively. Correlations were highest for yield traits (about 0.991) and lowest for foot angle and rear legs-side view (0.981 and 0.982, respectively). Some HD effects were more than twice as large as the largest 80K SNP effect, and HD markers had larger effects than nearby 80K markers for many breed-trait combinations. Previous studies selected and included markers with large effects for Holstein traits; the newly selected HD markers should also improve non-Holstein and crossbred genomic predictions and were added to official US genomic predictions in April 2020.


Subjects
Genomics , Single Nucleotide Polymorphism , Animals , Cattle/genetics , Female , Genotype , Guernsey , Male , Phenotype , Single Nucleotide Polymorphism/genetics
6.
J Dairy Sci ; 104(6): 6897-6908, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33685702

ABSTRACT

The addition of cattle health and immunity traits to genomic selection indices holds promise to increase individual animal longevity and productivity and to decrease economic losses from disease. However, highly variable genomic loci that contain multiple immune-related genes were poorly assembled in the first iterations of the cattle reference genome assembly and underrepresented during the development of most commercial genotyping platforms. As a consequence, there is a paucity of genetic markers within these loci that may track haplotypes related to disease susceptibility. By using hierarchical assembly of bacterial artificial chromosome inserts spanning 3 of these immune-related gene regions, we were able to assemble multiple full-length haplotypes of the major histocompatibility complex, the leukocyte receptor complex, and the natural killer cell complex. Using these new assemblies and the recently released ARS-UCD1.2 reference, we aligned whole-genome shotgun reads from 125 sequenced Holstein bulls to discover candidate variants for genetic marker development. We selected 124 SNPs, using heuristic and statistical models, to develop a custom genotyping panel. In a proof-of-principle study, we used this custom panel to genotype 1,797 Holstein cows exposed to bovine tuberculosis (bTB) that were the subject of a previous GWAS using the Illumina BovineHD array. Although we did not identify any significant association of bTB phenotypes with these new genetic markers, 2 markers exhibited substantial effects on bTB phenotypic prediction. The models and parameters trained in this study serve as a guide for future marker discovery surveys, particularly in previously unassembled regions of the cattle genome.


Subjects
Antigen-Antibody Complex , Genome , Animals , Cattle/genetics , Female , Genome-Wide Association Study/veterinary , Genomics , Genotype , Male , Single Nucleotide Polymorphism/genetics
7.
BMC Bioinformatics ; 21(1): 477, 2020 Oct 23.
Article in English | MEDLINE | ID: mdl-33097004

ABSTRACT

BACKGROUND: High-throughput microfluidic protocols in single-cell RNA sequencing (scRNA-seq) collect mRNA counts from up to one million individual cells in a single experiment; this enables high-resolution studies of rare cell types and cell development pathways. Determining small sets of genetic markers that can identify specific cell populations is thus one of the major objectives of computational analysis of mRNA count data. Many tools have been developed for marker selection on single-cell data; most of them, however, are based on complex statistical models and handle the multi-class case in an ad hoc manner. RESULTS: We introduce RANKCORR, a fast method with strong mathematical underpinnings that performs multi-class marker selection in an informed manner. RANKCORR proceeds by ranking the mRNA count data before linearly separating the ranked data using a small number of genes. The ranking step is intuitively natural for scRNA-seq data and provides a non-parametric method for analyzing count data. In addition, we present several performance measures for evaluating the quality of a set of markers when there is no known ground truth. Using these metrics, we compare the performance of RANKCORR to a variety of other marker selection methods on an assortment of experimental and synthetic data sets that range in size from several thousand to one million cells. CONCLUSIONS: According to the metrics introduced in this work, RANKCORR is consistently one of the best-performing marker selection methods on scRNA-seq data. However, most methods show similar overall performance; thus, the speed of the algorithm is the most important consideration for large data sets (and comparing the markers selected by several methods can be fruitful). RANKCORR is fast enough to easily handle the largest data sets and, as such, is a useful tool to add to computational pipelines when dealing with high-throughput scRNA-seq data. RANKCORR software is available for download at https://github.com/ahsv/RankCorr with extensive documentation.
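The ranking step described above can be sketched as follows. This is a simplified illustration, not RANKCORR itself: each gene's counts are rank-transformed across cells (tied values receive their average rank, which matters for the many zero counts in scRNA-seq data), and genes are then scored by the difference between their mean rank in a target cell population and in all other cells. The one-vs-rest scoring rule and all function names are assumptions; RANKCORR's actual sparse linear separation step is not reproduced.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def top_markers(counts: Dict[str, Sequence[float]], labels: Sequence[str],
                target: str, k: int) -> List[str]:
    """Return the k genes whose mean rank is most elevated in `target` cells
    relative to all other cells (a hypothetical one-vs-rest score)."""
    in_target = [label == target for label in labels]
    n_in = sum(in_target)
    n_out = len(labels) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("target must label some, but not all, cells")
    scores: Dict[str, float] = {}
    for gene, gene_counts in counts.items():
        ranks = average_ranks(gene_counts)
        mean_in = sum(r for r, t in zip(ranks, in_target) if t) / n_in
        mean_out = sum(r for r, t in zip(ranks, in_target) if not t) / n_out
        scores[gene] = mean_in - mean_out
    return sorted(scores, key=lambda g: scores[g], reverse=True)[:k]
```

Working on ranks rather than raw counts makes the score insensitive to library-size-like monotone rescaling of a gene's counts, which is one intuition behind ranking count data.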


Subjects
Genetic Databases , High-Throughput Nucleotide Sequencing , Single-Cell Analysis , Algorithms , Animals , Base Sequence , Bone Marrow Cells/metabolism , Cluster Analysis , Computer Simulation , Gene Expression Profiling , Genetic Markers , Humans , Mice , ROC Curve , Software
8.
Biomed Eng Online ; 17(Suppl 2): 152, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30396341

ABSTRACT

BACKGROUND: Screening with CA-125 is the most common test for detecting ovarian cancer. However, CA-125 levels vary with many conditions other than ovarian cancer, which has led to misdiagnosis of ovarian cancer. METHODS: In this paper, we explore 16 serum biomarkers to find an alternative biomarker combination that reduces misdiagnosis. For the experiment, we use serum samples comprising 101 cancer and 92 healthy samples. We perform two major tasks: marker selection and classification. For optimal marker selection, we use a genetic algorithm, random forest, the t-test, and logistic regression. For classification, we compare linear discriminant analysis, K-nearest neighbors, and logistic regression. RESULTS: The final results show that logistic regression gives high performance for both tasks, and that HE4-ELISA, PDGF-AA, prolactin, and TTR form the best biomarker combination for detecting ovarian cancer. CONCLUSIONS: We find that combinations containing TTR and prolactin give high performance for cancer detection. Early detection of ovarian cancer can reduce its high mortality rate. Finding a combination of multiple biomarkers for diagnostic tests with high sensitivity and specificity is very important.
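The t-test listed among the marker selection methods can be illustrated with a minimal sketch that ranks serum markers by the absolute value of Welch's t-statistic between cancer and healthy samples. The choice of Welch's (unequal-variance) form, the function names, and the dictionary input layout are assumptions rather than the authors' pipeline, which also used a genetic algorithm, random forest, and logistic regression.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Welch's t-statistic for cases vs. controls (each group needs >= 2 values)."""
    # statistics.variance is the sample variance (n - 1 denominator).
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    if se == 0:
        raise ValueError("both groups are constant; t is undefined")
    return (mean(cases) - mean(controls)) / se


def rank_markers(cases: Dict[str, Sequence[float]],
                 controls: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (marker, t) pairs ordered by decreasing |t|."""
    stats = [(marker, welch_t(cases[marker], controls[marker])) for marker in cases]
    return sorted(stats, key=lambda item: abs(item[1]), reverse=True)
```

A univariate ranking like this scores each marker in isolation; wrapper methods such as a genetic algorithm instead evaluate whole marker combinations, which is closer to the combination search the abstract describes.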


Subjects
Tumor Biomarkers/blood , Ovarian Neoplasms/blood , Ovarian Neoplasms/diagnosis , Case-Control Studies , Computational Biology , Female , Humans , Machine Learning , Mass Screening
9.
Genet Epidemiol ; 38(2): 144-51, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24395534

ABSTRACT

In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies, and it outperforms "classic" meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods for marker selection have been developed in the literature. This study advances beyond the published methods by introducing contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those identified by the benchmark and has better prediction performance.


Subjects
Neoplasms/genetics , Algorithms , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Computer Simulation , Female , Genetic Markers , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Genetic Models , Neoplasms/diagnosis , Prognosis
10.
Genome ; 58(5): 151-62, 2015 May.
Article in English | MEDLINE | ID: mdl-26444714

ABSTRACT

The 6th International Barcode of Life Conference (Guelph, Canada, 18-21 August 2015), themed Barcodes to Biomes, showcases the latest developments in DNA barcoding research and its diverse applications. The meeting also provides a venue for a global research community to share ideas and to initiate collaborations. All plenary and contributed abstracts are being published as an open-access special issue of Genome. Here, I use a comparison with the 3rd Conference (Mexico City, 2009) to highlight 10 recent and emerging trends that are apparent among the contributed abstracts. One of the outstanding trends is the rising proportion of abstracts that focus on multiple socio-economically important applications of DNA barcoding, including studies of agricultural pests, quarantine and invasive species, wildlife forensics, disease vectors, biomonitoring of ecosystem health, and marketplace surveys evaluating the authenticity of seafood products and medicinal plants. Other key trends include the use of barcoding and metabarcoding approaches for dietary analyses (and for studies of food webs spanning three or more trophic levels), as well as the spread of next-generation sequencing methods in multiple contexts. In combination with the rising taxonomic and geographic scope of many barcoding initiatives, these developments suggest that several important questions in biology are becoming tractable. "What is this specimen on an agricultural shipment?", "Who eats whom in this whole food web?", and even "How many species are there?" are questions that may be answered in time periods ranging from a few years to one or a few decades. The next phases of DNA barcoding may expand yet further into prediction of community shifts with climate change and improved management of biological resources.


Subjects
Taxonomic DNA Barcoding/methods , Animals , Biodiversity , Environmental Monitoring , Humans
11.
Yeast ; 31(1): 29-46, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24307396

ABSTRACT

The cloning of DNA fragments into vectors or host genomes has traditionally been performed in Escherichia coli using restriction enzymes and DNA ligase or homologous recombination-based reactions. We report here a novel DNA cloning method that requires neither DNA end processing nor homologous recombination, yet ensures highly accurate cloning. The method exploits the efficient non-homologous end-joining (NHEJ) activity of the yeast Kluyveromyces marxianus and incorporates a novel functional marker selection system. First, to demonstrate the applicability of NHEJ to DNA cloning, a C-terminal-truncated non-functional ura3 selection marker and the truncated region were PCR-amplified separately, mixed, and used directly for transformation. URA3(+) transformants appeared on the selection plates, indicating that the two DNA fragments were correctly joined by NHEJ to generate a functional URA3 gene that had integrated into the yeast chromosome. To develop the cloning system, the shortest URA3 C-terminal-encoding sequence that could restore the function of a truncated non-functional ura3 was determined by deletion analysis and included in the primers used to amplify target DNAs for cloning. Transformation with PCR-amplified target DNAs and C-terminal-truncated ura3 produced numerous transformant colonies, in which a functional URA3 gene was generated and integrated into the chromosome together with the target DNAs. Several K. marxianus circular plasmids with different selection markers were also developed for NHEJ-based cloning and recombinant DNA construction. The one-step DNA cloning method developed here is relatively simple and reliable compared with the DNA cloning systems developed to date.


Subjects
Molecular Cloning/methods , Kluyveromyces/genetics , Genetic Recombination , Genetic Selection , Genetic Transformation , Plasmids , Saccharomyces cerevisiae Proteins/genetics
12.
Stat Med ; 33(28): 4988-98, 2014 Dec 10.
Article in English | MEDLINE | ID: mdl-25146388

ABSTRACT

Consider the integrative analysis of genetic data with multiple correlated response variables. The goal is to identify important gene-environment (G × E) interactions, along with main gene and environment effects, that are associated with the responses. The homogeneity and heterogeneity models can be adopted to describe the genetic basis of multiple responses. To accommodate possible nonlinear effects of some environmental factors, a multi-response partially linear varying coefficient model is assumed. Penalization is adopted for marker selection. The proposed penalization method can simultaneously distinguish genetic variants with G × E interactions, variants without G × E interactions, and variants without main effects. It adopts different penalties to accommodate the homogeneity and heterogeneity models. The proposed method can be computed effectively using a coordinate descent algorithm. A simulation study and the analysis of the Health Professionals Follow-up Study, which has two correlated continuous traits, SNP measurements, and multiple environmental factors, show superior performance of the proposed method over its competitors.
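A coordinate descent algorithm updates one coefficient at a time while holding the others fixed, which is what makes penalized fits like the one above cheap to compute. The sketch below applies it to the ordinary lasso objective (1/2n)||y - Xb||^2 + lam * ||b||_1 as a stand-in; the paper's multi-response, varying-coefficient model and its homogeneity/heterogeneity penalties are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; return 0 if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             n_iter: int = 200, tol: float = 1e-8) -> List[float]:
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue  # an all-zero column can never enter the model
            # z = x_j' (resid + x_j * beta_j) / n, i.e. the fit to the partial residual
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(z, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

Replacing `soft_threshold` with another penalty's univariate solution (for example, a concave penalty's thresholding rule) leaves the rest of the loop unchanged, which is why coordinate descent is a common engine for such penalized methods.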


Subjects
Gene-Environment Interaction , Genetic Variation/physiology , Genetic Models , Statistical Models , Algorithms , Computer Simulation , Follow-Up Studies , Genetic Predisposition to Disease , Genetic Variation/genetics , Health Personnel , Humans , Obesity/genetics , Single Nucleotide Polymorphism/genetics
13.
Sci Rep ; 14(1): 16766, 2024 07 21.
Article in English | MEDLINE | ID: mdl-39034310

ABSTRACT

The tumor microenvironment (TME) plays a pivotal role in the onset, progression, and treatment response of cancer. Among the various components of the TME, cancer-associated fibroblasts (CAFs) are key regulators of both immune and non-immune cellular functions. Leveraging single-cell RNA sequencing (scRNA) data, we have uncovered previously hidden and promising roles of specific CAF subgroups, paving the way for their clinical application. However, several critical questions persist, primarily stemming from the heterogeneous nature of CAFs and the use of different fibroblast markers in various sample analyses, which causes confusion and hinders their clinical implementation. In this study, we systematically screened multiple databases to identify the most robust marker for distinguishing CAFs in lung cancer, with a particular focus on its potential use in early diagnosis, staging, and treatment response evaluation. Our investigation revealed that COL1A1, COL1A2, FAP, and PDGFRA are effective markers for characterizing CAF subgroups in most lung adenocarcinoma datasets. Through comprehensive analysis of treatment responses, we determined that COL1A1 stands out as the most effective indicator among all CAF markers. COL1A1 not only deciphers the TME signatures related to CAFs but also demonstrates a highly sensitive and specific correlation with treatment responses and multiple survival outcomes. For the first time, we have unveiled the distinct roles played by clusters of CAF markers in differentiating various TME groups. Our findings confirm the sensitive and unique contributions of CAFs to the responses to multiple lung cancer therapies. These insights enhance our understanding of TME functions and support the translational application of extensive scRNA sequencing results. COL1A1 emerges as the most sensitive and specific marker for defining CAF subgroups in scRNA analysis. The CAF ratios represented by COL1A1 can potentially serve as a reliable predictor of treatment responses in clinical practice, thus providing valuable insights into the influential roles of TME components. This research marks a crucial step forward in our approach to cancer diagnosis and treatment.


Subjects
Tumor Biomarkers , Cancer-Associated Fibroblasts , Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Tumor Microenvironment , Humans , Lung Neoplasms/mortality , Lung Neoplasms/pathology , Lung Neoplasms/genetics , Lung Neoplasms/diagnosis , Lung Neoplasms/therapy , Non-Small-Cell Lung Carcinoma/mortality , Non-Small-Cell Lung Carcinoma/pathology , Non-Small-Cell Lung Carcinoma/genetics , Non-Small-Cell Lung Carcinoma/diagnosis , Cancer-Associated Fibroblasts/metabolism , Cancer-Associated Fibroblasts/pathology , Tumor Biomarkers/metabolism , Prognosis , Neoplastic Gene Expression Regulation
14.
Stat Med ; 32(20): 3509-21, 2013 Sep 10.
Article in English | MEDLINE | ID: mdl-23519988

ABSTRACT

In the analysis of cancer studies with high-dimensional genomic measurements, integrative analysis provides an effective way of pooling information across multiple heterogeneous datasets. The genomic basis of multiple independent datasets, which can be characterized by the sets of genomic markers, can be described using the homogeneity model or the heterogeneity model. Under the homogeneity model, all datasets share the same set of markers associated with the responses. In contrast, under the heterogeneity model, different studies have overlapping but possibly different sets of markers. The heterogeneity model contains the homogeneity model as a special case and can be much more flexible. Marker selection under the heterogeneity model calls for bi-level selection to determine both whether a covariate is associated with the response in any study at all and in which studies it is associated with the response. In this study, we consider two minimax concave penalty-based penalization approaches for marker selection under the heterogeneity model. For each approach, we describe its rationale and an effective computational algorithm. We conduct simulations to investigate their performance and compare them with existing alternatives. We also apply the proposed approaches to the analysis of gene expression data on multiple cancers.
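The minimax concave penalty (MCP) underlying these approaches is rho(t; lam, gamma) = lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam and gamma*lam^2/2 otherwise, with gamma > 1. The sketch below implements the penalty and the corresponding univariate solution (often called firm thresholding) that a coordinate descent step would apply for a covariate standardized so that x'x/n = 1. The bi-level composite constructions studied in the paper are not reproduced, and the function names are illustrative.

```python
import math


def mcp_penalty(t: float, lam: float, gamma: float) -> float:
    """Minimax concave penalty rho(t; lam, gamma) for lam >= 0, gamma > 1."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0  # flat beyond gamma*lam: large effects are not penalized further


def mcp_threshold(z: float, lam: float, gamma: float) -> float:
    """Minimizer over b of 0.5 * (z - b)**2 + rho(b; lam, gamma), with gamma > 1.

    z is the univariate least-squares estimate for a standardized covariate."""
    if gamma <= 1:
        raise ValueError("gamma must exceed 1")
    if abs(z) > gamma * lam:
        return z  # beyond gamma*lam the estimate is left unshrunk
    soft = max(abs(z) - lam, 0.0)  # lasso-style soft thresholding...
    return math.copysign(soft, z) / (1.0 - 1.0 / gamma)  # ...rescaled to undo part of the shrinkage
```

Unlike the lasso, MCP stops shrinking coefficients once |z| exceeds gamma*lam, which reduces the bias on large marker effects while still setting small ones exactly to zero.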


Subjects
Algorithms , Tumor Biomarkers/genetics , Statistical Data Interpretation , Genomics/methods , Statistical Models , Neoplasms/genetics , Computer Simulation , Gene Expression Profiling/methods , Humans , Oligonucleotide Array Sequence Analysis/methods
15.
Biology (Basel) ; 12(10)2023 Sep 25.
Article in English | MEDLINE | ID: mdl-37886990

ABSTRACT

Microsatellites are polymorphic and cost-effective. Optimizing reduced microsatellite panels using heuristic algorithms eases budget constraints in genetic diversity and population genetic assessments. The efficiency of a microsatellite marker is strongly associated with its polymorphism, which is quantified as the polymorphic information content (PIC). Nevertheless, marker selection cannot rely solely on PIC. In this study, the ant colony optimization (ACO) algorithm, a widely recognized optimization method, was adopted to create an enhanced selection scheme for refining microsatellite marker panels, called the PIC-ACO selection scheme. The algorithm was fine-tuned and validated using extensive datasets of chicken (Gallus gallus) and Chinese gorals (Naemorhedus griseus) from our previous studies. In contrast to basic optimization algorithms that initialize candidate solutions stochastically, our selection algorithm uses the PIC values of markers to prime the ACO process. This increases the speed of global solution discovery while reducing the likelihood of becoming trapped in local optima. This process yielded a cost-efficient, optimized microsatellite marker panel for studying genetic diversity and population genetic datasets. Established microsatellite efficiency metrics, such as PIC, allelic richness, and heterozygosity, were correlated with the actual effectiveness of the microsatellite marker panel. This approach could substantially reduce budgetary barriers to population genetic assessments, breeding, and conservation programs.
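The PIC values used to prime the ACO can be computed from allele frequencies with the standard formula PIC = 1 - sum(p_i^2) - sum over i<j of 2*p_i^2*p_j^2. The sketch below computes PIC and expected heterozygosity per locus and ranks loci by PIC, which could serve as a PIC-based starting order; the ACO search itself is not shown, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def _check_freqs(freqs: Sequence[float]) -> None:
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity, 1 - sum(p_i^2)."""
    _check_freqs(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    _check_freqs(freqs)
    homozygous = sum(p * p for p in freqs)
    # Correction for matings in which both parents carry the same heterozygous genotype.
    same_het_parents = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - same_het_parents


def rank_by_pic(loci: Dict[str, Sequence[float]]) -> List[str]:
    """Order loci from most to least informative by PIC (a hypothetical starting order)."""
    return sorted(loci, key=lambda name: pic(loci[name]), reverse=True)
```

PIC is always somewhat lower than expected heterozygosity for the same locus, and both increase with the number and evenness of alleles, which is why PIC alone is a reasonable seed but, as the abstract notes, not a sufficient selection criterion.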

16.
Genes (Basel) ; 13(11)2022 11 19.
Article in English | MEDLINE | ID: mdl-36421838

ABSTRACT

An assessment of the genetic diversity and structure of a population is essential for designing recovery plans for threatened species. Italy hosts two brown bear populations: Ursus arctos marsicanus (Uam), endemic to the Apennines of central Italy, and Ursus arctos arctos (Uaa), in the Italian Alps. Both populations are endangered and occasionally involved in human-wildlife conflict; thus, detailed management plans, including genetic monitoring, have been in place for several decades. Here, we propose a simple, cost-effective microsatellite-based protocol for the management of populations with low genetic variation. We sampled 22 Uam and 22 Uaa individuals and analyzed a total of 32 microsatellite loci to evaluate their applicability for individual identification. Based on genetic variability estimates, we compared data from four different STR marker sets to evaluate the optimal settings for long-term monitoring projects. Allelic richness and gene diversity were highest in the Uaa population, whereas depleted genetic variability was noted in the Uam population, which should be regarded as a conservation priority. Our results identified the most effective STR sets for the estimation of genetic diversity and individual discrimination in the Uam (9 loci, PIC 0.45; PID 2.0 × 10⁻⁵) and Uaa (12 loci, PIC 0.64; PID 6.9 × 10⁻¹¹) populations, which can easily be used by smaller laboratories to support local governments in regular population monitoring. The method we propose for selecting the most variable markers could be adopted for the genetic characterization of other small and isolated populations.
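The PID values reported above express the probability that two individuals share a multilocus genotype by chance. A minimal sketch of the standard unadjusted expression for unrelated individuals, PID = sum(p_i^4) + sum over i<j of (2*p_i*p_j)^2 per locus, multiplied across loci under an assumption of independence, is given below. The authors' software may instead use a sample-size-corrected or sibling-based variant, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence


def pid_locus(freqs: Sequence[float]) -> float:
    """Unadjusted probability that two unrelated individuals share a genotype
    at one locus: sum(p_i^4) + sum_{i<j} (2 * p_i * p_j)^2."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    same_homozygote = sum(p ** 4 for p in freqs)
    same_heterozygote = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return same_homozygote + same_heterozygote


def pid_multilocus(loci: Dict[str, Sequence[float]]) -> float:
    """Multiply per-locus PIDs, assuming the loci are statistically independent."""
    total = 1.0
    for freqs in loci.values():
        total *= pid_locus(freqs)
    return total
```

Because per-locus PIDs multiply, each additional variable locus lowers the panel's PID geometrically, which is consistent with the much smaller PID reported for the 12-locus Uaa set than for the 9-locus Uam set.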


Subjects
Ursidae , Animals , Alleles , Italy , Microsatellite Repeats/genetics , Ursidae/genetics
17.
Anim Cells Syst (Seoul) ; 24(6): 321-328, 2020 Dec 24.
Article in English | MEDLINE | ID: mdl-33456716

ABSTRACT

Despite the many existing studies of nonsynonymous single nucleotide polymorphisms (nsSNPs), genome-wide studies based on nsSNPs are rare. NsSNPs alter amino acid sequences, affect protein structure and function, and can have deleterious effects. By predicting the deleterious effects of nsSNPs, we determined the total risk score per individual. Additionally, a machine learning technique was used to find an optimal nsSNP subset that best explains the complete nsSNP effect. A total of 16,100 nsSNPs were selected as the best representatives among 89,519 regressed nsSNPs. In the gene ontology analysis of the 16,100 nsSNPs, DNA metabolic process, chemokine- and immune-related terms, and reproduction were the most enriched terms. We expect that our risk score prediction and nsSNP marker selection will contribute to the future development of genome-wide association studies and, more broadly, of breeding science.
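The abstract does not give the scoring formula, so the following is only a hypothetical sketch of a per-individual risk score: the sum, over nsSNPs, of the alternate-allele count (0, 1, or 2) multiplied by a predicted deleteriousness weight. A candidate nsSNP subset is then judged by how well its scores correlate with the full-set scores across individuals. The weights, SNP identifiers, and function names are all assumptions, and the authors' machine learning selection step is not reproduced.

```python
import math
from typing import Iterable, Mapping, Optional, Sequence


def risk_score(genotype: Mapping[str, int], weights: Mapping[str, float],
               snps: Optional[Iterable[str]] = None) -> float:
    """Sum over SNPs of alt-allele count (0, 1, 2) times a deleteriousness weight.
    Uses every weighted SNP unless `snps` restricts the set; missing calls are skipped."""
    total = 0.0
    for snp in (weights if snps is None else snps):
        count = genotype.get(snp)
        if count is None:
            continue
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count {count!r} for {snp}")
        total += count * weights[snp]
    return total


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either sequence is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def subset_agreement(genotypes: Sequence[Mapping[str, int]],
                     weights: Mapping[str, float], subset: Iterable[str]) -> float:
    """Correlation between full-set and subset risk scores across individuals."""
    subset = list(subset)
    full = [risk_score(g, weights) for g in genotypes]
    part = [risk_score(g, weights, subset) for g in genotypes]
    return pearson(full, part)
```

A subset whose scores correlate near 1 with the full-set scores preserves the ranking of individuals while requiring far fewer markers, which is one plausible reading of a subset that "best explains the complete nsSNP effect".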

18.
Bayesian Anal ; 15(1): 79-102, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32802246

ABSTRACT

Selecting informative nodes over large-scale networks has become increasingly important in many research areas. Most existing methods focus on local network structure and incur heavy computational costs for large-scale problems. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior. This prior uses the graph Laplacian matrix to characterize the conditional dependence between neighboring markers while accounting for the global network structure. Under mild conditions, we show that the proposed model enjoys posterior consistency with a diverging number of edges and nodes in the network. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation that scales to large networks. We illustrate the advantages of the proposed method over existing alternatives through extensive simulation studies and an analysis of a breast cancer gene expression dataset from The Cancer Genome Atlas (TCGA).
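The abstract above combines three ingredients: a graph Laplacian, a Gaussian latent vector whose dependence follows the network, and MALA for sampling. The toy sketch below illustrates all three under simplifying assumptions. It is not the TGLG model itself. It uses the combinatorial Laplacian L = D - A with a small ridge term tau, so that (L + tau*I) is a valid precision matrix. It runs a generic fixed-step MALA on that Gaussian and then applies a hard threshold, so that a marker gets a nonzero effect only when its latent value exceeds lam. The paper's exact parameterization, hyperpriors, and GLM likelihood are not reproduced. `tau`, `lam`, `step`, and the 3-node network are hypothetical choices.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def graph_laplacian(adj: Matrix) -> Matrix:
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    return [[(degree[i] if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def laplacian_gaussian(lap: Matrix, tau: float) -> Tuple[Callable[[Vector], float], Callable[[Vector], Vector]]:
    """Log density (up to a constant) and gradient of N(0, (L + tau*I)^-1).

    tau > 0 makes the precision positive definite (L alone is singular).
    """
    n = len(lap)
    prec = [[lap[i][j] + (tau if i == j else 0.0) for j in range(n)] for i in range(n)]

    def log_pi(x: Vector) -> float:
        return -0.5 * sum(a * b for a, b in zip(x, mat_vec(prec, x)))

    def grad(x: Vector) -> Vector:
        return [-g for g in mat_vec(prec, x)]

    return log_pi, grad


def mala(log_pi, grad, x0: Vector, step: float, n_iter: int, rng: random.Random):
    """Metropolis-adjusted Langevin algorithm with a fixed step size."""

    def log_q(to: Vector, frm: Vector) -> float:
        # Log density (up to a constant) of proposing `to` from `frm`:
        # N(frm + step^2/2 * grad(frm), step^2 * I).
        g = grad(frm)
        return -sum((t - f - 0.5 * step ** 2 * gi) ** 2 for t, f, gi in zip(to, frm, g)) / (2 * step ** 2)

    x = list(x0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        g = grad(x)
        prop = [xi + 0.5 * step ** 2 * gi + step * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        # Metropolis-Hastings correction for the asymmetric Langevin proposal.
        log_alpha = log_pi(prop) - log_pi(x) + log_q(x, prop) - log_q(prop, x)
        if math.log(1.0 - rng.random()) < log_alpha:  # 1 - U lies in (0, 1], so log is defined
            x, accepted = prop, accepted + 1
        samples.append(list(x))
    return samples, accepted / n_iter


def thresholded_effects(gamma: Vector, alpha: Vector, lam: float) -> Vector:
    """Marker j is selected (effect alpha_j) only when |gamma_j| exceeds the threshold lam."""
    return [a if abs(g) > lam else 0.0 for g, a in zip(gamma, alpha)]


# Hypothetical 3-node path network: 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
log_pi, grad = laplacian_gaussian(graph_laplacian(adj), tau=1.0)
samples, acceptance = mala(log_pi, grad, [0.0, 0.0, 0.0], step=0.5, n_iter=2000, rng=random.Random(0))
print(acceptance, thresholded_effects(samples[-1], alpha=[1.0, 1.0, 1.0], lam=0.5))
```

The Laplacian penalizes differences between neighboring latent values, so connected markers tend to cross the threshold together. The gradient-informed MALA proposal is what lets such samplers move efficiently in high dimensions.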

19.
Comput Struct Biotechnol J ; 18: 2012-2025, 2020.
Article in English | MEDLINE | ID: mdl-32802273

ABSTRACT

Cancer proteomics has become a powerful technique for three purposes: characterizing the protein markers that drive malignant transformation, tracing proteome variation triggered by therapeutics, and discovering novel targets and drugs for the treatment of oncologic diseases. To facilitate cancer diagnosis and prognosis and to accelerate drug target discovery, a variety of methods for tumor marker identification and sample classification have been developed and successfully applied in cancer proteomic studies. This review describes the most recent advances in these approaches, together with their current applications in cancer-related studies. First, a number of popular feature selection methods are surveyed, with an objective evaluation of their advantages and disadvantages. Second, these methods are grouped into three major classes based on their underlying algorithms. Finally, a variety of sample separation algorithms are discussed. This review provides a comprehensive overview of advances in tumor marker identification and in the separation of patients, samples, and tissues, and could serve as guidance for researchers in cancer proteomics.
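As a concrete instance of the feature selection methods surveyed in the review above, the sketch ranks proteins by the absolute Welch t statistic between two sample groups and keeps the top k. This is a common filter-style approach chosen here only for illustration. It is not taken from the review, and the review's own three-class taxonomy is not reproduced. The function names and the toy intensity data are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for two samples with possibly unequal variances (needs >= 2 values each)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both groups are constant: no separation if means agree, otherwise perfect separation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_k_features(group_a: Dict[str, List[float]], group_b: Dict[str, List[float]], k: int) -> List[str]:
    """Rank features (e.g. protein intensities) by |t| between two groups and keep the top k."""
    scores = {f: abs(welch_t(group_a[f], group_b[f])) for f in group_a}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical intensities for two proteins in tumor vs. normal samples.
tumor = {"P1": [10.0, 11.0, 12.0], "P2": [5.0, 6.0, 5.0]}
normal = {"P1": [1.0, 2.0, 3.0], "P2": [5.0, 5.0, 6.0]}
print(top_k_features(tumor, normal, k=1))
```

Filter methods like this are fast and classifier-independent. However, they score each feature in isolation and so can miss markers that are informative only in combination.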

20.
Proteomics Clin Appl ; 13(2): e1800091, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30680934

ABSTRACT

There is a need for accurate, robust, non-invasive methods for the early diagnosis of graft lesions after kidney transplantation. Many proteomic biomarkers for the major kidney allograft disease phenotypes defined by the Banff classification criteria have been described in the literature, but none of these biomarkers has been established in the clinic. A key reason is the lack of clinical validation. Such validation is difficult because even the diagnostic gold standard, kidney biopsy, is often ambiguous. Here, semantic clustering with ReviGO, applied on top of transcriptomic pathway analysis, is evaluated as a way to connect the histological and transcriptomic characteristics of kidney allograft disease with proteomic biomarker qualification. Using public data from microarray studies of kidney allograft tissue, the study identifies biological processes and key molecules specifically associated with the different kidney allograft disease phenotypes. Semantic clustering holds promise for guiding the adaptation of proteomic marker panels to molecular pathology. This can support the development of non-invasive tests (e.g., in urine, by capillary electrophoresis mass spectrometry) that simultaneously detect diverse kidney allograft phenotypes with high accuracy and sensitivity.


Subjects
Kidney Diseases/etiology , Kidney Diseases/metabolism , Kidney Transplantation/adverse effects , Phenotype , Proteomics , Biomarkers/metabolism , Humans , Kidney Diseases/pathology , Transplantation, Homologous/adverse effects
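The semantic clustering step in the abstract above reduces a long list of enriched terms to a few non-redundant representatives. The sketch below illustrates that idea only. It is not ReviGO's algorithm: ReviGO measures similarity between GO terms with ontology-based semantic similarity, whereas here the Jaccard overlap of each term's annotated gene set stands in as a simpler, hypothetical proxy. Terms are visited from most to least significant. Each term joins the first representative it overlaps with at or above `cutoff`; otherwise it starts a new cluster. The term names, p-values, and gene sets are made up.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two gene sets, used here as a stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reduce_terms(terms: Dict[str, Tuple[float, Set[str]]], cutoff: float) -> Dict[str, List[str]]:
    """Greedy redundancy reduction: the most significant term of each cluster is its representative."""
    clusters: Dict[str, List[str]] = {}
    for name in sorted(terms, key=lambda t: terms[t][0]):  # ascending p-value
        genes = terms[name][1]
        for rep in clusters:
            if jaccard(genes, terms[rep][1]) >= cutoff:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Hypothetical enriched terms: name -> (p-value, annotated genes).
enriched = {
    "immune response": (1e-6, {"A", "B", "C"}),
    "inflammatory response": (1e-4, {"A", "B", "C", "D"}),
    "extracellular matrix organization": (1e-3, {"X", "Y"}),
}
print(reduce_terms(enriched, cutoff=0.5))
```

Choosing the most significant term as each cluster's representative keeps the summary anchored to the strongest signals while collapsing near-duplicate terms.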