ABSTRACT
Introduction: Mortality is a major primary endpoint for long-term hemodialysis (HD) patients. The clinical status of HD patients is generally monitored through longitudinal clinical observations such as monthly laboratory and physical examinations. Methods: A total of 829 HD patients who met the inclusion criteria were analyzed. All patients were followed from January 2009 to December 2013. This study applied fully adjusted Cox proportional hazards (CoxPH), stepwise-CoxPH, random survival forest (RSF)-CoxPH, and whale optimization algorithm (WOA)-CoxPH models to assess all-cause mortality risk in HD patients. The performance of these CoxPH variable-selection models was compared using the concordance index. Results: The WOA-CoxPH model achieved the highest concordance index compared with the RSF-CoxPH and conventional selection CoxPH models. The eight significant parameters selected by the WOA-CoxPH model, namely age, diabetes mellitus (DM), hemoglobin (Hb), albumin, creatinine (Cr), potassium (K), Kt/V, and cardiothoracic ratio, also showed significant survival differences between low- and high-risk characteristics in single-factor analysis. By integrating the risk characteristics of the individual factors, patients with seven or more of the eight risk characteristics were classified into a high-risk subgroup, and the remaining patients into a low-risk subgroup. The integrated low- and high-risk subgroups showed a greater survival difference than any single risk factor selected by the WOA-CoxPH model. Conclusion: The findings suggest that the WOA-CoxPH model provides better risk assessment performance than the RSF-CoxPH and conventional selection CoxPH models in HD patients. Patients with seven or more of the eight risk characteristics may be at increased risk of all-cause mortality in the HD population.
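Two computations in this abstract can be made concrete: the concordance index used to compare the models, and the rule that places a patient in the high-risk subgroup when seven or more of the eight risk characteristics are present. The sketch below is a minimal illustration, not the authors' code. It assumes a simplified Harrell's C (pairs with tied follow-up times are skipped), and it assumes each factor has already been dichotomized into a boolean risk flag, since the cutoffs are not given here. The function names and the demo data are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Simplified Harrell's concordance index for right-censored data.

    A pair is comparable when the subject with the shorter follow-up time
    had the event (death). It is concordant when that subject also has the
    higher predicted risk; tied risk scores count as half-concordant.
    Pairs with tied follow-up times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore tied follow-up times
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject censored: order of deaths is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def risk_subgroup(risk_flags: Sequence[bool], threshold: int = 7) -> str:
    """Return 'high' when at least `threshold` risk characteristics are present."""
    return "high" if sum(bool(f) for f in risk_flags) >= threshold else "low"


if __name__ == "__main__":
    # Hypothetical follow-up times (months), death indicators, and model risk scores.
    times = [5, 10, 12, 20]
    events = [1, 0, 1, 0]
    scores = [0.9, 0.2, 0.6, 0.1]
    print(harrell_c_index(times, events, scores))
    # Hypothetical risk flags for the eight selected factors of one patient.
    print(risk_subgroup([True, True, True, False, True, True, True, True]))
```

A C-index of 1.0 means every comparable pair is ordered correctly, and 0.5 corresponds to random ordering. For real analyses, an established survival library that handles tied times properly would be preferable to this sketch.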
ABSTRACT
The mitochondrial cytochrome c oxidase I (COI) gene is commonly used for DNA barcoding in animals. However, most COI barcode nucleotides are conserved, and sequences longer than about 650 base pairs increase the computational burden of species identification. To address this problem, we propose a decision theory-based COI SNP tagging (DCST) approach that discriminates species using single nucleotide polymorphisms (SNPs), that is, the variable nucleotides among the sequences of a group of species. Using 126 teleost mackerel species (order Scombriformes) as an example, we identified 281 SNPs by aligning and trimming their COI sequences. After the decision rules were constructed, 49 SNPs were selected for the 126 fish species using the DCST scoring system. These COI-SNP barcodes were then converted into one-dimensional barcode images. The DCST approach reduces computational complexity and identifies the fewest, most effective SNPs for discriminating species in species tagging.
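The first DCST step, finding SNPs as the variable columns of aligned and trimmed sequences, can be sketched as follows. This is an illustration under stated assumptions, not the DCST implementation. It assumes the sequences are already aligned to equal length and that gaps and ambiguity codes should not count as variation. The DCST scoring system itself is not reproduced. The function names and the toy sequences are hypothetical.

```python
from typing import Dict, List

NUCLEOTIDES = set("ACGT")


def find_snp_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based alignment columns where the species differ.

    Assumes sequences are aligned and trimmed to equal length. Gaps ('-')
    and ambiguity codes are ignored when deciding whether a column varies.
    """
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    snp_cols = []
    for col in range(length):
        bases = {s[col] for s in seqs} & NUCLEOTIDES
        if len(bases) > 1:  # more than one nucleotide observed: a SNP column
            snp_cols.append(col)
    return snp_cols


def snp_profile(alignment: Dict[str, str], cols: List[int]) -> Dict[str, str]:
    """Concatenate each species' bases at the selected SNP columns."""
    return {name: "".join(seq.upper()[c] for c in cols)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical toy alignment standing in for trimmed COI sequences.
    aln = {"sp1": "ACGTAC", "sp2": "ACGTTC", "sp3": "ATGTAC"}
    cols = find_snp_columns(aln)
    print(cols, snp_profile(aln, cols))
```

Only the SNP columns are carried forward, which is the source of the reduced computational burden described above: a 650 bp alignment collapses to a profile whose length equals the number of variable sites.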
ABSTRACT
Gene-gene interactions (GGIs) are important markers of disease susceptibility. Multifactor dimensionality reduction (MDR) is a popular algorithm for detecting GGIs that primarily uses the correct classification rate (CCR) to assess the quality of a GGI. However, CCR alone may fail to detect certain GGIs because of potential model preferences and disease complexity. In this study, we propose MCDA-MDR, which combines multiple-criteria decision analysis (MCDA) with MDR, for detecting GGIs. MCDA allows MDR to assess GGIs with multiple measures derived from its two-way contingency table simultaneously; here, the CCR and the rule utility measure were employed. Cross-validation consistency was used to select the most favorable GGIs among the Pareto sets. Simulation studies comparing the detection success rates of the MDR-only measure and MCDA-MDR showed that MCDA-MDR achieved higher detection success rates. The Wellcome Trust Case Control Consortium dataset was analyzed using MCDA-MDR to detect GGIs associated with coronary artery disease, and MCDA-MDR detected numerous significant GGIs (p < 0.001). Overall, incorporating MCDA improved the GGI detection success rate of the MDR-based method compared with MDR alone.
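The core pieces named here can be sketched for a two-locus model: MDR pooling of genotype cells into high- and low-risk groups, the resulting two-way contingency table, a CCR computed from it, and selection of the Pareto (non-dominated) set over two criteria. This is a minimal sketch, not MCDA-MDR itself. It assumes genotypes coded 0/1/2, a cell labeled high-risk when its case:control ratio exceeds the overall ratio, and CCR defined as balanced accuracy (the mean of sensitivity and specificity). The rule utility measure is not defined in this abstract, so the second criterion is simply passed in by the caller, and cross-validation consistency is omitted. All names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def mdr_contingency(g1: Sequence[int], g2: Sequence[int],
                    status: Sequence[int]) -> Tuple[int, int, int, int]:
    """Pool two-locus genotype cells into high/low risk; return (TP, FN, FP, TN).

    `status` is 1 for cases and 0 for controls. A cell is high-risk when its
    case:control ratio exceeds the overall case:control ratio.
    """
    cases: Counter = Counter()
    controls: Counter = Counter()
    for a, b, s in zip(g1, g2, status):
        (cases if s == 1 else controls)[(a, b)] += 1
    n_case = sum(cases.values())
    n_ctrl = sum(controls.values())
    # cases/controls > n_case/n_ctrl, cross-multiplied to avoid dividing by zero.
    high = {cell for cell in set(cases) | set(controls)
            if cases[cell] * n_ctrl > n_case * controls[cell]}
    tp = sum(cases[c] for c in high)
    fp = sum(controls[c] for c in high)
    return tp, n_case - tp, fp, n_ctrl - fp


def ccr(tp: int, fn: int, fp: int, tn: int) -> float:
    """CCR as balanced accuracy: mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return candidates not dominated on both criteria (larger is better)."""
    front = []
    for name, (x1, y1) in scores.items():
        dominated = any(x2 >= x1 and y2 >= y1 and (x2 > x1 or y2 > y1)
                        for other, (x2, y2) in scores.items() if other != name)
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Hypothetical genotypes for one SNP pair and case/control status.
    table = mdr_contingency([0, 0, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0])
    print(table, ccr(*table))
    # Hypothetical (CCR, second-criterion) scores for three SNP pairs.
    print(pareto_front({"pairA": (0.7, 0.2), "pairB": (0.6, 0.5), "pairC": (0.5, 0.1)}))
```

The Pareto step is what lets a second measure rescue a GGI that a CCR-only ranking would discard: "pairB" above survives despite a lower CCR because no candidate beats it on both criteria.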
Subjects
Algorithms; Computational Biology/methods; Epistasis, Genetic/genetics; Models, Genetic; Polymorphism, Single Nucleotide/genetics; Computer Simulation; Humans
ABSTRACT
DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence longer than 1000 base pairs, which creates a computational burden. Although DNA barcodes were originally envisioned as straightforward species tags, their use for species identification is currently rarely emphasized. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be ideal targets of feature selection for discriminating between species. We hypothesize that SNP-based barcodes may be more effective than full-length DNA barcode sequences for species discrimination. To address this issue, we tested a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were found in the rbcL sequences of 38 Brassicaceae plant species. During decision tree construction, these SNPs were used to define decision rules that split the sequences into two groups at each level. The resulting tree required 37 nodes and 31 loci to discriminate the 38 species. Finally, sequence tags consisting of the 31 rbcL SNP barcodes were identified for discriminating the 38 Brassicaceae species, based on the SNP pattern selected by the decision tree in the RSB method. Taken together, this study provides a rationale that the SNP component of the rbcL DNA barcode is a useful and effective sequence for tagging 38 Brassicaceae species.
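The level-by-level splitting described above can be sketched as a recursive binary tree over SNP columns. A tree whose leaves are n individual species always has n − 1 internal nodes, which matches the 37 nodes reported for 38 species. This sketch is not the RSB algorithm. It assumes each node splits the current species into "carries the majority base at one SNP column" versus "carries another base", and it picks the column giving the most balanced split. Both the split rule and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple, Union

# A tree is either a species name (leaf) or (column, base, left, right).
Tree = Union[str, Tuple[int, str, "Tree", "Tree"]]


def build_snp_tree(alignment: Dict[str, str], cols: List[int],
                   names: Optional[List[str]] = None) -> Tuple[Tree, int]:
    """Recursively split species into two groups by one SNP column per node.

    Returns (tree, number_of_internal_nodes).
    """
    names = list(alignment) if names is None else names
    if len(names) == 1:
        return names[0], 0
    best = None
    for col in cols:
        base, _ = Counter(alignment[n][col] for n in names).most_common(1)[0]
        left = [n for n in names if alignment[n][col] == base]
        right = [n for n in names if alignment[n][col] != base]
        balance = min(len(left), len(right))
        if right and (best is None or balance > best[0]):
            best = (balance, col, base, left, right)
    if best is None:
        raise ValueError(f"species not separable at the given SNPs: {names}")
    _, col, base, left, right = best
    left_tree, n_left = build_snp_tree(alignment, cols, left)
    right_tree, n_right = build_snp_tree(alignment, cols, right)
    return (col, base, left_tree, right_tree), 1 + n_left + n_right


def classify(tree: Tree, seq: str) -> str:
    """Follow the decision rules from the root to a species leaf."""
    while not isinstance(tree, str):
        col, base, left, right = tree
        tree = left if seq[col] == base else right
    return tree


if __name__ == "__main__":
    # Hypothetical SNP-only sequences for four species.
    aln = {"s1": "AAA", "s2": "AAT", "s3": "ATA", "s4": "TAA"}
    tree, n_nodes = build_snp_tree(aln, [0, 1, 2])
    print(n_nodes, classify(tree, "AAT"))
```

Because a SNP column can be reused in different branches, the number of distinct loci used can be smaller than the number of internal nodes.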
ABSTRACT
DNA barcodes are widely used in taxonomy, systematics, species identification, food safety, and forensic science. Most conventional DNA barcode sequences contain the complete information of a given barcoding gene, yet most of this sequence does not vary and is therefore uninformative for a given group of taxa within a monophylum. We suggest here a method that reduces the number of noninformative nucleotides in the barcoding sequence of a major taxon, such as prokaryotes or eukaryotic animals, plants, or fungi. Single nucleotide polymorphisms (SNPs), the actual single-base differences between genetic sequences, and SNP genotyping provide a tool for developing a rapid, reliable, and high-throughput assay for discriminating between known species. Here, we investigated SNPs as robust markers of genetic variation for identifying different pigeon species based on available cytochrome c oxidase I (COI) data. We propose a decision tree-based SNP barcoding (DTSB) algorithm in which SNP patterns are selected from the DNA barcoding sequences of several evolutionarily related species in order to identify a single species, using pigeons as an example. This approach can make use of any established barcoding system. As a first example, we applied DTSB to the mitochondrial COI sequences of 17 pigeon species (Columbidae, Aves) after sequence trimming and alignment. SNPs were chosen according to the decision tree rules to form species-specific SNP barcodes. The shortest barcode, of about 11 bp, was then generated for discriminating the 17 pigeon species using the DTSB method. This method provides a sequence alignment and decision tree approach to parsimoniously assign a unique and shortest SNP barcode to any known species of a chosen monophyletic taxon for which a barcoding sequence is available.
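The goal of a short barcode that is still unique for every species can be illustrated with a simple greedy selection of SNP columns. This greedy heuristic is a swapped-in technique, not the DTSB decision tree, and it is not guaranteed to find the shortest possible barcode. It assumes aligned sequences and a list of candidate SNP columns. At each step, it adds the column that most increases the number of distinct barcodes. All names and data are hypothetical.

```python
from typing import Dict, List


def n_distinct(alignment: Dict[str, str], cols: List[int]) -> int:
    """Number of distinct barcodes when species are read at `cols` only."""
    return len({"".join(seq[c] for c in cols) for seq in alignment.values()})


def greedy_snp_barcode(alignment: Dict[str, str], cols: List[int]) -> List[int]:
    """Greedily pick SNP columns until every species has a unique barcode."""
    chosen: List[int] = []
    while n_distinct(alignment, chosen) < len(alignment):
        remaining = [c for c in cols if c not in chosen]
        if not remaining:
            raise ValueError("candidate SNP columns cannot separate all species")
        # Column whose addition yields the most distinct barcodes.
        best = max(remaining, key=lambda c: n_distinct(alignment, chosen + [c]))
        if n_distinct(alignment, chosen + [best]) == n_distinct(alignment, chosen):
            raise ValueError("some species are identical at every candidate SNP")
        chosen.append(best)
    return sorted(chosen)


def barcode_table(alignment: Dict[str, str], cols: List[int]) -> Dict[str, str]:
    """Species-specific barcode strings read at the chosen columns."""
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical aligned sequences; column 1 is invariant and never needed.
    aln = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGT", "s4": "TCGA"}
    cols = greedy_snp_barcode(aln, [0, 1, 3])
    print(cols, barcode_table(aln, cols))
```

If no single remaining column separates any two species that currently share a barcode, no combination of columns can, so the function stops with an error instead of looping.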
ABSTRACT
Many primer design (PD) software tools have been developed, but most of them lack a single nucleotide polymorphism (SNP) genotyping service. Here, we introduce the web-based freeware "Prim-SNPing," which, in addition to general PD, provides three primer design functions for cost-effective SNP genotyping: natural PD, mutagenic PD, and confronting two-pair primers (CTPP) PD. Natural PD and mutagenic PD provide primers and restriction enzyme mining for polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP), while CTPP PD provides primers for restriction enzyme-free SNP genotyping. The PCR specificity and efficiency of the designed primers are improved by BLAST searching and by evaluating secondary structure (such as GC clamps, dimers, and hairpins), respectively. The PCR-RFLP fragment length pattern from natural PD is user-adjustable, and the restriction sites of the RFLP enzymes provided by Prim-SNPing are confirmed to be absent within the generated PCR product. CTPP PD eliminates the separate digestion step required in RFLP, making it faster and cheaper. The output of Prim-SNPing includes the primer list, melting temperature (Tm), GC percentage, and amplicon size with enzyme digestion information. Accepted input includes reference SNP (refSNP, or rs) clusters from the Single Nucleotide Polymorphism database (dbSNP) at the National Center for Biotechnology Information (NCBI), as well as human, mouse, and rat SNP sequences in multiple other formats. In summary, Prim-SNPing provides interactive, user-friendly, and cost-effective primer design for SNP genotyping. It is freely available at http://bio.kuas.edu.tw/prim-snping.
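Some of the quantities mentioned above can be sketched in a few lines: primer GC percentage, a rough Tm, a simple GC clamp check, and whether a restriction site distinguishes the two alleles of a SNP in an amplicon (the basis of PCR-RFLP). This is not Prim-SNPing's implementation. The Tm uses the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a rough estimate for short primers. The clamp check assumes the common guideline of one to three G/C bases in the last five 3' bases. The EcoRI site GAATTC in the demo is real, but the amplicon and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Wallace rule Tm = 2(A+T) + 4(G+C) in degrees C (short primers only)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def has_gc_clamp(primer: str) -> bool:
    """Assumed guideline: 1 to 3 G/C bases among the last five 3' bases."""
    tail = primer.upper()[-5:]
    return 1 <= tail.count("G") + tail.count("C") <= 3


def allele_site_positions(amplicon: str, snp_index: int,
                          alleles: Tuple[str, str], site: str) -> Dict[str, List[int]]:
    """0-based positions of a restriction site in the amplicon for each allele."""
    result: Dict[str, List[int]] = {}
    for allele in alleles:
        seq = amplicon[:snp_index] + allele + amplicon[snp_index + 1:]
        positions, start = [], seq.find(site)
        while start != -1:  # collect every (possibly overlapping) occurrence
            positions.append(start)
            start = seq.find(site, start + 1)
        result[allele] = positions
    return result


def is_diagnostic(site_positions: Dict[str, List[int]]) -> bool:
    """An enzyme can genotype the SNP if the alleles give different site patterns."""
    patterns = list(site_positions.values())
    return any(p != patterns[0] for p in patterns[1:])


if __name__ == "__main__":
    primer = "ATGCATGCAT"  # hypothetical primer
    print(gc_percent(primer), wallace_tm(primer), has_gc_clamp(primer))
    # Hypothetical amplicon with a T/G SNP at index 5 inside an EcoRI site.
    sites = allele_site_positions("TTGAATTCAA", 5, ("T", "G"), "GAATTC")
    print(sites, is_diagnostic(sites))
```

A production tool would use a nearest-neighbor Tm model and a full enzyme database rather than these simplified rules.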