Búsqueda | Portal de Búsqueda de la BVS Enfermería

1.

The Association between Oxaliplatin-Induced Peripheral Neuropathy, Sociodemographic, and Clinical Characteristics with Falls in Older Adults with Colorectal Cancer.

Hines, Robert B; Schoborg, Christopher; Sumner, Timothy; Thiesfeldt, Dana-Leigh; Zhang, Shunpu.

Am J Epidemiol ; 2024 May 16.

Artículo en Inglés | MEDLINE | ID: mdl-38751324

RESUMEN

Our purpose was to investigate the associations between oxaliplatin-induced peripheral neuropathy (OIPN), sociodemographic, and clinical characteristics of older colorectal cancer patients with falls. The study population consisted of older adults diagnosed with colorectal cancer obtained from the Surveillance, Epidemiology, and End Results database combined with Medicare claims. OIPN was defined using specific (OIPN 1) and broader (OIPN 2) definitions of OIPN, based on diagnosis codes. Extensions of the Cox regression model to accommodate repeated events were used to obtain overall hazard ratios (HR) with 95% confidence intervals and the cumulative hazard of fall. The unadjusted risk of fall for colorectal cancer survivors with vs. without OIPN 1 at 36 months of follow-up was 19.6% vs. 14.3%, respectively. The association of OIPN with time to fall was moderate (OIPN 1, HR = 1.37, 95% CI: 1.04, 1.79) to small (OIPN 2, HR = 1.24, 95% CI: 1.01, 1.53). Memantine, opioids, cannabinoids, prior history of fall, female sex, advanced age and disease stage, chronic liver disease, diabetes, and chronic obstructive pulmonary disease all increased the hazard rate of fall. Incorporating fall prevention in cancer care is essential to minimize morbidity and mortality of this serious event in older colorectal cancer survivors.

2.

The Association of Supplemental Nutrition Assistance Program Participation and Food Insufficiency among Households with Children in the United States during COVID-19.

Li, Yingru; Zhang, Shunpu; Wang, Liqiang; Lu, Guoqing; Pfeiffer, Ruth; Zou, Zihang.

J Nutr ; 153(10): 3110-3121, 2023 10.

Artículo en Inglés | MEDLINE | ID: mdl-37604384

RESUMEN

BACKGROUND: As the expansion of Supplemental Nutrition Assistance Program (SNAP) benefits and pandemic emergency assistance programs ended in late 2021, little is known about subsequent trends in food insufficiency (FI) among households with children. OBJECTIVES: This research examined the association between SNAP participation and FI among households with children in the United States, particularly non-Hispanic Black (Black) and Hispanic households. METHODS: This cross-sectional analysis used Household Pulse Survey data collected from December 2021 to May 2022. Spatial analysis was conducted to visualize FI and SNAP participation rates across 50 states. With state SNAP policy rules as exogenous instruments and sociodemographic factors as control variables, 2-stage probit models were utilized to assess the SNAP and FI association among all (n = 135,074), Black (n = 13,940), and Hispanic households with children (n = 17,869). RESULTS: Approximately 13.9% [95% confidence interval (CI): 13.85%, 13.99%] of households experienced FI, and 20.4% (CI: 20.35%, 20.51%) received SNAP benefits. Among Black and Hispanic households, higher rates were observed, with 23.3% (CI: 23.12%, 23.4%) and 20.8% (CI: 20.61%, 20.95%) experiencing FI and 36.3% (CI: 36.1%, 36.5%) and 26.9% (CI: 26.61%, 27.13%) receiving SNAP benefits. These rates varied across states, ranging from 8% (Utah) to 21.1% (Mississippi) for FI and from 8.8% (Utah) to 32.7% (New Mexico) for SNAP participation. SNAP participants demonstrated a 12% lower likelihood of FI than nonparticipants (CI: -0.18, -0.05, P < 0.001). Among Black households, SNAP participants had a 29% lower likelihood of FI than nonparticipants (CI: -0.54, -0.03, P < 0.001). However, SNAP participation was not significant among Hispanic households (P = 0.99), nor did it narrow the FI gap between Hispanic and non-Hispanic households (P = 0.22). CONCLUSIONS: SNAP participation was associated with lower levels of FI among households with children, particularly for Black households. However, there was no significant association between SNAP participation and FI among Hispanic households with children.

Asunto(s)

COVID-19 , Asistencia Alimentaria , Humanos , Estados Unidos/epidemiología , Niño , Estudios Transversales , Pobreza , COVID-19/epidemiología , Mississippi , Abastecimiento de Alimentos

3.

The association between sociodemographic, clinical, and potentially preventive therapies with oxaliplatin-induced peripheral neuropathy in colorectal cancer patients.

Hines, Robert B; Schoborg, Christopher; Sumner, Timothy; Zhu, Xiang; Elgin, Elizabeth A; Zhang, Shunpu.

Support Care Cancer ; 31(7): 386, 2023 Jun 09.

Artículo en Inglés | MEDLINE | ID: mdl-37294347

RESUMEN

PURPOSE: The purpose of this retrospective cohort study was to evaluate whether several potentially preventive therapies reduced the rate of oxaliplatin-induced peripheral neuropathy (OIPN) in colorectal cancer patients and to assess the relationship of sociodemographic/clinical factors with OIPN diagnosis. METHODS: Data were obtained from the Surveillance, Epidemiology, and End Results database combined with Medicare claims. Eligible patients were diagnosed with colorectal cancer between 2007 and 2015, ≥ 66 years of age, and treated with oxaliplatin. Two definitions were used to denote diagnosis of OIPN based on diagnosis codes: OIPN 1 (specific definition, drug-induced polyneuropathy) and OIPN 2 (broader definition, additional codes for peripheral neuropathy). Cox regression was used to obtain hazard ratios (HR) with 95% confidence intervals (CI) for the relative rate of OIPN within 2 years of oxaliplatin initiation. RESULTS: There were 4792 subjects available for analysis. At 2 years, the unadjusted cumulative incidence of OIPN 1 was 13.1% and 27.1% for OIPN 2. For both outcomes, no therapies reduced the rate of OIPN diagnosis. The anticonvulsants gabapentin and oxcarbazepine/carbamazepine were associated with an increased rate of OIPN (both definitions) as were increasing cycles of oxaliplatin. Compared to younger patients, those 75-84 years of age experienced a 15% decreased rate of OIPN. For OIPN 2, prior peripheral neuropathy and moderate/severe liver disease were also associated with an increased hazard rate. For OIPN 1, state buy-in health insurance coverage was associated with a decreased hazard rate. CONCLUSION: Additional studies are needed to identify preventive therapeutics for OIPN in cancer patients treated with oxaliplatin.

Asunto(s)

Antineoplásicos , Neoplasias Colorrectales , Enfermedades del Sistema Nervioso Periférico , Estados Unidos , Humanos , Anciano , Oxaliplatino/efectos adversos , Antineoplásicos/efectos adversos , Estudios Retrospectivos , Compuestos Organoplatinos/efectos adversos , Medicare , Enfermedades del Sistema Nervioso Periférico/inducido químicamente , Enfermedades del Sistema Nervioso Periférico/epidemiología , Enfermedades del Sistema Nervioso Periférico/prevención & control , Neoplasias Colorrectales/tratamiento farmacológico

4.

Identifying host-specific amino acid signatures for influenza A viruses using an adjusted entropy measure.

Zhang, Yixiang; Eskridge, Kent M; Zhang, Shunpu; Lu, Guoqing.

BMC Bioinformatics ; 23(1): 333, 2022 Aug 12.

Artículo en Inglés | MEDLINE | ID: mdl-35962315

RESUMEN

BACKGROUND: Influenza A viruses (IAV) exhibit vast genetic mutability and have great zoonotic potential to infect avian and mammalian hosts and are known to be responsible for a number of pandemics. A key computational issue in influenza prevention and control is the identification of molecular signatures with cross-species transmission potential. We propose an adjusted entropy-based host-specific signature identification method that uses a similarity coefficient to incorporate the amino acid substitution information and improve the identification performance. Mutations in the polymerase genes (e.g., PB2) are known to play a major role in avian influenza virus adaptation to mammalian hosts. We thus focus on the analysis of PB2 protein sequences and identify host specific PB2 amino acid signatures. RESULTS: Validation with a set of H5N1 PB2 sequences from 1996 to 2006 results in adjusted entropy having a 40% false negative discovery rate compared to a 60% false negative rate using unadjusted entropy. Simulations across different levels of sequence divergence show a false negative rate of no higher than 10% while unadjusted entropy ranged from 9 to 100%. In addition, under all levels of divergence adjusted entropy never had a false positive rate higher than 9%. Adjusted entropy also identifies important mutations in H1N1pdm PB2 previously identified in the literature that explain changes in divergence between 2008 and 2009 which unadjusted entropy could not identify. CONCLUSIONS: Based on these results, adjusted entropy provides a reliable and widely applicable host signature identification approach useful for IAV monitoring and vaccine development.

Asunto(s)

Subtipo H5N1 del Virus de la Influenza A , Virus de la Influenza A , Gripe Humana , Sustitución de Aminoácidos , Aminoácidos/genética , Animales , Humanos , Subtipo H5N1 del Virus de la Influenza A/genética , Subtipo H5N1 del Virus de la Influenza A/metabolismo , Virus de la Influenza A/genética , Virus de la Influenza A/metabolismo , Gripe Humana/genética , Mamíferos/genética , Proteínas Virales/genética , Proteínas Virales/metabolismo

5.

Characteristics Associated With Nonreceipt of Surveillance Testing and the Relationship With Survival in Stage II and III Colon Cancer.

Hines, Robert B; Jiban, Md Jibanul Haque; Lee, Eunkyung; Odahowski, Cassie L; Wallace, Audrey S; Adams, Spencer J E; Rahman, Saleh M M; Zhang, Shunpu.

Am J Epidemiol ; 190(2): 239-250, 2021 02 01.

Artículo en Inglés | MEDLINE | ID: mdl-32902633

RESUMEN

We investigated characteristics of patients with colon cancer that predicted nonreceipt of posttreatment surveillance testing and the subsequent associations between surveillance status and survival outcomes. This was a retrospective cohort study of the Surveillance, Epidemiology, and End Results database combined with Medicare claims. Patients diagnosed between 2002 and 2009 with disease stages II and III and who were between 66 and 84 years of age were eligible. A minimum of 3 years' follow-up was required, and patients were categorized as having received any surveillance testing (any testing) versus none (no testing). Poisson regression was used to obtain risk ratios with 95% confidence intervals for the relative likelihood of No Testing. Cox models were used to obtain subdistribution hazard ratios with 95% confidence intervals for 5- and 10-year cancer-specific and noncancer deaths. There were 16,009 colon cancer cases analyzed. Patient characteristics that predicted No Testing included older age, Black race, stage III disease, and chemotherapy. Patients in the No Testing group had an increased rate of 10-year cancer death that was greater for patients with stage III disease (subdistribution hazard ratio = 1.79, 95% confidence interval: 1.48, 2.17) than those with stage II disease (subdistribution hazard ratio = 1.41, 95% confidence interval: 1.19, 1.66). Greater efforts are needed to ensure all patients receive the highest quality medical care after diagnosis of colon cancer.

Asunto(s)

Neoplasias del Colon/patología , Neoplasias del Colon/terapia , Factores de Edad , Anciano , Anciano de 80 o más Años , Quimioterapia Adyuvante , Neoplasias del Colon/mortalidad , Comoras , Femenino , Humanos , Masculino , Medicare/estadística & datos numéricos , Persona de Mediana Edad , Estadificación de Neoplasias , Oportunidad Relativa , Pronóstico , Modelos de Riesgos Proporcionales , Calidad de la Atención de Salud , Grupos Raciales , Estudios Retrospectivos , Programa de VERF/estadística & datos numéricos , Factores Socioeconómicos , Estados Unidos

6.

Surveillance Colonoscopy in Older Stage I Colon Cancer Patients and the Association With Colon Cancer-Specific Mortality.

Hines, Robert B; Jiban, Md Jibanul Haque; Specogna, Adrian V; Vishnubhotla, Priya; Lee, Eunkyung; Troy, Steven P; Zhang, Shunpu.

Am J Gastroenterol ; 115(6): 924-933, 2020 06.

Artículo en Inglés | MEDLINE | ID: mdl-32142485

RESUMEN

OBJECTIVES: Guideline-issuing groups differ regarding the recommendation that patients with stage I colon cancer receive surveillance colonoscopy after cancer-directed surgery. This observational comparative effectiveness study was conducted to evaluate the association between surveillance colonoscopy and colon cancer-specific mortality in early stage patients. METHODS: This was a retrospective cohort study of the Surveillance, Epidemiology, and End Results database combined with Medicare claims. Surveillance colonoscopy was assessed as a time-varying exposure up to 5 years after cancer-directed surgery with the following groups: no colonoscopy, one colonoscopy, and ≥ 2 colonoscopies. Inverse probability of treatment weighting was used to balance covariates. The time-dependent Cox regression model was used to obtain inverse probability of treatment weighting-adjusted hazard ratios (HRs), with 95% confidence intervals (CIs) for 5- and 10-year colon cancer, other cancer, and noncancer causes of death. RESULTS: There were 8,783 colon cancer cases available for analysis. Overall, compared with patients who received one colonoscopy, the no colonoscopy group experienced an increased rate of 10-year colon cancer-specific mortality (HR = 1.63; 95% CI 1.31-2.04) and noncancer death (HR = 1.36; 95% CI 1.25-1.49). Receipt of ≥ 2 colonoscopies was associated with a decreased rate of 10-year colon cancer-specific death (HR = 0.60; 95% CI 0.45-0.79), other cancer death (HR = 0.68; 95% CI 0.53-0.88), and noncancer death (HR = 0.69; 95% CI 0.62-0.76). Five-year cause-specific HRs were similar to 10-year estimates. DISCUSSION: These results support efforts to ensure that stage I patients undergo surveillance colonoscopy after cancer-directed surgery to facilitate early detection of new and recurrent neoplastic lesions.

Asunto(s)

Carcinoma/cirugía , Neoplasias del Colon/cirugía , Recurrencia Local de Neoplasia/diagnóstico , Factores de Edad , Anciano , Anciano de 80 o más Años , Carcinoma/mortalidad , Carcinoma/patología , Causas de Muerte , Neoplasias del Colon/mortalidad , Neoplasias del Colon/patología , Investigación sobre la Eficacia Comparativa , Manejo de la Enfermedad , Femenino , Humanos , Almacenamiento y Recuperación de la Información , Masculino , Medicare , Clasificación del Tumor , Estadificación de Neoplasias , Modelos de Riesgos Proporcionales , Programa de VERF , Estados Unidos

7.

The association between post-treatment surveillance testing and survival in stage II and III colon cancer patients: An observational comparative effectiveness study.

Hines, Robert B; Jiban, Md Jibanul Haque; Specogna, Adrian V; Vishnubhotla, Priya; Lee, Eunkyung; Zhang, Shunpu.

BMC Cancer ; 19(1): 418, 2019 May 03.

Artículo en Inglés | MEDLINE | ID: mdl-31053096

RESUMEN

BACKGROUND: The best strategy for surveillance testing in stage II and III colon cancer patients following curative treatment is unknown. Previous randomized controlled trials have suffered from design limitations and yielded conflicting evidence. This observational comparative effectiveness research study was conducted to provide new evidence on the relationship between post-treatment surveillance testing and survival by overcoming the limitations of previous clinical trials. METHODS: This was a retrospective cohort study of the Surveillance, Epidemiology, and End Results database combined with Medicare claims (SEER-Medicare). Stage II and III colon cancer patients diagnosed from 2002 to 2009 and between 66 to 84 years of age were eligible. Adherence to surveillance testing guidelines-including carcinoembryonic antigen, computed tomography, and colonoscopy-was assessed for each year of follow-up and overall for up to three years post-treatment. Patients were categorized as More Adherent and Less Adherent according to testing guidelines. Patients who received no surveillance testing were excluded. The primary outcome was 5-year cancer-specific survival; 5-year overall survival was the secondary outcome. Inverse probability of treatment weighting (IPTW) using generalized boosted models was employed to balance covariates between the two surveillance groups. IPTW-adjusted survival curves comparing the two groups were performed by the Kaplan-Meier method. Weighted Cox regression was used to obtain hazard ratios (HRs) with 95% confidence intervals (CIs) for the relative risk of death for the Less Adherent group versus the More Adherent group. RESULTS: There were 17,860 stage II and III colon cancer cases available for analysis. Compared to More Adherent patients, Less Adherent patients experienced slightly better 5-year cancer-specific survival (HR = 0.83, 95% CI 0.76-0.90) and worse 5-year noncancer-specific survival (HR = 1.61, 95% CI 1.43-1.82) for years 2 to 5 of follow-up. There was no difference between the groups in overall survival (HR = 1.04, 95% CI 0.98-1.10). CONCLUSIONS: More surveillance testing did not improve 5-year cancer-specific survival compared to less testing and there was no difference between the groups in overall survival. The results of this study support a risk-stratified, shared decision-making surveillance strategy to optimize clinical and patient-centered outcomes for colon cancer patients in the survivorship phase of care.

Asunto(s)

Neoplasias del Colon/patología , Neoplasias del Colon/terapia , Cooperación del Paciente/estadística & datos numéricos , Vigilancia de la Población/métodos , Anciano , Anciano de 80 o más Años , Investigación sobre la Eficacia Comparativa , Femenino , Humanos , Masculino , Estadificación de Neoplasias , Estudios Retrospectivos , Programa de VERF , Análisis de Supervivencia

8.

Model-based clustering with certainty estimation: implication for clade assignment of influenza viruses.

Zhang, Shunpu; Li, Zhong; Beland, Kevin; Lu, Guoqing.

BMC Bioinformatics ; 17: 287, 2016 Jul 21.

Artículo en Inglés | MEDLINE | ID: mdl-27439701

RESUMEN

BACKGROUND: Clustering is a common technique used by molecular biologists to group homologous sequences and study evolution. There remain issues such as how to cluster molecular sequences accurately and in particular how to evaluate the certainty of clustering results. RESULTS: We presented a model-based clustering method to analyze molecular sequences, described a subset bootstrap scheme to evaluate a certainty of the clusters, and showed an intuitive way using 3D visualization to examine clusters. We applied the above approach to analyze influenza viral hemagglutinin (HA) sequences. Nine clusters were estimated for high pathogenic H5N1 avian influenza, which agree with previous findings. The certainty for a given sequence that can be correctly assigned to a cluster was all 1.0 whereas the certainty for a given cluster was also very high (0.92-1.0), with an overall clustering certainty of 0.95. For influenza A H7 viruses, ten HA clusters were estimated and the vast majority of sequences could be assigned to a cluster with a certainty of more than 0.99. The certainties for clusters, however, varied from 0.40 to 0.98; such certainty variation is likely attributed to the heterogeneity of sequence data in different clusters. In both cases, the certainty values estimated using the subset bootstrap method are all higher than those calculated based upon the standard bootstrap method, suggesting our bootstrap scheme is applicable for the estimation of clustering certainty. CONCLUSIONS: We formulated a clustering analysis approach with the estimation of certainties and 3D visualization of sequence data. We analysed 2 sets of influenza A HA sequences and the results indicate our approach was applicable for clustering analysis of influenza viral sequences.

Asunto(s)

Subtipo H5N1 del Virus de la Influenza A/clasificación , Modelos Teóricos , Animales , Secuencia de Bases , Aves , Análisis por Conglomerados , Hemaglutininas Virales/química , Subtipo H5N1 del Virus de la Influenza A/metabolismo , Gripe Aviar/virología , Filogenia

9.

A powerful method for combining P-values in genomic studies.

Chen, Huann-Sheng; Pfeiffer, Ruth M; Zhang, Shunpu.

Genet Epidemiol ; 37(8): 814-9, 2013 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-23959976

RESUMEN

After genetic regions have been identified in genomewide association studies (GWAS), investigators often follow up with more targeted investigations of specific regions. These investigations typically are based on single nucleotide polymorphisms (SNPs) with dense coverage of a region. Methods are thus needed to test the hypothesis of any association in given genetic regions. Several approaches for combining P-values obtained from testing individual SNP hypothesis tests are available. We recently proposed a sequential procedure for testing the global null hypothesis of no association in a region. When this global null hypothesis is rejected, this method provides a list of significant hypotheses and has weak control of the family-wise error rate. In this paper, we devise a permutation-based version of the test that accounts for correlations of tests based on SNPs in the same genetic region. Based on simulated data, the method has correct control of the type I error rate and higher or comparable power to other tests.

Asunto(s)

Estudio de Asociación del Genoma Completo , Genómica , Algoritmos , Humanos , Desequilibrio de Ligamiento , Modelos Genéticos , Fenotipo , Polimorfismo de Nucleótido Simple/genética , Proyectos de Investigación

10.

Confidence intervals for ranks of age-adjusted rates across states or counties.

Zhang, Shunpu; Luo, Jun; Zhu, Li; Stinchcomb, David G; Campbell, Dave; Carter, Ginger; Gilkeson, Scott; Feuer, Eric J.

Stat Med ; 33(11): 1853-66, 2014 May 20.

Artículo en Inglés | MEDLINE | ID: mdl-24420973

RESUMEN

Health indices provide information to the general public on the health condition of the community. They can also be used to inform the government's policy making, to evaluate the effect of a current policy or healthcare program, or for program planning and priority setting. It is a common practice that the health indices across different geographic units are ranked and the ranks are reported as fixed values. We argue that the ranks should be viewed as random and hence should be accompanied by an indication of precision (i.e., the confidence intervals). A technical difficulty in doing so is how to account for the dependence among the ranks in the construction of confidence intervals. In this paper, we propose a novel Monte Carlo method for constructing the individual and simultaneous confidence intervals of ranks for age-adjusted rates. The proposed method uses as input age-specific counts (of cases of disease or deaths) and their associated populations. We have further extended it to the case in which only the age-adjusted rates and confidence intervals are available. Finally, we demonstrate the proposed method to analyze US age-adjusted cancer incidence rates and mortality rates for cancer and other diseases by states and counties within a state using a website that will be publicly available. The results show that for rare or relatively rare disease (especially at the county level), ranks are essentially meaningless because of their large variability, while for more common disease in larger geographic units, ranks can be effectively utilized.

Asunto(s)

Teorema de Bayes , Intervalos de Confianza , Interpretación Estadística de Datos , Método de Montecarlo , Neoplasias/epidemiología , Factores de Edad , Algoritmos , Simulación por Computador , Humanos , Incidencia , Neoplasias/mortalidad , Estados Unidos

11.

Next generation analytic tools for large scale genetic epidemiology studies of complex diseases.

Mechanic, Leah E; Chen, Huann-Sheng; Amos, Christopher I; Chatterjee, Nilanjan; Cox, Nancy J; Divi, Rao L; Fan, Ruzong; Harris, Emily L; Jacobs, Kevin; Kraft, Peter; Leal, Suzanne M; McAllister, Kimberly; Moore, Jason H; Paltoo, Dina N; Province, Michael A; Ramos, Erin M; Ritchie, Marylyn D; Roeder, Kathryn; Schaid, Daniel J; Stephens, Matthew; Thomas, Duncan C; Weinberg, Clarice R; Witte, John S; Zhang, Shunpu; Zöllner, Sebastian; Feuer, Eric J; Gillanders, Elizabeth M.

Genet Epidemiol ; 36(1): 22-35, 2012 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-22147673

RESUMEN

Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled "Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases" on September 15-16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized.

Asunto(s)

Interacción Gen-Ambiente , Estudio de Asociación del Genoma Completo , Epidemiología Molecular/métodos , Minería de Datos/métodos , Variación Genética , Humanos , National Institutes of Health (U.S.) , Neoplasias/genética , Fenotipo , Estados Unidos

12.

On correcting the overestimation of the permutation-based false discovery rate estimator.

Jiao, Shuo; Zhang, Shunpu.

Bioinformatics ; 24(15): 1655-61, 2008 Aug 01.

Artículo en Inglés | MEDLINE | ID: mdl-18573796

RESUMEN

MOTIVATION: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR), which is defined as the expected percentage of the number of false positive genes among the claimed significant genes. As a consequence, the accuracy of the FDR estimators will be important for correctly controlling FDR. Xie et al. found that the standard permutation method of estimating FDR is biased and proposed to delete the predicted differentially expressed (DE) genes in the estimation of FDR for one-sample comparison. However, we notice that the formula of the FDR used in their paper is incorrect. This makes the comparison results reported in their paper unconvincing. Other problems with their method include the biased estimation of FDR caused by over- or under-deletion of DE genes in the estimation of FDR and by the implicit use of an unreasonable estimator of the true proportion of equivalently expressed (EE) genes. Due to the great importance of accurate FDR estimation in microarray data analysis, it is necessary to point out such problems and propose improved methods. RESULTS: Our results confirm that the standard permutation method overestimates the FDR. With the correct FDR formula, we show the method of Xie et al. always gives biased estimation of FDR: it overestimates when the number of claimed significant genes is small, and underestimates when the number of claimed significant genes is large. To overcome these problems, we propose two modifications. The simulation results show that our estimator gives more accurate estimation.

Asunto(s)

Algoritmos , Artefactos , Interpretación Estadística de Datos , Reacciones Falso Positivas , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia por Matrices de Oligonucleótidos/métodos

13.

An improved string composition method for sequence comparison.

Lu, Guoqing; Zhang, Shunpu; Fang, Xiang.

BMC Bioinformatics ; 9 Suppl 6: S15, 2008 May 28.

Artículo en Inglés | MEDLINE | ID: mdl-18541050

RESUMEN

BACKGROUND: Historically, two categories of computational algorithms (alignment-based and alignment-free) have been applied to sequence comparison-one of the most fundamental issues in bioinformatics. Multiple sequence alignment, although dominantly used by biologists, possesses both fundamental as well as computational limitations. Consequently, alignment-free methods have been explored as important alternatives in estimating sequence similarity. Of the alignment-free methods, the string composition vector (CV) methods, which use the frequencies of nucleotide or amino acid strings to represent sequence information, show promising results in genome sequence comparison of prokaryotes. The existing CV-based methods, however, suffer certain statistical problems, thereby underestimating the amount of evolutionary information in genetic sequences. RESULTS: We show that the existing string composition based methods have two problems, one related to the Markov model assumption and the other associated with the denominator of the frequency normalization equation. We propose an improved complete composition vector method under the assumption of a uniform and independent model to estimate sequence information contributing to selection for sequence comparison. Phylogenetic analyses using both simulated and experimental data sets demonstrate that our new method is more robust compared with existing counterparts and comparable in robustness with alignment-based methods. CONCLUSION: We observed two problems existing in the currently used string composition methods and proposed a new robust method for the estimation of evolutionary information of genetic sequences. In addition, we discussed that it might not be necessary to use relatively long strings to build a complete composition vector (CCV), due to the overlapping nature of vector strings with a variable length. We suggested a practical approach for the choice of an optimal string length to construct the CCV.

Asunto(s)

Algoritmos , ADN/química , ADN/genética , Alineación de Secuencia/métodos , Análisis de Secuencia de ADN/métodos , Secuencia de Bases , Datos de Secuencia Molecular

14.

Post-treatment surveillance testing of patients with colorectal cancer and the association with survival: protocol for a retrospective cohort study of the Surveillance, Epidemiology, and End Results (SEER)-Medicare database.

Hines, Robert B; Jiban, Md Jibanul Haque; Choudhury, Kanak; Loerzel, Victoria; Specogna, Adrian V; Troy, Steven P; Zhang, Shunpu.

BMJ Open ; 8(4): e022393, 2018 04 28.

Artículo en Inglés | MEDLINE | ID: mdl-29705770

RESUMEN

INTRODUCTION: Although the colorectal cancer (CRC) mortality rate has significantly improved over the past several decades, many patients will have a recurrence following curative treatment. Despite this high risk of recurrence, adherence to CRC surveillance testing guidelines is poor which increases cancer-related morbidity and potentially, mortality. Several randomised controlled trials (RCTs) with varying surveillance strategies have yielded conflicting evidence regarding the survival benefit associated with surveillance testing. However, due to differences in study protocols and limitations of sample size and length of follow-up, the RCT may not be the best study design to evaluate this relationship. An observational comparative effectiveness research study can overcome the sample size/follow-up limitations of RCT designs while assessing real-world variability in receipt of surveillance testing to provide much needed evidence on this important clinical issue. The gap in knowledge that this study will address concerns whether adherence to National Comprehensive Cancer Network CRC surveillance guidelines improves survival. METHODS AND ANALYSIS: Patients with colon and rectal cancer aged 66-84 years, who have been diagnosed between 2002 and 2008 and have been included in the Surveillance, Epidemiology, and End Results-Medicare database, are eligible for this retrospective cohort study. To minimise bias, patients had to survive at least 12 months following the completion of treatment. Adherence to surveillance testing up to 5 years post-treatment will be assessed in each year of follow-up and overall. Binomial regression will be used to assess the association between patients' characteristics and adherence. Survival analysis will be conducted to assess the association between adherence and 5-year survival. ETHICS AND DISSEMINATION: This study was approved by the National Cancer Institute and the Institutional Review Board of the University of Central Florida. The results of this study will be disseminated by publishing in the peer-reviewed scientific literature, presentation at national/international scientific conferences and posting through social media.

Asunto(s)

Neoplasias Colorrectales , Anciano , Anciano de 80 o más Años , Neoplasias Colorrectales/mortalidad , Neoplasias Colorrectales/terapia , Florida , Humanos , Medicare , Recurrencia Local de Neoplasia , Estudios Retrospectivos , Programa de VERF , Análisis de Supervivencia , Estados Unidos

15.

A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance.

Zhang, Shunpu.

BMC Bioinformatics ; 8: 230, 2007 Jun 29.

Artículo en Inglés | MEDLINE | ID: mdl-17603887

RESUMEN

BACKGROUND: The Significance Analysis of Microarrays (SAM) is a popular method for detecting significantly expressed genes and controlling the false discovery rate (FDR). Recently, it has been reported in the literature that the FDR is not well controlled by SAM. Due to the vast application of SAM in microarray data analysis, it is of great importance to have an extensive evaluation of SAM and its associated R-package (sam2.20). RESULTS: Our study has identified several discrepancies between SAM and sam2.20. One major difference is that SAM and sam2.20 use different methods for estimating FDR. Such discrepancies may cause confusion among the researchers who are using SAM or are developing the SAM-like methods. We have also shown that SAM provides no meaningful estimates of FDR and this problem has been corrected in sam2.20 by using a different formula for estimating FDR. However, we have found that, even with the improvement sam2.20 has made over SAM, sam2.20 may still produce erroneous and even conflicting results under certain situations. Using an example, we show that the problem of sam2.20 is caused by its use of asymmetric cutoffs which are due to the large variability of null scores at both ends of the order statistics. An obvious approach without the complication of the order statistics is the conventional symmetric cutoff method. For this reason, we have carried out extensive simulations to compare the performance of sam2.20 and the symmetric cutoff method. Finally, a simple modification is proposed to improve the FDR estimation of sam2.20 and the symmetric cutoff method. CONCLUSION: Our study shows that the most serious drawback of SAM is its poor estimation of FDR. Although this drawback has been corrected in sam2.20, the control of FDR by sam2.20 is still not satisfactory. The comparison between sam2.20 and the symmetric cutoff method reveals that the relative performance of sam2.20 to the symmetric cutff method depends on the ratio of induced to repressed genes in a microarray data, and is also affected by the ratio of DE to EE genes and the distributions of induced and repressed genes. Numerical simulations show that the symmetric cutoff method has the biggest advantage over sam2.20 when there are equal number of induced and repressed genes (i.e., the ratio of induced to repressed genes is 1). As the ratio of induced to repressed genes moves away from 1, the advantage of the symmetric cutoff method to sam2.20 is gradually diminishing until eventually sam2.20 becomes significantly better than the symmetric cutoff method when the differentially expressed (DE) genes are either all induced or all repressed genes. Simulation results also show that our proposed simple modification provides improved control of FDR for both sam2.20 and the symmetric cutoff method.

Asunto(s)

Algoritmos , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia por Matrices de Oligonucleótidos/métodos , Validación de Programas de Computación , Programas Informáticos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad

16.

An improved nonparametric approach for detecting differentially expressed genes with replicated microarray data.

Zhang, Shunpu.

Stat Appl Genet Mol Biol ; 5: Article30, 2006.

Artículo en Inglés | MEDLINE | ID: mdl-17402914

RESUMEN

Previous nonparametric statistical methods on constructing the test and null statistics require having at least 4 arrays under each condition. In this paper, we provide an improved method of constructing the test and null statistics which only requires 2 arrays under one condition if the number of arrays under the other condition is at least 3. The conventional testing method defines the rejection region by controlling the probability of Type I error. In this paper, we propose to determine the critical values (or the cut-off points) of the rejection region by directly controlling the false discovery rate. Simulations were carried out to compare the performance of our proposed method with several existing methods. Finally, our proposed method is applied to the rat data of Pan et al. (2003). It is seen from both simulations and the rat data that our method has lower false discovery rates than those from the significance analysis of microarray (SAM) method of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2003).

Asunto(s)

Perfilación de la Expresión Génica/estadística & datos numéricos , Regulación de la Expresión Génica , Análisis de Secuencia por Matrices de Oligonucleótidos/estadística & datos numéricos , Estadísticas no Paramétricas , Algoritmos , Análisis de Varianza , Animales , ADN Bacteriano/genética , Perfilación de la Expresión Génica/métodos , Regulación Bacteriana de la Expresión Génica , Otitis Media con Derrame/microbiología , Infecciones Neumocócicas/microbiología , Ratas , Reproducibilidad de los Resultados , Distribuciones Estadísticas , Streptococcus pneumoniae/genética

17.

A new hybrid coding for protein secondary structure prediction based on primary structure similarity.

Li, Zhong; Wang, Jing; Zhang, Shunpu; Zhang, Qifeng; Wu, Wuming.

Gene ; 618: 8-13, 2017 Jun 30.

Artículo en Inglés | MEDLINE | ID: mdl-28322997

RESUMEN

The coding pattern of protein can greatly affect the prediction accuracy of protein secondary structure. In this paper, a novel hybrid coding method based on the physicochemical properties of amino acids and tendency factors is proposed for the prediction of protein secondary structure. The principal component analysis (PCA) is first applied to the physicochemical properties of amino acids to construct a 3-bit-code, and then the 3 tendency factors of amino acids are calculated to generate another 3-bit-code. Two 3-bit-codes are fused to form a novel hybrid 6-bit-code. Furthermore, we make a geometry-based similarity comparison of the protein primary structure between the reference set and the test set before the secondary structure prediction. We finally use the support vector machine (SVM) to predict those amino acids which are not detected by the primary structure similarity comparison. Experimental results show that our method achieves a satisfactory improvement in accuracy in the prediction of protein secondary structure.

Asunto(s)

Estructura Secundaria de Proteína , Alineación de Secuencia/métodos , Análisis de Secuencia de Proteína/métodos , Aminoácidos/química , Máquina de Vectores de Soporte

18.

Estimating the proportion of equivalently expressed genes in microarray data based on transformed test statistics.

Jiao, Shuo; Zhang, Shunpu.

J Comput Biol ; 17(2): 177-87, 2010 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-20078228

RESUMEN

In microarray data analysis, false discovery rate (FDR) is now widely accepted as the control criterion to account for multiple hypothesis testing. The proportion of equivalently expressed genes (pi(0)) is a key component to be estimated in the estimation of FDR. Some commonly used pi(0) estimators (BUM, SPLOSH, QVALUE, and LBE ) are all based on p-values, and they are essentially upper bounds of pi(0). The simulations we carried out show that these four methods significantly overestimate the true pi(0) when differentially expressed genes and equivalently expressed genes are not well separated. To solve this problem, we first introduce a novel way of transforming the test statistics to make them symmetric about 0. Then we propose a pi(0) estimator based on the transformed test statistics using the symmetry assumption. Real data application and simulation both show that the pi(0) estimate from our method is less conservative than BUM, SPLOSH, QVALUE, and LBE in most of the cases. Simulation results also show that our estimator always has the least mean squared error among these five methods.

Asunto(s)

Biomarcadores de Tumor/genética , Biología Computacional , Perfilación de la Expresión Génica , Leucemia Mieloide Aguda/genética , Modelos Estadísticos , Análisis de Secuencia por Matrices de Oligonucleótidos , Leucemia-Linfoma Linfoblástico de Células Precursoras/genética , Algoritmos , Biomarcadores de Tumor/metabolismo , Reacciones Falso Positivas , Humanos , Leucemia Mieloide Aguda/metabolismo , Reconocimiento de Normas Patrones Automatizadas , Leucemia-Linfoma Linfoblástico de Células Precursoras/metabolismo

19.

Probabilistic peak calling and controlling false discovery rate estimations in transcription factor binding site mapping from ChIP-seq.

Jiao, Shuo; Bailey, Cheryl P; Zhang, Shunpu; Ladunga, Istvan.

Methods Mol Biol ; 674: 161-77, 2010.

Artículo en Inglés | MEDLINE | ID: mdl-20827591

RESUMEN

Localizing the binding sites of regulatory proteins is becoming increasingly feasible and accurate. This is due to dramatic progress not only in chromatin immunoprecipitation combined by next-generation sequencing (ChIP-seq) but also in advanced statistical analyses. A fundamental issue, however, is the alarming number of false positive predictions. This problem can be remedied by improved peak calling methods of twin peaks, one at each strand of the DNA, kernel density estimators, and false discovery rate estimations based on control libraries. Predictions are filtered by de novo motif discovery in the peak environments. These methods have been implemented in, among others, Valouev et al.'s Quantitative Enrichment of Sequence Tags (QuEST) software tool. We demonstrate the prediction of the human growth-associated binding protein (GABPalpha) based on ChIP-seq observations.

Asunto(s)

Inmunoprecipitación de Cromatina , Análisis de Secuencia de ADN , Factores de Transcripción/metabolismo , Sitios de Unión , Reacciones Falso Positivas , Factor de Transcripción de la Proteína de Unión a GA/metabolismo , Humanos , Internet , Células Jurkat , Probabilidad , Secuencias Reguladoras de Ácidos Nucleicos/genética , Reproducibilidad de los Resultados , Programas Informáticos

20.

The t-mixture model approach for detecting differentially expressed genes in microarrays.

Jiao, Shuo; Zhang, Shunpu.

Funct Integr Genomics ; 8(3): 181-6, 2008 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-18210172

RESUMEN

The finite mixture model approach has attracted much attention in analyzing microarray data due to its robustness to the excessive variability which is common in the microarray data. Pan (2003) proposed to use the normal mixture model method (MMM) to estimate the distribution of a test statistic and its null distribution. However, considering the fact that the test statistic is often of t-type, our studies find that the rejection region from MMM is often significantly larger than the correct rejection region, resulting an inflated type I error. This motivates us to propose the t-mixture model (TMM) approach. In this paper, we demonstrate that TMM provides significantly more accurate control of the probability of making type I errors (hence of the familywise error rate) than MMM. Finally, TMM is applied to the well-known leukemia data of Golub et al. (1999). The results are compared with those obtained from MMM.

Asunto(s)

Expresión Génica , Modelos Genéticos , Modelos Estadísticos , Análisis de Secuencia por Matrices de Oligonucleótidos/métodos , Algoritmos , Simulación por Computador , Genoma Humano , Humanos , Leucemia Mieloide/genética , Funciones de Verosimilitud , Leucemia-Linfoma Linfoblástico de Células Precursoras/genética

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA