
1.

FAIR privacy-preserving operation of large genomic variant calling format (VCF) data without download or installation.

Martins, Yasmmin C; Bhawsar, Praphulla Ms; Balasubramanian, Jeya B; Russ, Daniel; Wong, Wendy Sw; Maass, Wolfgang; Almeida, Jonas S.

AMIA Jt Summits Transl Sci Proc ; 2024: 65-74, 2024.

Article En | MEDLINE | ID: mdl-38827109

Motivation: The proliferation of genetic testing and consumer genomics represents a logistic challenge to the personalized use of GWAS data in VCF format. Specifically, the challenge is to retrieve target genetic variation from large compressed files filled with unrelated variation information. Compounding the data traversal challenge, privacy-sensitive VCF files are typically managed as large stand-alone single files (no companion index file) composed of variable-sized compressed chunks, hosted in consumer-facing environments with no native support for hosted execution. Results: A portable JavaScript module was developed to support in-browser fetching of partial content using byte-range requests. This includes on-the-fly decompression of irregularly positioned compressed chunks, coupled with a binary search algorithm that iteratively identifies chromosome-position ranges. The in-browser zero-footprint solution (no downloads, no installations) enables the interoperability, reusability, and user-facing governance advanced by the FAIR principles for stewardship of scientific data. Availability: https://episphere.github.io/vcf, including supplementary material.
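The byte-range and binary-search idea described in this abstract can be illustrated with a minimal Python sketch. It is not the published JavaScript module: the helper names (`build_blob`, `fetch_range`, `next_chunk`, `find_chunk`, `lookup`), the toy file layout, and the `WINDOW` size are all assumptions made for illustration. The sketch further assumes a single chromosome, records sorted by position, and compressed chunks that each begin on a line boundary.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # start of a gzip member (ID1, ID2, deflate)
WINDOW = 4096  # assumption: larger than any single compressed chunk


def build_blob(positions, per_chunk):
    """Toy stand-in for a hosted file: sorted records in concatenated gzip chunks."""
    positions = list(positions)
    chunks = []
    for i in range(0, len(positions), per_chunk):
        lines = "".join(f"1\t{p}\trs{p}\n" for p in positions[i:i + per_chunk])
        chunks.append(gzip.compress(lines.encode(), mtime=0))
    return b"".join(chunks)


def fetch_range(blob, start, end):
    """Stand-in for an HTTP GET sent with the header `Range: bytes=start-end`."""
    return blob[start:end + 1]


def next_chunk(blob, offset):
    """Return (chunk_start, first_position) of the first chunk at or after offset."""
    data = fetch_range(blob, offset, offset + WINDOW - 1)
    i = data.find(GZIP_MAGIC)
    while i != -1:
        try:
            out = zlib.decompressobj(wbits=31).decompress(data[i:])
            first_line = out.split(b"\n", 1)[0]
            return offset + i, int(first_line.split(b"\t")[1])
        except (zlib.error, IndexError, ValueError):
            # The magic bytes occurred by chance inside compressed data.
            i = data.find(GZIP_MAGIC, i + 1)
    return None


def find_chunk(blob, target):
    """Binary search over byte offsets for the last chunk starting at or before target."""
    lo, hi, best = 0, len(blob), None
    while lo < hi:
        mid = (lo + hi) // 2
        hit = next_chunk(blob, mid)
        if hit is None or hit[1] > target:
            hi = mid  # every chunk from mid onward starts past the target
        else:
            best = hit[0]
            lo = hit[0] + 1
    return best


def lookup(blob, target):
    """Return the record line at position target, or None if it is absent."""
    start = find_chunk(blob, target)
    if start is None:
        return None
    data = fetch_range(blob, start, start + WINDOW - 1)
    text = zlib.decompressobj(wbits=31).decompress(data).decode()
    for line in text.splitlines():
        if int(line.split("\t")[1]) == target:
            return line
    return None


blob = build_blob(range(100, 2100, 10), per_chunk=10)
print(lookup(blob, 1230))
```

Each probe touches only one window of bytes, so the number of range requests grows with the logarithm of the file size rather than with the file size itself.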

2.

Associations between actigraphy-measured sleep duration, continuity, and timing with mortality in the UK Biobank.

Saint-Maurice, Pedro F; Freeman, Joshua R; Russ, Daniel; Almeida, Jonas S; Shams-White, Marissa M; Patel, Shreya; Wolff-Hughes, Dana L; Watts, Eleanor L; Loftfield, Erikka; Hong, Hyokyoung G; Moore, Steven C; Matthews, Charles E.

Sleep ; 47(3)2024 Mar 11.

Article En | MEDLINE | ID: mdl-38066693

STUDY OBJECTIVES: To examine the associations between sleep duration, continuity, timing, and mortality using actigraphy among adults. METHODS: Data were from a cohort of 88 282 adults (40-69 years) in UK Biobank that wore a wrist-worn triaxial accelerometer for 7 days. Actigraphy data were processed to generate estimates of sleep duration and other sleep characteristics including wake after sleep onset (WASO), number of 5-minute awakenings, and midpoint for sleep onset/wake-up and the least active 5 hours (L5). Data were linked to mortality outcomes with follow-up to October 31, 2021. We implemented Cox models (hazard ratio, confidence intervals [HR, 95% CI]) to quantify sleep associations with mortality. Models were adjusted for demographics, lifestyle factors, and medical conditions. RESULTS: Over an average of 6.8 years 2973 deaths occurred (1700 cancer, 586 CVD deaths). Overall sleep duration was significantly associated with risk for all-cause (p < 0.01), cancer (p < 0.01), and CVD (p = 0.03) mortality. For example, when compared to sleep durations of 7.0 hrs/d, durations of 5 hrs/d were associated with a 29% higher risk for all-cause mortality (HR: 1.29 [1.09, 1.52]). WASO and number of awakenings were not associated with mortality. Individuals with L5 early or late midpoints (<2:30 or ≥ 3:30) had a ~20% higher risk for all-cause mortality, compared to those with intermediate L5 midpoints (3:00-3:29; p ≤ 0.01; e.g. HR ≥ 3:30: 1.19 [1.07, 1.32]). CONCLUSIONS: Shorter sleep duration and both early and late sleep timing were associated with a higher mortality risk. These findings reinforce the importance of public health efforts to promote healthy sleep patterns in adults.

Cardiovascular Diseases , Neoplasms , Adult , Humans , Actigraphy , Sleep Duration , UK Biobank , Biological Specimen Banks , Sleep

3.

Actigraphy-derived measures of sleep and risk of prostate cancer in the UK Biobank.

Freeman, Joshua R; Saint-Maurice, Pedro F; Watts, Eleanor L; Moore, Steven C; Shams-White, Marissa M; Wolff-Hughes, Dana L; Russ, Daniel E; Almeida, Jonas S; Caporaso, Neil E; Hong, Hyokyoung G; Loftfield, Erikka; Matthews, Charles E.

J Natl Cancer Inst ; 116(3): 434-444, 2024 Mar 07.

Article En | MEDLINE | ID: mdl-38013591

BACKGROUND: Studies of sleep and prostate cancer are almost entirely based on self-report, with limited research using actigraphy. Our goal was to evaluate actigraphy-measured sleep and prostate cancer and to expand on findings from prior studies of self-reported sleep. METHODS: We prospectively examined 34 260 men without a history of prostate cancer in the UK Biobank. Sleep characteristics were measured over 7 days using actigraphy. We calculated sleep duration, onset, midpoint, wake-up time, social jetlag (difference in weekend-weekday sleep midpoints), sleep efficiency (percentage of time spent asleep between onset and wake-up time), and wakefulness after sleep onset. Cox proportional hazards models were used to estimate covariate-adjusted hazards ratios (HRs) and 95% confidence intervals (CIs). RESULTS: Over 7.6 years, 1152 men were diagnosed with prostate cancer. Sleep duration was not associated with prostate cancer risk. Sleep midpoint earlier than 4:00 am was not associated with prostate cancer risk, though sleep midpoint of 5:00 am or later was suggestively associated with lower prostate cancer risk but had limited precision (earlier than 4:00 am vs 4:00-4:59 am HR = 1.00, 95% CI = 0.87 to 1.16; 5:00 am or later vs 4:00-4:59 am HR = 0.79, 95% CI = 0.57 to 1.10). Social jetlag was not associated with greater prostate cancer risk (1 to <2 hours vs <1 hour HR = 1.06, 95% CI = 0.89 to 1.25; ≥2 hours vs <1 hour HR = 0.90, 95% CI = 0.65 to 1.26). Compared with men who averaged less than 30 minutes of wakefulness after sleep onset per day, men with 60 minutes or more had a higher risk of prostate cancer (HR = 1.20, 95% CI = 1.00 to 1.43). CONCLUSIONS: Of the sleep characteristics studied, higher wakefulness after sleep onset, a measure of poor sleep quality, was associated with greater prostate cancer risk. Replication of our findings between wakefulness after sleep onset and prostate cancer is warranted.
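The derived sleep measures defined in this abstract (midpoint, social jetlag, sleep efficiency) reduce to simple clock arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the clock times are invented, and taking the absolute weekend-weekday difference is an assumption, since the abstract does not say whether the difference is signed.

```python
def clock_to_hours(hhmm):
    """Convert an 'HH:MM' clock time to decimal hours."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60


def sleep_window_hours(onset, wake):
    """Length of the onset-to-wake window, allowing sleep that crosses midnight."""
    start, end = clock_to_hours(onset), clock_to_hours(wake)
    if end <= start:
        end += 24
    return end - start


def sleep_midpoint(onset, wake):
    """Clock time (decimal hours) halfway between sleep onset and wake-up."""
    return (clock_to_hours(onset) + sleep_window_hours(onset, wake) / 2) % 24


def social_jetlag(weekday_midpoint, weekend_midpoint):
    """Absolute weekend-weekday midpoint difference in hours, on a 24-hour clock."""
    diff = abs(weekend_midpoint - weekday_midpoint) % 24
    return min(diff, 24 - diff)


def sleep_efficiency(minutes_asleep, onset, wake):
    """Percentage of the onset-to-wake window spent asleep."""
    return 100 * minutes_asleep / (sleep_window_hours(onset, wake) * 60)


weekday = sleep_midpoint("23:00", "07:00")
weekend = sleep_midpoint("00:30", "09:30")
print(weekday, weekend, social_jetlag(weekday, weekend))
print(sleep_efficiency(432, "23:00", "07:00"))
```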

Actigraphy , Prostatic Neoplasms , Male , Humans , UK Biobank , Biological Specimen Banks , Sleep , Prostatic Neoplasms/epidemiology

4.

Quest markup for developing FAIR questionnaire modules for epidemiologic studies.

Russ, Daniel E; Gerlanc, Nicole M; Shen, Brian; Patel, Bhaumik; de González, Amy Berrington; Freedman, Neal D; Cusack, Julie M; Gaudet, Mia M; García-Closas, Montserrat; Almeida, Jonas S.

BMC Med Inform Decis Mak ; 23(1): 238, 2023 10 25.

Article En | MEDLINE | ID: mdl-37880712

BACKGROUND: Online questionnaires are commonly used to collect information from participants in epidemiological studies. This requires building questionnaires using machine-readable formats that can be delivered to study participants using web-based technologies such as progressive web applications. However, the paucity of open-source markup standards with support for complex logic makes collaborative development of web-based questionnaire modules difficult. This often prevents interoperability and reusability of questionnaire modules across epidemiological studies. RESULTS: We developed an open-source markup language for presentation of questionnaire content and logic, Quest, within a real-time renderer that enables the user to test logic (e.g., skip patterns) and view the structure of data collection. We provide the Quest markup language, an in-browser markup rendering tool, questionnaire development tool and an example web application that embeds the renderer, developed for The Connect for Cancer Prevention Study. CONCLUSION: A markup language can specify both the content and logic of a questionnaire as plain text. Questionnaire markup, such as Quest, can become a standard format for storing questionnaires or sharing questionnaires across the web. Quest is a step towards generation of FAIR data in epidemiological studies by facilitating reusability of questionnaires and data interoperability using open-source tools.

Software , Humans , Surveys and Questionnaires , Epidemiologic Studies

5.

epiDonate - distributed serverless data infrastructure for epidemiological studies.

Almeida, Jonas S; Patel, Bhaumik; Russ, Daniel E; Bhawsar, Praphulla; Saint-Maurice, Pedro F; Matthews, Charles; Anand, Adit; Ferguson, Martin; Johnson, Davin; Brotzman, Michelle; Gerlanc, Nicole; Chanock, Stephen; de Gonzalez, Amy Berrington; Gaudet, Mia; Garcia-Closas, Montserrat.

AMIA Jt Summits Transl Sci Proc ; 2023: 25-31, 2023.

Article En | MEDLINE | ID: mdl-37350888

Motivation: Epidemiological studies face two important challenges: the need to ingest ever more complex data types, and mounting concerns about participant privacy and data governance. These two challenges are compounded by the expectation that data infrastructure will eventually need to facilitate cross-registration of participants by multiple epidemiological studies. Implementation: The portable web-service epiDonate was developed using the serverless model known as FaaS (Function-as-a-Service). The reference implementation uses nodejs. The implementation relies on a simple tokenization scheme, mediated by a public API, that a) distinguishes admin from participant roles, with b) extensible permission configuration operating a read/write structure. General Features: The critical design feature of epiDonate is the absence of business logic on the server-side (the web service). The simplicity removes the need to customize virtual machines and enables ecosystems of multiple web applications backed by one or more data donation deployments. Availability: https://episphere.github.io/donate.

6.

Evaluation of the updated SOCcer v2 algorithm for coding free-text job descriptions in three epidemiologic studies.

Russ, Daniel E; Josse, Pabitra; Remen, Thomas; Hofmann, Jonathan N; Purdue, Mark P; Siemiatycki, Jack; Silverman, Debra T; Zhang, Yawei; Lavoué, Jerome; Friesen, Melissa C.

Ann Work Expo Health ; 67(6): 772-783, 2023 07 06.

Article En | MEDLINE | ID: mdl-37071789

OBJECTIVES: Computer-assisted coding of job descriptions to standardized occupational classification codes facilitates evaluating occupational risk factors in epidemiologic studies by reducing the number of jobs needing expert coding. We evaluated the accuracy of the 2nd version of SOCcer, a computerized algorithm designed to code free-text job descriptions to the US SOC-2010 system based on free-text job titles and work tasks. METHODS: SOCcer v2 was updated by expanding the training data to include jobs from several epidemiologic studies and revising the algorithm to account for nonlinearity and incorporate interactions. We evaluated the agreement between codes assigned by experts and the highest scoring code (a measure of confidence in the algorithm-predicted assignment) from SOCcer v1 and v2 in 14,714 jobs from three epidemiology studies. We also linked exposure estimates for 258 agents in the job-exposure matrix CANJEM to the expert and SOCcer v2-assigned codes and compared those estimates using kappa and intraclass correlation coefficients. Analyses were stratified by SOCcer score, score distance between the top two scoring codes from SOCcer, and features from CANJEM. RESULTS: SOCcer's v2 agreement at the 6-digit level was 50%, compared to 44% in v1, and was similar for the three studies (38%-45%). Overall agreement for v2 at the 2-, 3-, and 5-digit levels was 73%, 63%, and 56%, respectively. For v2, median ICCs for the probability and intensity metrics were 0.67 (IQR 0.59-0.74) and 0.56 (IQR 0.50-0.60), respectively. The agreement between the expert and SOCcer assigned codes linearly increased with SOCcer score. The agreement also improved when the top two scoring codes had larger differences in score. CONCLUSIONS: Overall agreement with SOCcer v2 applied to job descriptions from North American epidemiologic studies was similar to the agreement usually observed between two experts. SOCcer's score predicted agreement with experts and can be used to prioritize jobs for expert review.
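Agreement at the 2-, 3-, 5-, and 6-digit levels, as reported in this abstract, amounts to comparing code prefixes. The Python sketch below shows one way to compute it; the function names and the four expert/algorithm code pairs are made up for illustration and are not data from the study.

```python
def soc_prefix(code, digits):
    """First `digits` digits of a SOC code such as '47-2061', ignoring the hyphen."""
    return code.replace("-", "")[:digits]


def agreement(expert_codes, algorithm_codes, digits):
    """Fraction of jobs whose two codes share the same prefix at the given level."""
    pairs = list(zip(expert_codes, algorithm_codes))
    matches = sum(soc_prefix(e, digits) == soc_prefix(a, digits) for e, a in pairs)
    return matches / len(pairs)


# Hypothetical code pairs, invented for this example.
expert = ["47-2061", "47-2061", "11-1021", "29-1141"]
algorithm = ["47-2061", "47-2031", "11-9199", "53-3032"]

for level in (2, 3, 5, 6):
    print(level, agreement(expert, algorithm, level))
```

Because a shorter prefix can only match more often, agreement never decreases as the level becomes coarser, which is consistent with the ordering of the percentages reported above.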

Occupational Exposure , Soccer , Humans , Job Description , Occupational Exposure/analysis , Epidemiologic Studies , Algorithms

7.

Automated Coding of Job Descriptions From a General Population Study: Overview of Existing Tools, Their Application and Comparison.

Wan, Wenxin; Ge, Calvin B; Friesen, Melissa C; Locke, Sarah J; Russ, Daniel E; Burstyn, Igor; Baker, Christopher J O; Adisesh, Anil; Lan, Qing; Rothman, Nathaniel; Huss, Anke; van Tongeren, Martie; Vermeulen, Roel; Peters, Susan.

Ann Work Expo Health ; 67(5): 663-672, 2023 06 06.

Article En | MEDLINE | ID: mdl-36734402

OBJECTIVES: Automatic job coding tools were developed to reduce the laborious task of manually assigning job codes based on free-text job descriptions in census and survey data sources, including large occupational health studies. The objective of this study is to provide a case study of comparative performance of job coding and JEM (Job-Exposure Matrix)-assigned exposures agreement using existing coding tools. METHODS: We compared three automatic job coding tools [AUTONOC, CASCOT (Computer-Assisted Structured Coding Tool), and LabourR], which were selected based on availability, coding of English free-text into coding systems closely related to the 1988 version of the International Standard Classification of Occupations (ISCO-88), and capability to perform batch coding. We used manually coded job histories from the AsiaLymph case-control study that were translated into English prior to auto-coding to assess their performance. We applied two general population JEMs to assess agreement at exposure level. Percent agreement and PABAK (Prevalence-Adjusted Bias-Adjusted Kappa) were used to compare the agreement of results from manual coders and automatic coding tools. RESULTS: The coding percent agreement among the three tools ranged from 17.7 to 26.0% for exact matches at the most detailed 4-digit ISCO-88 level. The agreement was better at a more general level of job coding (e.g. 43.8-58.1% in 1-digit ISCO-88), and in exposure assignments (median values of PABAK coefficient ranging 0.69-0.78 across 12 JEM-assigned exposures). Based on our testing data, CASCOT was found to outperform others in terms of better agreement in both job coding (26% 4-digit agreement) and exposure assignment (median kappa 0.61). CONCLUSIONS: In this study, we observed that agreement on job coding was generally low for the three tools but noted a higher degree of agreement in assigned exposures. The results indicate the need for study-specific evaluations prior to their automatic use in general population studies, as well as improvements in the evaluated automatic coding tools.
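The two agreement statistics named in this abstract are straightforward to compute. The Python sketch below uses the standard PABAK definition, (k·Po − 1)/(k − 1) for k categories, which reduces to 2·Po − 1 for a binary exposed/unexposed assignment. The function names and the ten exposure assignments are invented for illustration and are not taken from the study.

```python
def percent_agreement(ratings_a, ratings_b):
    """Observed proportion of items on which the two raters agree (Po)."""
    pairs = list(zip(ratings_a, ratings_b))
    return sum(a == b for a, b in pairs) / len(pairs)


def pabak(ratings_a, ratings_b, categories=2):
    """Prevalence-adjusted bias-adjusted kappa: (k * Po - 1) / (k - 1)."""
    po = percent_agreement(ratings_a, ratings_b)
    return (categories * po - 1) / (categories - 1)


# Hypothetical exposed (1) / unexposed (0) assignments for ten jobs.
manual = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
automatic = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]

print(percent_agreement(manual, automatic))
print(pabak(manual, automatic))
```

Unlike Cohen's kappa, PABAK depends only on the observed agreement, so it is not pulled down when an exposure is rare in the study population.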

Job Description , Occupational Exposure , Humans , Case-Control Studies , Occupations , Surveys and Questionnaires

8.

Moving Toward Findable, Accessible, Interoperable, Reusable Practices in Epidemiologic Research.

García-Closas, Montserrat; Ahearn, Thomas U; Gaudet, Mia M; Hurson, Amber N; Balasubramanian, Jeya Balaji; Choudhury, Parichoy Pal; Gerlanc, Nicole M; Patel, Bhaumik; Russ, Daniel; Abubakar, Mustapha; Freedman, Neal D; Wong, Wendy S W; Chanock, Stephen J; Berrington de Gonzalez, Amy; Almeida, Jonas S.

Am J Epidemiol ; 192(6): 995-1005, 2023 06 02.

Article En | MEDLINE | ID: mdl-36804665

Data sharing is essential for reproducibility of epidemiologic research, replication of findings, pooled analyses in consortia efforts, and maximizing study value to address multiple research questions. However, barriers related to confidentiality, costs, and incentives often limit the extent and speed of data sharing. Epidemiological practices that follow Findable, Accessible, Interoperable, Reusable (FAIR) principles can address these barriers by making data resources findable with the necessary metadata, accessible to authorized users, and interoperable with other data, to optimize the reuse of resources with appropriate credit to its creators. We provide an overview of these principles and describe approaches for implementation in epidemiology. Increasing degrees of FAIRness can be achieved by moving data and code from on-site locations to remote, accessible ("Cloud") data servers, using machine-readable and nonproprietary files, and developing open-source code. Adoption of these practices will improve daily work and collaborative analyses and facilitate compliance with data sharing policies from funders and scientific journals. Achieving a high degree of FAIRness will require funding, training, organizational support, recognition, and incentives for sharing research resources, both data and code. However, these costs are outweighed by the benefits of making research more reproducible, impactful, and equitable by facilitating the reuse of precious research resources by the scientific community.

Confidentiality , Information Dissemination , Humans , Reproducibility of Results , Software , Epidemiologic Studies

9.

Author Correction: A harmonized atlas of mouse spinal cord cell types and their spatial organization.

Russ, Daniel E; Cross, Ryan B Patterson; Li, Li; Koch, Stephanie C; Matson, Kaya J E; Yadav, Archana; Alkaslasi, Mor R; Lee, Dylan I; Le Pichon, Claire E; Menon, Vilas; Levine, Ariel J.

Nat Commun ; 13(1): 6184, 2022 Oct 19.

Article En | MEDLINE | ID: mdl-36261425

10.

Single cell atlas of spinal cord injury in mice reveals a pro-regenerative signature in spinocerebellar neurons.

Matson, Kaya J E; Russ, Daniel E; Kathe, Claudia; Hua, Isabelle; Maric, Dragan; Ding, Yi; Krynitsky, Jonathan; Pursley, Randall; Sathyamurthy, Anupama; Squair, Jordan W; Levi, Boaz P; Courtine, Gregoire; Levine, Ariel J.

Nat Commun ; 13(1): 5628, 2022 09 26.

Article En | MEDLINE | ID: mdl-36163250

After spinal cord injury, tissue distal to the lesion contains undamaged cells that could support or augment recovery. Targeting these cells requires a clearer understanding of their injury responses and capacity for repair. Here, we use single nucleus RNA sequencing to profile how each cell type in the lumbar spinal cord changes after a thoracic injury in mice. We present an atlas of these dynamic responses across dozens of cell types in the acute, subacute, and chronically injured spinal cord. Using this resource, we find rare spinal neurons that express a signature of regeneration in response to injury, including a major population that represents spinocerebellar projection neurons. We characterize these cells anatomically and observe axonal sparing, outgrowth, and remodeling in the spinal cord and cerebellum. Together, this work provides a key resource for studying cellular responses to injury and reveals the spontaneous plasticity of spinocerebellar neurons, uncovering a potential candidate for targeted therapy.

Spinal Cord Injuries , Animals , Axons/metabolism , Cerebellum/metabolism , Mice , Nerve Regeneration/physiology , Neurons/metabolism , Spinal Cord/metabolism , Spinal Cord Injuries/pathology

11.

PLCOjs, a FAIR GWAS web SDK for the NCI Prostate, Lung, Colorectal and Ovarian Cancer Genetic Atlas project.

Ruan, Eric; Nemeth, Erika; Moffitt, Richard; Sandoval, Lorena; Machiela, Mitchell J; Freedman, Neal D; Huang, Wen-Yi; Wong, Wendy; Chen, Kai-Ling; Park, Brian; Jiang, Kevin; Hicks, Belynda; Liu, Jia; Russ, Daniel; Minasian, Lori; Pinsky, Paul; Chanock, Stephen J; Garcia-Closas, Montserrat; Almeida, Jonas S.

Bioinformatics ; 38(18): 4434-4436, 2022 09 15.

Article En | MEDLINE | ID: mdl-35900159

MOTIVATION: The Division of Cancer Epidemiology and Genetics (DCEG) and the Division of Cancer Prevention (DCP) at the National Cancer Institute (NCI) have recently generated genome-wide association study (GWAS) data for multiple traits in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Genomic Atlas project. The GWAS included 110 000 participants. The dissemination of the genetic association data through a data portal called GWAS Explorer, in a manner that addresses the modern expectations of FAIR reusability by data scientists and engineers, is the main motivation for the development of the open-source JavaScript software development kit (SDK) reported here. RESULTS: The PLCO GWAS Explorer resource relies on a public stateless HTTP application programming interface (API) deployed as the sole backend service for both the landing page's web application and third-party analytical workflows. The core PLCOjs SDK is mapped to each of the API methods, and also to each of the reference graphic visualizations in the GWAS Explorer. A few additional visualization methods extend it. As is the norm with web SDKs, no download or installation is needed and modularization supports targeted code injection for web applications, reactive notebooks (Observable) and node-based web services. AVAILABILITY AND IMPLEMENTATION: code at https://github.com/episphere/plco; project page at https://episphere.github.io/plco.

Colorectal Neoplasms , Ovarian Neoplasms , United States , Male , Humans , Female , Genome-Wide Association Study , National Cancer Institute (U.S.) , Prostate , Software , Ovarian Neoplasms/genetics , Lung

12.

Author Correction: A harmonized atlas of mouse spinal cord cell types and their spatial organization.

Russ, Daniel E; Cross, Ryan B Patterson; Li, Li; Koch, Stephanie C; Matson, Kaya J E; Yadav, Archana; Alkaslasi, Mor R; Lee, Dylan I; Le Pichon, Claire E; Menon, Vilas; Levine, Ariel J.

Nat Commun ; 13(1): 1033, 2022 Feb 18.

Article En | MEDLINE | ID: mdl-35181658

13.

A harmonized atlas of mouse spinal cord cell types and their spatial organization.

Russ, Daniel E; Cross, Ryan B Patterson; Li, Li; Koch, Stephanie C; Matson, Kaya J E; Yadav, Archana; Alkaslasi, Mor R; Lee, Dylan I; Le Pichon, Claire E; Menon, Vilas; Levine, Ariel J.

Nat Commun ; 12(1): 5722, 2021 09 29.

Article En | MEDLINE | ID: mdl-34588430

Single-cell RNA sequencing data can unveil the molecular diversity of cell types. Cell type atlases of the mouse spinal cord have been published in recent years but have not been integrated together. Here, we generate an atlas of spinal cell types based on single-cell transcriptomic data, unifying the available datasets into a common reference framework. We report a hierarchical structure of postnatal cell type relationships, with location providing the highest level of organization, then neurotransmitter status, family, and finally, dozens of refined populations. We validate a combinatorial marker code for each neuronal cell type and map their spatial distributions in the adult spinal cord. We also show complex lineage relationships among postnatal cell types. Additionally, we develop an open-source cell type classifier, SeqSeek, to facilitate the standardization of cell type identification. This work provides an integrated view of spinal cell types, their gene expression signatures, and their molecular organization.

Neurons/classification , Spinal Cord/cytology , Transcriptome , Animals , Atlases as Topic , Cell Nucleus/genetics , Datasets as Topic , Mice , Neurons/cytology , RNA-Seq , Single-Cell Analysis , Spatial Analysis , Spinal Cord/growth & development

14.

Simultaneous modeling of detection rate and exposure concentration using semi-continuous models to identify exposure determinants when left-censored data may be a true zero.

Friesen, Melissa C; Choo-Wosoba, Hyoyoung; Sarazin, Philippe; Hwang, Jooyeon; Dopart, Pamela; Russ, Daniel E; Deziel, Nicole C; Lavoué, Jérôme; Albert, Paul S; Zhu, Bin.

J Expo Sci Environ Epidemiol ; 31(6): 1047-1056, 2021 11.

Article En | MEDLINE | ID: mdl-34006962

BACKGROUND: Most methods for treating left-censored data assume the analyte is present but not quantified. Biased estimates may result if the analyte is absent such that the unobserved data represents a mixed exposure distribution with an unknown proportion clustered at zero. OBJECTIVE: We used semi-continuous models to identify time and industry trends in 52,457 OSHA inspection lead sample results. METHOD: The first component of the semi-continuous model predicted the probability of detecting concentrations ≥ 0.007 mg/m3 (highest estimated detection limit, 62% of measurements). The second component predicted the median concentration of measurements ≥ 0.007 mg/m3. Both components included a random-effect for industry and fixed-effects for year, industry group, analytical method, and other variables. We used the two components together to predict median industry- and time-specific lead concentrations. RESULTS: The probabilities of detectable concentrations and the median detected concentrations decreased with year; both were also lower for measurements analyzed for multiple (vs. one) metals and for those analyzed by inductively-coupled plasma (vs. atomic absorption spectroscopy). The covariance was 0.30 (standard error = 0.06), confirming the two components were correlated. SIGNIFICANCE: We identified determinants of exposure in data with over 60% left-censored, while accounting for correlated relationships and without assuming a distribution for the censored data.

Models, Statistical , Occupational Exposure , Humans , Industry , Lead , Occupational Exposure/analysis

15.

Smoking status, usual adult occupation, and risk of recurrent urothelial bladder carcinoma: data from The Cancer Genome Atlas (TCGA) Project.

Wilcox, Amber N; Silverman, Debra T; Friesen, Melissa C; Locke, Sarah J; Russ, Daniel E; Hyun, Noorie; Colt, Joanne S; Figueroa, Jonine D; Rothman, Nathaniel; Moore, Lee E; Koutros, Stella.

Cancer Causes Control ; 27(12): 1429-1435, 2016 Dec.

Article En | MEDLINE | ID: mdl-27804056

PURPOSE: Tobacco smoking and occupational exposures are the leading risk factors for developing urothelial bladder carcinoma (UBC), yet little is known about the contribution of these two factors to risk of UBC recurrence. We evaluated whether smoking status and usual adult occupation are associated with time to UBC recurrence for 406 patients with muscle-invasive bladder cancer submitted to The Cancer Genome Atlas (TCGA) project. METHODS: Kaplan-Meier and Cox proportional hazard methods were used to assess the association between smoking status, employment in a high-risk occupation for bladder cancer, occupational diesel exhaust exposure, and 2010 Standard Occupational Classification group and time to UBC recurrence. RESULTS: Data on time to recurrence were available for 358 patients over a median follow-up time of 15 months. Of these, 133 (37.2%) experienced a recurrence. Current smokers who smoked for more than 40 pack-years had an increased risk of recurrence compared to never smokers (HR 2.1, 95% CI 1.1, 4.1). Additionally, employment in a high-risk occupation was associated with a shorter time to recurrence (log-rank p = 0.005). We found an increased risk of recurrence for those employed in occupations with probable diesel exhaust exposure (HR 1.8, 95% CI 1.1, 3.0) and for those employed in production occupations (HR 2.0, 95% CI 1.1, 3.6). CONCLUSIONS: These findings suggest smoking status impacts risk of UBC recurrence, although several previous studies provided equivocal evidence regarding this association. In addition to the known causal relationship between occupational exposure and bladder cancer risk, our study suggests that occupation may also be related to increased risk of recurrence.

Neoplasm Recurrence, Local/epidemiology , Occupational Exposure/statistics & numerical data , Occupations/statistics & numerical data , Smoking/epidemiology , Urinary Bladder Neoplasms/epidemiology , Aged , Female , Humans , Male , Middle Aged , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/pathology , Risk Factors , Smoking/adverse effects , Smoking/genetics , Smoking/pathology , United States/epidemiology , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/pathology

16.

Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies.

Russ, Daniel E; Ho, Kwan-Yuet; Colt, Joanne S; Armenti, Karla R; Baris, Dalsu; Chow, Wong-Ho; Davis, Faith; Johnson, Alison; Purdue, Mark P; Karagas, Margaret R; Schwartz, Kendra; Schwenn, Molly; Silverman, Debra T; Johnson, Calvin A; Friesen, Melissa C.

Occup Environ Med ; 73(6): 417-24, 2016 Jun.

Article En | MEDLINE | ID: mdl-27102331

BACKGROUND: Mapping job titles to standardised occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiological studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. METHODS: Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14 983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest scoring SOC code to each job. SOCcer was validated in 2 occupational data sources by comparing SOC codes obtained from SOCcer to expert assigned SOC codes and lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. RESULTS: For 11 991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6-digit and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (κ 0.6-0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. CONCLUSIONS: Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiological studies.

Industry/classification , Job Description , Natural Language Processing , Occupations/classification , Algorithms , Carcinoma, Renal Cell , Case-Control Studies , Epidemiologic Methods , Epidemiologic Studies , Humans , Logistic Models , Reproducibility of Results , Software , United States , United States Occupational Safety and Health Administration

17.

HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs.

Russ, Daniel E; Ho, Kwan-Yuet; Longo, Nancy S.

BMC Bioinformatics ; 16: 170, 2015 May 23.

Article En | MEDLINE | ID: mdl-26001675

BACKGROUND: Partitioning the human immunoglobulin variable region into variable (V), diversity (D), and joining (J) segments is a common sequence analysis step. We introduce a novel approximate dynamic programming method that uses conserved immunoglobulin gene motifs to improve performance of aligning V-segments of rearranged immunoglobulin (Ig) genes. Our new algorithm enhances the former JOINSOLVER algorithm by processing sequences with insertions and/or deletions (indels) and improves the efficiency for large datasets provided by high throughput sequencing. RESULTS: In our simulations, which include rearrangements with indels, the V-matching success rate improved from 61% for partial alignments of sequences with indels in the original algorithm to over 99% in the approximate algorithm. An improvement in the alignment of human VDJ rearrangements over the initial JOINSOLVER algorithm was also seen when compared to the Stanford.S22 human Ig dataset with an online VDJ partitioning software evaluation tool. CONCLUSIONS: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.

Algorithms , Computational Biology/methods , Gene Rearrangement , Immunoglobulin Joining Region/genetics , Immunoglobulin Variable Region/genetics , Mutation/genetics , Software , Base Sequence , Conserved Sequence , Humans , Molecular Sequence Data

18.

Computer-Based Coding of Occupation Codes for Epidemiological Analyses.

Russ, Daniel E; Ho, Kwan-Yuet; Johnson, Calvin A; Friesen, Melissa C.

Proc IEEE Int Symp Comput Based Med Syst ; 2014: 347-350, 2014 May.

Article En | MEDLINE | ID: mdl-25221787

Mapping job titles to standardized occupation classification (SOC) codes is an important step in evaluating changes in health risks over time as measured in inspection databases. However, manual SOC coding is cost prohibitive for very large studies. Computer based SOC coding systems can improve the efficiency of incorporating occupational risk factors into large-scale epidemiological studies. We present a novel method of mapping verbatim job titles to SOC codes using a large table of prior knowledge available in the public domain that included detailed description of the tasks and activities and their synonyms relevant to each SOC code. Job titles are compared to our knowledge base to find the closest matching SOC code. A soft Jaccard index is used to measure the similarity between a previously unseen job title and the knowledge base. Additional information such as standardized industrial codes can be incorporated to improve the SOC code determination by providing additional context to break ties in matches.
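The abstract above names a soft Jaccard index but does not define it, so the following is only an illustrative sketch of one common reading: tokens count as shared when they are approximately equal, not only when they are identical. The token similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, the greedy pairing, and the two-entry knowledge base are all assumptions made for this example, not the authors' implementation.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed measure, not from the paper)."""
    return SequenceMatcher(None, a, b).ratio()


def soft_jaccard(title_a: str, title_b: str, threshold: float = 0.8) -> float:
    """Jaccard index where near-identical tokens contribute partial overlap."""
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    unmatched_b = set(tokens_b)
    soft_intersection = 0.0
    # Greedy one-to-one pairing: each token in A claims its best unused match in B.
    for tok in sorted(tokens_a):
        best, best_sim = None, 0.0
        for cand in sorted(unmatched_b):
            sim = token_similarity(tok, cand)
            if sim > best_sim:
                best, best_sim = cand, sim
        if best is not None and best_sim >= threshold:
            soft_intersection += best_sim
            unmatched_b.remove(best)
    union = len(tokens_a) + len(tokens_b) - soft_intersection
    return soft_intersection / union


def closest_code(job_title: str, knowledge_base: dict) -> str:
    """Return the code whose description scores highest against the job title."""
    return max(knowledge_base, key=lambda code: soft_jaccard(job_title, knowledge_base[code]))


# Toy knowledge base for illustration only; a real table would be far larger.
TOY_KNOWLEDGE_BASE = {
    "53-3032": "heavy and tractor-trailer truck drivers",
    "29-1141": "registered nurses",
}

if __name__ == "__main__":
    print(closest_code("truck driver", TOY_KNOWLEDGE_BASE))
```

Under these assumptions, "driver" and "drivers" are similar enough to count as a partial match, which is the behaviour that lets a previously unseen job title still land on a nearby knowledge-base entry.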

19.

Characterization of germline antibody libraries from human umbilical cord blood and selection of monoclonal antibodies to viral envelope glycoproteins: Implications for mechanisms of immune evasion and design of vaccine immunogens.

Chen, Weizao; Streaker, Emily D; Russ, Daniel E; Feng, Yang; Prabakaran, Ponraj; Dimitrov, Dimiter S.

Biochem Biophys Res Commun ; 417(4): 1164-9, 2012 Jan 27.

Article En | MEDLINE | ID: mdl-22226962

We have previously observed that all known HIV-1 broadly neutralizing antibodies (bnAbs) are highly divergent from germline antibodies in contrast to bnAbs against Hendra virus, Nipah virus and SARS coronavirus (SARS CoV). We have hypothesized that because the germline antibodies are so different from the mature HIV-1-specific bnAbs they may not bind the epitopes of the mature antibodies and provided the first evidence to support this hypothesis by using individual putative germline-like predecessor antibodies. To further validate the hypothesis and understand initial immune responses to different viruses, two phage-displayed human cord blood-derived IgM libraries were constructed which contained mostly germline antibodies or antibodies with very low level of somatic hypermutations. They were panned against different HIV-1 envelope glycoproteins (Envs), SARS CoV protein receptor-binding domain (RBD), and soluble Hendra virus G protein (sG). Despite a high sequence and combinatorial diversity observed in the cord blood-derived IgM antibody repertoire, no enrichment for binders of Envs was observed in contrast to considerable specific enrichments produced with panning against RBD and sG; one of the selected monoclonal antibodies (against the RBD) was of high (nM) affinity with only few somatic mutations. These results further support and expand our initial hypothesis for fundamental differences in immune responses leading to elicitation of bnAbs against HIV-1 compared to SARS CoV and Hendra virus. HIV-1 uses a strategy to minimize or eliminate strong binding of germline antibodies to its Env; in contrast, SARS CoV and Hendra virus, and perhaps other viruses causing acute infections, can bind germline antibody or minimally somatically mutated antibodies with relatively high affinity which could be one of the reasons for the success of sG and RBD as vaccine immunogens.

Adaptive Immunity , Antibodies, Monoclonal/immunology , Antibodies, Monoclonal/isolation & purification , Antibodies, Neutralizing/immunology , Glycoproteins/immunology , HIV-1/immunology , Viral Envelope Proteins/immunology , Amino Acid Sequence , Antibodies, Neutralizing/genetics , Fetal Blood/immunology , Hendra Virus/immunology , Humans , Molecular Sequence Data , Peptide Library , Protein Structure, Tertiary , Severe acute respiratory syndrome-related coronavirus/immunology

20.

Analysis of somatic hypermutation in X-linked hyper-IgM syndrome shows specific deficiencies in mutational targeting.

Longo, Nancy S; Lugar, Patricia L; Yavuz, Sule; Zhang, Wen; Krijger, Peter H L; Russ, Daniel E; Jima, Dereje D; Dave, Sandeep S; Grammer, Amrie C; Lipsky, Peter E.

Blood ; 113(16): 3706-15, 2009 Apr 16.

Article En | MEDLINE | ID: mdl-19023113

Subjects with X-linked hyper-IgM syndrome (X-HIgM) have a markedly reduced frequency of CD27(+) memory B cells, and their Ig genes have a low level of somatic hypermutation (SHM). To analyze the nature of SHM in X-HIgM, we sequenced 209 nonproductive and 926 productive Ig heavy chain genes. In nonproductive rearrangements that were not subjected to selection, as well as productive rearrangements, most of the mutations were within targeted RGYW, WRCY, WA, or TW motifs (R = purine, Y = pyrimidine, and W = A or T). However, there was significantly decreased targeting of the hypermutable G in RGYW motifs. Moreover, the ratio of transitions to transversions was markedly increased compared with normal. Microarray analysis documented that specific genes involved in SHM, including activation-induced cytidine deaminase (AICDA) and uracil-DNA glycosylase (UNG2), were up-regulated in normal germinal center (GC) B cells, but not induced by CD40 ligation. Similar results were obtained from light chain rearrangements. These results indicate that in the absence of CD40-CD154 interactions, there is a marked reduction in SHM and, specifically, mutations of AICDA-targeted G residues in RGYW motifs along with a decrease in transversions normally related to UNG2 activity.

B-Lymphocytes/enzymology , Cytidine Deaminase/biosynthesis , DNA Glycosylases/biosynthesis , Gene Expression Regulation, Enzymologic/genetics , Hyper-IgM Immunodeficiency Syndrome, Type 1/genetics , Immunoglobulin Heavy Chains/genetics , Somatic Hypermutation, Immunoglobulin/genetics , Adolescent , Adult , B-Lymphocytes/immunology , CD40 Antigens/genetics , CD40 Antigens/immunology , CD40 Antigens/metabolism , CD40 Ligand/genetics , CD40 Ligand/immunology , CD40 Ligand/metabolism , Child , Cytidine Deaminase/genetics , Cytidine Deaminase/immunology , DNA Glycosylases/genetics , DNA Glycosylases/immunology , DNA Mutational Analysis , Gene Expression Regulation, Enzymologic/immunology , Germinal Center/enzymology , Germinal Center/immunology , Humans , Hyper-IgM Immunodeficiency Syndrome, Type 1/enzymology , Hyper-IgM Immunodeficiency Syndrome, Type 1/immunology , Immunoglobulin Heavy Chains/immunology , Immunologic Capping/genetics , Immunologic Capping/immunology , Immunologic Memory/genetics , Male , Mutation , Somatic Hypermutation, Immunoglobulin/immunology , Up-Regulation/genetics , Up-Regulation/immunology
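The motif notation in the abstract for entry 20 (R = purine, Y = pyrimidine, W = A or T) can be made concrete with a short sketch that locates RGYW and WRCY sites in a DNA string. The function names and the regular-expression approach are assumptions made for illustration; the abstract does not describe how the authors scanned their sequences.

```python
import re

# Degenerate base codes as defined in the abstract: R = purine, Y = pyrimidine, W = A or T.
DEGENERATE_BASES = {"R": "[AG]", "Y": "[CT]", "W": "[AT]"}


def motif_to_regex(motif: str) -> str:
    """Expand a degenerate motif such as 'RGYW' into a regular expression."""
    return "".join(DEGENERATE_BASES.get(base, base) for base in motif.upper())


def find_motif(sequence: str, motif: str) -> list:
    """Return 0-based start positions of every match, overlapping ones included."""
    # A lookahead matches without consuming characters, so overlapping sites are found.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    seq = "AGCAGCT"
    print(find_motif(seq, "RGYW"))
    print(find_motif(seq, "WRCY"))
```

In this sketch the hypermutable G of an RGYW site sits at start position + 1, and the corresponding C of a WRCY site at start position + 2, so mutation positions could be compared against these offsets.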