Results 1 - 11 of 11
1.
Cancer Cell ; 37(4): 551-568.e14, 2020 04 13.
Article in English | MEDLINE | ID: mdl-32289277

ABSTRACT

The development of precision medicine approaches for diffuse large B cell lymphoma (DLBCL) is confounded by its pronounced genetic, phenotypic, and clinical heterogeneity. Recent multiplatform genomic studies revealed the existence of genetic subtypes of DLBCL using clustering methodologies. Here, we describe an algorithm that determines the probability that a patient's lymphoma belongs to one of seven genetic subtypes based on its genetic features. This classification reveals genetic similarities between these DLBCL subtypes and various indolent and extranodal lymphoma types, suggesting a shared pathogenesis. These genetic subtypes also have distinct gene expression profiles, immune microenvironments, and outcomes following immunochemotherapy. Functional analysis of genetic subtype models highlights distinct vulnerabilities to targeted therapy, supporting the use of this classification in precision medicine trials.


Subject(s)
Biomarkers, Tumor/genetics , Genetic Heterogeneity , Lymphoma, Large B-Cell, Diffuse/classification , Lymphoma, Large B-Cell, Diffuse/genetics , Molecular Targeted Therapy , Animals , Apoptosis , Cell Proliferation , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Lymphoma, Large B-Cell, Diffuse/drug therapy , Lymphoma, Large B-Cell, Diffuse/pathology , Mice , Mice, Inbred NOD , Mice, SCID , Precision Medicine , Tumor Cells, Cultured , Tumor Microenvironment , Xenograft Model Antitumor Assays
2.
Methods Mol Biol ; 1956: 283-303, 2019.
Article in English | MEDLINE | ID: mdl-30779040

ABSTRACT

High-throughput mRNA sequencing (RNA-Seq) provides both qualitative and quantitative evaluation of the transcriptome. This method uses complementary DNA (cDNA) to generate several million short sequence reads that are aligned to a reference genome, allowing the comprehensive characterization of the transcripts in a cell. RNA-Seq has a wide variety of applications, which has led to pervasive adoption of the method well beyond the genomics community and to its deployment as a standard part of the toolkit applied in life sciences. This chapter describes a protocol to perform mRNA sequencing using the Illumina NextSeq or MiSeq platforms, presents sequencing data quality metrics, and outlines a bioinformatic pipeline for sequence alignment, digital gene expression, identification of gene fusions, detection of transcript isoforms, description and annotation of genetic variants, and de novo immunoglobulin gene assembly.


Subject(s)
Genomics/methods , Lymphoma, B-Cell/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Alternative Splicing , Gene Expression Profiling/methods , Gene Fusion , Genes, Immunoglobulin , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Polymorphism, Single Nucleotide , Software , Transcriptome
3.
N Engl J Med ; 378(15): 1396-1407, 2018 04 12.
Article in English | MEDLINE | ID: mdl-29641966

ABSTRACT

BACKGROUND: Diffuse large B-cell lymphomas (DLBCLs) are phenotypically and genetically heterogeneous. Gene-expression profiling has identified subgroups of DLBCL (activated B-cell-like [ABC], germinal-center B-cell-like [GCB], and unclassified) according to cell of origin that are associated with a differential response to chemotherapy and targeted agents. We sought to extend these findings by identifying genetic subtypes of DLBCL based on shared genomic abnormalities and to uncover therapeutic vulnerabilities based on tumor genetics. METHODS: We studied 574 DLBCL biopsy samples using exome and transcriptome sequencing, array-based DNA copy-number analysis, and targeted amplicon resequencing of 372 genes to identify genes with recurrent aberrations. We developed and implemented an algorithm to discover genetic subtypes based on the co-occurrence of genetic alterations. RESULTS: We identified four prominent genetic subtypes in DLBCL, termed MCD (based on the co-occurrence of MYD88 L265P and CD79B mutations), BN2 (based on BCL6 fusions and NOTCH2 mutations), N1 (based on NOTCH1 mutations), and EZB (based on EZH2 mutations and BCL2 translocations). Genetic aberrations in multiple genes distinguished each genetic subtype from other DLBCLs. These subtypes differed phenotypically, as judged by differences in gene-expression signatures and responses to immunochemotherapy, with favorable survival in the BN2 and EZB subtypes and inferior outcomes in the MCD and N1 subtypes. Analysis of genetic pathways suggested that MCD and BN2 DLBCLs rely on "chronic active" B-cell receptor signaling that is amenable to therapeutic inhibition. CONCLUSIONS: We uncovered genetic subtypes of DLBCL with distinct genotypic, epigenetic, and clinical characteristics, providing a potential nosology for precision-medicine strategies in DLBCL. (Funded by the Intramural Research Program of the National Institutes of Health and others.).


Subject(s)
Gene Expression Profiling , Genetic Heterogeneity , Lymphoma, Large B-Cell, Diffuse/genetics , Mutation , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Biopsy , Epigenesis, Genetic , Exome , Genotype , Humans , Kaplan-Meier Estimate , Lymphoma, Large B-Cell, Diffuse/classification , Lymphoma, Large B-Cell, Diffuse/drug therapy , Lymphoma, Large B-Cell, Diffuse/mortality , Prognosis , Sequence Analysis, DNA , Transcriptome
4.
Occup Environ Med ; 73(6): 417-24, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27102331

ABSTRACT

BACKGROUND: Mapping job titles to standardised occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiological studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. METHODS: Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14 983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest scoring SOC code to each job. SOCcer was validated in 2 occupational data sources by comparing SOC codes obtained from SOCcer to expert assigned SOC codes and lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. RESULTS: For 11 991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6-digit and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (κ 0.6-0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. CONCLUSIONS: Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiological studies.


Subject(s)
Industry/classification , Job Description , Natural Language Processing , Occupations/classification , Algorithms , Carcinoma, Renal Cell , Case-Control Studies , Epidemiologic Methods , Epidemiologic Studies , Humans , Logistic Models , Reproducibility of Results , Software , United States , United States Occupational Safety and Health Administration
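
The scoring step described in the SOCcer abstract above (entry 4) can be sketched as follows. This is a minimal illustration, not SOCcer's implementation: the weights, the feature values, the candidate codes, and the review threshold are all hypothetical, and the abstract does not give the fitted coefficients. The sketch only shows the general shape of the method: three classifier outputs (job title, task, industry) are combined in a logistic model, the highest-scoring SOC code is assigned, and low scores are flagged for manual review.

```python
import math

# Hypothetical weights for illustration only. The abstract states that the real
# weights were obtained empirically from jobs with expert-assigned SOC codes.
WEIGHTS = {"intercept": -4.0, "title": 5.0, "task": 2.0, "industry": 1.5}


def soc_score(features):
    """Combine the three classifier outputs into a logistic score in (0, 1)."""
    z = WEIGHTS["intercept"] + sum(
        WEIGHTS[name] * features.get(name, 0.0)
        for name in ("title", "task", "industry")
    )
    return 1.0 / (1.0 + math.exp(-z))


def assign_soc(candidates, review_below=0.5):
    """Return (best_code, score, needs_review) for one job description.

    `candidates` maps each candidate SOC code to its classifier outputs.
    `review_below` is an assumed cutoff for routing a job to manual coding.
    """
    scored = {code: soc_score(feats) for code, feats in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best], scored[best] < review_below


# Made-up classifier outputs for a single job description.
example = {
    "47-2111": {"title": 0.9, "task": 0.6, "industry": 0.4},
    "49-2094": {"title": 0.3, "task": 0.2, "industry": 0.4},
}
code, score, needs_review = assign_soc(example)
print(code, round(score, 3), needs_review)
```

Because agreement with expert coding is reported to increase with the score, a cutoff such as `review_below` is one plausible way to decide which assignments a human coder should check.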
5.
IEEE Int Conf Bioinform Biomed Workshops ; 2015: 1586-1590, 2015 Nov.
Article in English | MEDLINE | ID: mdl-27042700

ABSTRACT

Longitudinal studies play a key role in various fields, including epidemiology, clinical research, and genomic analysis. Currently, the most popular methods in longitudinal data analysis are model-driven regression approaches, which impose strong prior assumptions and are unable to scale to large problems in the manner of machine learning algorithms. In this work, we propose a novel longitudinal support vector regression (LSVR) algorithm that not only takes advantage of one of the most popular machine learning methods, but also models the temporal nature of longitudinal data by taking into account observational dependence within subjects. We test LSVR on publicly available data from the DREAM-Phil Bowen ALS Prediction Prize4Life challenge. Results suggest that LSVR is at a minimum competitive with favored machine learning methods and is able to outperform those methods in predicting ALS score one month in advance.

6.
Article in English | MEDLINE | ID: mdl-25221787

ABSTRACT

Mapping job titles to standardized occupation classification (SOC) codes is an important step in evaluating changes in health risks over time as measured in inspection databases. However, manual SOC coding is cost-prohibitive for very large studies. Computer-based SOC coding systems can improve the efficiency of incorporating occupational risk factors into large-scale epidemiological studies. We present a novel method of mapping verbatim job titles to SOC codes using a large table of prior knowledge available in the public domain that includes detailed descriptions of the tasks and activities, and their synonyms, relevant to each SOC code. Job titles are compared to our knowledge base to find the closest matching SOC code. A soft Jaccard index is used to measure the similarity between a previously unseen job title and the knowledge base. Additional information such as standardized industrial codes can be incorporated to improve the SOC code determination by providing additional context to break ties in matches.
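
The matching step in the abstract above can be sketched as follows. The abstract does not define its soft Jaccard index, so this is one common formulation and an assumption: two tokens count as shared when their character-level similarity reaches a threshold, which tolerates misspellings in verbatim job titles. The knowledge base, its descriptions, the threshold of 0.8, and the function names are illustrative, and tie-breaking with industry codes is not shown.

```python
from difflib import SequenceMatcher


def token_similarity(a, b):
    """Character-level similarity in [0, 1] between two tokens."""
    return SequenceMatcher(None, a, b).ratio()


def soft_jaccard(title, reference, threshold=0.8):
    """Jaccard index in which near-identical tokens count as shared."""
    left = set(title.lower().split())
    right = set(reference.lower().split())
    if not left and not right:
        return 0.0
    unused = set(right)
    matched = 0
    # Greedily pair each title token with its most similar unused reference token.
    for token in sorted(left):
        best = max(unused, key=lambda r: token_similarity(token, r), default=None)
        if best is not None and token_similarity(token, best) >= threshold:
            unused.remove(best)
            matched += 1
    union = len(left) + len(right) - matched
    return matched / union


def closest_code(title, knowledge_base):
    """Return (code, score) for the entry whose descriptions best match the title."""
    return max(
        (
            (code, max(soft_jaccard(title, text) for text in descriptions))
            for code, descriptions in knowledge_base.items()
        ),
        key=lambda pair: pair[1],
    )


# Tiny illustrative knowledge base; the real table is far larger.
knowledge_base = {
    "29-1141": ["registered nurse", "staff nurse"],
    "47-2111": ["electrician", "journeyman electrician"],
}
print(closest_code("registerd nurse", knowledge_base))
```

With a strict Jaccard index the misspelled title "registerd nurse" would share only one of three distinct tokens with "registered nurse"; the soft variant treats the two spellings as the same token.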

7.
Article in English | MEDLINE | ID: mdl-25599092

ABSTRACT

Research into modeling the progression of Alzheimer's disease (AD) has made recent progress in identifying plasma proteomic biomarkers to identify the disease at the pre-clinical stage. In contrast with cerebrospinal fluid (CSF) biomarkers and PET imaging, plasma biomarker diagnoses have the advantage of being cost-effective and minimally invasive, thereby improving our understanding of AD and hopefully leading to early interventions as research into this subject advances. The Alzheimer's Disease Neuroimaging Initiative (ADNI) has collected data on 190 plasma analytes from individuals diagnosed with AD as well as subjects with mild cognitive impairment and cognitively normal (CN) controls. We propose an approach to classify subjects as AD or CN via an ensemble of classifiers trained and validated on ADNI data. Classifier performance is enhanced by an augmentation of a selective biomarker feature space with principal components obtained from the entire set of biomarkers. This procedure yields accuracy of 89% and area under the ROC curve of 94%.

8.
Article in English | MEDLINE | ID: mdl-25621319

ABSTRACT

Lysosomes are subcellular organelles playing a vital role in the endocytosis process of the cell. Lysosomal acidity is an important factor in assuring proper functioning of the enzymes within the organelle, and can be assessed by labeling the lysosomes with pH-sensitive fluorescence probes. To enhance our understanding of the acidification mechanisms, the goal of this work is to develop a method that can accurately detect and characterize the acidity of each lysosome captured in ratiometric fluorescence images. We present an algorithm that utilizes the h-dome transformation and reconciles spots detected independently from two wavelength channels. We evaluated our algorithm using simulated images for which the exact locations were known. The h-dome algorithm achieved an f-score as high as 0.890. We also computed the fluorescence ratios from lysosomes in live HeLa cell images with known lysosomal pHs. Using leave-one-out cross-validation, we demonstrated that the new algorithm was able to achieve much better pH prediction accuracy than the conventional method.

9.
Int J Biomed Imaging ; 2009: 528639, 2009.
Article in English | MEDLINE | ID: mdl-19672315

ABSTRACT

Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speedup factor of 46.66 over the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

10.
Genome Biol ; 9 Suppl 2: S6, 2008.
Article in English | MEDLINE | ID: mdl-18834497

ABSTRACT

We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; http://bcms.bioinfo.cnio.es/). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations.


Subject(s)
Biomedical Research/methods , Computational Biology/methods , Information Storage and Retrieval , Internet , Humans
11.
Article in English | MEDLINE | ID: mdl-17951839

ABSTRACT

The ability to identify gene mentions in text and normalize them to the proper unique identifiers is crucial for "down-stream" text mining applications in bioinformatics. We have developed a rule-based algorithm that divides the normalization task into two steps. The first step includes pattern matching for gene symbols and an approximate term searching technique for gene names. Next, the algorithm measures several features based on morphological, statistical, and contextual information to estimate the level of confidence that the correct identifier is selected for a potential mention. Uniqueness, inverse distance, and coverage are three novel features we quantified. The algorithm was evaluated against the BioCreAtIvE datasets. The feature weights were tuned by the Nelder-Mead simplex method. An F-score of .7622 and an AUC (area under the recall-precision curve) of .7461 were achieved on the test data using the set of weights optimized to the training data.


Subject(s)
Abstracting and Indexing/methods , Artificial Intelligence , Database Management Systems , Genes , Information Storage and Retrieval/methods , Natural Language Processing , PubMed , Computer Graphics , Confidence Intervals , Data Interpretation, Statistical , Documentation/methods , Internet , User-Computer Interface
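
The second step of the gene normalization algorithm in entry 11 above, and the F-score used to evaluate it, can be sketched as follows. This is a hedged illustration only: the abstract names the features (uniqueness, inverse distance, coverage) but not how they are computed or combined, so the weighted sum, the weights, the cutoff, the feature values, and the identifiers `G1` to `G4` are all assumptions. Tuning the weights with the Nelder-Mead simplex method is not shown.

```python
def confidence(features, weights):
    """Assumed confidence model: a weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())


def normalize(candidates_by_mention, weights, cutoff=0.5):
    """Keep, per mention, the highest-confidence identifier if it passes the cutoff."""
    chosen = set()
    for candidates in candidates_by_mention.values():
        if not candidates:
            continue
        best_id = max(candidates, key=lambda gid: confidence(candidates[gid], weights))
        if confidence(candidates[best_id], weights) >= cutoff:
            chosen.add(best_id)
    return chosen


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of identifiers."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Placeholder weights and feature values, not taken from the paper.
weights = {"uniqueness": 0.5, "inverse_distance": 0.3, "coverage": 0.2}
candidates_by_mention = {
    "p53": {
        "G1": {"uniqueness": 0.9, "inverse_distance": 0.8, "coverage": 1.0},
        "G2": {"uniqueness": 0.2, "inverse_distance": 0.1, "coverage": 0.5},
    },
    "cat": {
        "G3": {"uniqueness": 0.1, "inverse_distance": 0.2, "coverage": 0.3},
    },
}
predicted = normalize(candidates_by_mention, weights)
print(predicted, f_score(predicted, {"G1", "G4"}))
```

In this toy run the ambiguous mention "cat" is dropped because its only candidate scores below the cutoff, which trades recall for precision; a weight-tuning procedure would search for the weights that balance the two on training data.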