Results 1 - 20 of 20
1.
Open Res Eur ; 4: 160, 2024.
Article in English | MEDLINE | ID: mdl-39185338

ABSTRACT

Objective: The European Health Data Space (EHDS) shapes the digital transformation of healthcare in Europe. The EHDS regulation will also accelerate the use of health data for research, innovation, policy-making, and regulatory activities, i.e., the secondary use of data (known as EHDS2). The Integration of heterogeneous Data and Evidence towards Regulatory and HTA Acceptance (IDERHA) project builds one of the first pan-European health data spaces in alignment with the EHDS2 requirements, addressing lung cancer as a pilot. Methods: In this study, we conducted a comprehensive review of the EHDS regulation, the technical requirements for EHDS2, and related projects. We explored the results of the Joint Action Towards the European Health Data Space (TEHDAS) to identify the framework for IDERHA's alignment with EHDS2. We also conducted an internal webinar and an external workshop with EHDS experts to share expertise on the EHDS requirements and challenges. Results: We identified the lessons learned from existing projects and the minimum set of requirements for aligning the IDERHA infrastructure with EHDS2, including user journey, concepts, terminologies, and standards. The IDERHA framework (i.e., platform architecture, standardization approaches, documentation, etc.) is being developed accordingly. Discussion: IDERHA's alignment plan with EHDS2 requires standardization in three areas: data discoverability (Data Catalog Vocabulary, DCAT-AP), semantic interoperability (Observational Medical Outcomes Partnership, OMOP), and health data exchange (DICOM and FHIR). The main challenge is that some standards are still being refined, e.g., the extension of DCAT-AP (HealthDCAT-AP). Additionally, extensions to the Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model (CDM) to represent patient-generated health data are still needed. Finally, proper mapping between standards (FHIR/OMOP) is a prerequisite for proper data exchange. Conclusions: IDERHA's plan and our collaboration with other EHDS initiatives/projects are critical in advancing the implementation of EHDS2.
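The FHIR/OMOP mapping named as a prerequisite above can be illustrated with a minimal sketch. This is a toy transformation, not IDERHA's actual pipeline: the concept lookup, the field choices, and the `Patient/42` example are all illustrative assumptions (real mappings come from the OHDSI standardized vocabularies and a full ETL).

```python
# Hedged sketch: mapping a minimal FHIR Observation resource to an
# OMOP CDM MEASUREMENT-style row. The LOINC-to-concept lookup below is
# a toy placeholder; real mappings come from OHDSI vocabulary tables.
LOINC_TO_OMOP = {"718-7": 3000963}  # hemoglobin; illustrative entry only

def fhir_observation_to_measurement(obs: dict) -> dict:
    """Convert a minimal FHIR Observation to an OMOP-style measurement record."""
    coding = obs["code"]["coding"][0]
    value = obs["valueQuantity"]
    return {
        "person_id": int(obs["subject"]["reference"].split("/")[1]),
        "measurement_concept_id": LOINC_TO_OMOP.get(coding["code"], 0),
        "measurement_source_value": coding["code"],
        "value_as_number": value["value"],
        "unit_source_value": value["unit"],
        "measurement_date": obs["effectiveDateTime"][:10],
    }

obs = {
    "resourceType": "Observation",
    "subject": {"reference": "Patient/42"},
    "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
    "valueQuantity": {"value": 13.2, "unit": "g/dL"},
    "effectiveDateTime": "2024-05-01T09:30:00Z",
}
row = fhir_observation_to_measurement(obs)
```

Unmapped codes fall back to concept ID 0 (the OMOP convention for "no matching concept"), keeping the source code in `measurement_source_value` for later re-mapping.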

2.
BMC Cancer ; 24(1): 912, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39075397

ABSTRACT

In oncology anti-PD1/PDL1 therapy development for solid tumors, objective response rate (ORR) is a commonly used clinical endpoint for early phase study decision making, while progression-free survival (PFS) and overall survival (OS) are widely used for late phase study decision making. Developing predictive models for late phase outcomes such as median PFS (mPFS) and median OS (mOS) based on the early phase clinical outcome ORR could inform late phase study design optimization and probability of success (POS) evaluation. Existing literature investigates ORR/mPFS/mOS association and surrogacy using a limited number of included clinical trials. In this paper, without establishing surrogacy, we attempt to predict late phase survival (mPFS and mOS) based on early efficacy ORR and to optimize late phase trial design for anti-PD1/PDL1 therapy development. To include an adequate number of eligible clinical trials, we built a comprehensive quantitative clinical trial landscape database (QLD) by combining information from different sources, such as clinicaltrials.gov, publications, and company press releases, for relevant indications and therapies. We developed a generalizable algorithm to systematically extract structured data for scientific accuracy and completeness. Finally, more than 150 late phase clinical trials were identified for ORR/mPFS (ORR/mOS) predictive model development, whereas existing literature included at most 50 trials. A tree-based machine learning regression model was derived to account for heterogeneity in the ORR/mPFS (ORR/mOS) relationship across tumor type, stage, line of therapy, and treatment class, while simultaneously borrowing strength when homogeneity persists. The proposed method ensures that the predictive model is robust and has an explicit structure for clinical interpretation. Through cross-validation, the average predictive mean square error of the proposed model is competitive with random forest and extreme gradient boosting methods and outperforms commonly used additive or interaction linear regression models. An example application of the proposed ORR/mPFS (ORR/mOS) predictive model to late phase trial POS evaluation for anti-PD1/PDL1 combination therapy is illustrated.
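The central idea, a tree-based regression relating early-phase ORR to late-phase median PFS, can be sketched in miniature. The training pairs below are synthetic with an assumed increasing ORR-to-mPFS trend; the paper's actual model was fit to 150+ curated trials with covariates such as tumor type and line of therapy.

```python
# Hedged sketch: a tiny regression tree predicting mPFS (months) from
# ORR on synthetic data. Purely illustrative of the model class.
import random

random.seed(0)
data = []
for _ in range(200):
    orr = random.uniform(0.05, 0.6)
    data.append((orr, 2 + 12 * orr + random.gauss(0, 0.5)))  # assumed trend

def sse(ys):
    """Sum of squared errors around the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def split_cost(rows, threshold):
    return (sse([y for x, y in rows if x <= threshold])
            + sse([y for x, y in rows if x > threshold]))

def build_tree(rows, depth=0, max_depth=2, min_leaf=20):
    ys = [y for _, y in rows]
    if depth == max_depth or len(rows) < 2 * min_leaf:
        return {"leaf": sum(ys) / len(ys)}
    best = min({x for x, _ in rows}, key=lambda t: split_cost(rows, t))
    left = [r for r in rows if r[0] <= best]
    right = [r for r in rows if r[0] > best]
    if len(left) < min_leaf or len(right) < min_leaf:
        return {"leaf": sum(ys) / len(ys)}
    return {"split": best,
            "left": build_tree(left, depth + 1, max_depth, min_leaf),
            "right": build_tree(right, depth + 1, max_depth, min_leaf)}

def predict(tree, orr):
    while "leaf" not in tree:
        tree = tree["left"] if orr <= tree["split"] else tree["right"]
    return tree["leaf"]

tree = build_tree(data)
```

The shallow depth and minimum leaf size stand in for the paper's goal of an explicit, clinically interpretable structure rather than a black-box ensemble.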


Subject(s)
B7-H1 Antigen , Neoplasms , Programmed Cell Death 1 Receptor , Progression-Free Survival , Humans , B7-H1 Antigen/antagonists & inhibitors , Neoplasms/drug therapy , Neoplasms/mortality , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Immune Checkpoint Inhibitors/therapeutic use , Immune Checkpoint Inhibitors/pharmacology , Clinical Trials as Topic
3.
J Comp Eff Res ; 13(7): e230164, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38869838

ABSTRACT

Background: Eligibility criteria are pivotal to clinical trial success, enabling targeted patient enrollment while ensuring trial safety. However, overly restrictive criteria hinder enrollment and limit the generalizability of study results. Broadening eligibility criteria enhances trial inclusivity, diversity, and enrollment pace. Liu et al. proposed an AI pathfinder method leveraging real-world data to broaden criteria without compromising efficacy and safety outcomes, demonstrating promise in non-small cell lung cancer trials. Aim: To assess the robustness of the methodology across diverse qualities of real-world data and to promote its application. Materials/Methods: We revised the AI pathfinder method, applied it to relapsed and refractory multiple myeloma trials, and compared it using two real-world data sources. We modified the assessment and considered a bootstrap confidence interval of the AI pathfinder to enhance decision robustness. Results & conclusion: Our findings confirmed the AI pathfinder's potential for identifying certain eligibility criteria, namely those based on prior complications and laboratory tests, as candidates for relaxation or removal. However, a robust quantitative assessment, accounting for trial variability and real-world data quality, is crucial for confident decision-making and for prioritizing safety alongside efficacy.
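The bootstrap confidence interval used above to stabilize the assessment can be sketched generically. This is a plain percentile bootstrap on a synthetic event rate; the paper's actual metric compares trial outcomes under relaxed criteria, which this sketch does not reproduce.

```python
# Hedged sketch of a percentile bootstrap: resample the cohort with
# replacement, recompute the statistic, take empirical quantiles.
# The cohort and its 30% event rate are synthetic placeholders.
import random

random.seed(1)
cohort = [random.random() < 0.30 for _ in range(500)]  # synthetic events

def event_rate(sample):
    return sum(sample) / len(sample)

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for stat(data)."""
    reps = []
    for _ in range(n_boot):
        resample = [random.choice(data) for _ in range(len(data))]
        reps.append(stat(resample))
    reps.sort()
    lo = reps[int(n_boot * alpha / 2)]
    hi = reps[int(n_boot * (1 - alpha / 2))]
    return lo, hi

lo, hi = bootstrap_ci(cohort, event_rate)
```

Basing a relax-or-keep decision on whether the whole interval clears a threshold, rather than on the point estimate, is what makes the assessment robust to trial-to-trial variability.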


Subject(s)
Multiple Myeloma , Patient Selection , Humans , Multiple Myeloma/therapy , Multiple Myeloma/drug therapy , Artificial Intelligence , Clinical Trials as Topic/methods , Eligibility Determination/methods
4.
Clin Pharmacol Ther ; 114(4): 751-767, 2023 10.
Article in English | MEDLINE | ID: mdl-37393555

ABSTRACT

Since the 21st Century Cures Act was signed into law in 2016, real-world data (RWD) and real-world evidence (RWE) have attracted great interest from the healthcare ecosystem globally. The potential and capability of RWD/RWE to inform regulatory decisions and clinical drug development have been extensively reviewed and discussed in the literature. However, a comprehensive review of current applications of RWD/RWE in clinical pharmacology, particularly from an industry perspective, is needed to inspire new insights and identify potential future opportunities for clinical pharmacologists to utilize RWD/RWE to address key drug development questions. In this paper, we review the RWD/RWE applications relevant to clinical pharmacology based on recent publications from member companies in the International Consortium for Innovation and Quality in Pharmaceutical Development (IQ) RWD Working Group, and discuss the future direction of RWE utilization from a clinical pharmacology perspective. A comprehensive review of RWD/RWE use cases is provided and discussed in the following categories of application: drug-drug interaction assessments, dose recommendation for patients with organ impairment, pediatric plan development and study design, model-informed drug development (e.g., disease progression modeling), prognostic and predictive biomarkers/factors identification, regulatory decisions support (e.g., label expansion), and synthetic/external control generation for rare diseases. Additionally, we describe and discuss common sources of RWD to help guide appropriate data selection to address questions pertaining to clinical pharmacology in drug development and regulatory decision making.


Subject(s)
Ecosystem , Pharmacology, Clinical , Humans , Child , Drug Development , Delivery of Health Care
5.
Epidemiology ; 34(5): 627-636, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37255252

ABSTRACT

It has been well established that randomized clinical trials can have poor external validity, resulting in findings that may not apply to the relevant, or target, populations. When the trial is sampled from the target population, generalizability methods have been proposed to address the applicability of trial findings to target populations. When the trial sample and target populations are distinct, transportability methods may be applied for this purpose. However, generalizability and transportability studies present challenges, particularly around the strength of their conclusions. We review and summarize state-of-the-art methods for translating trial findings to target populations. We additionally provide a novel step-by-step guide to address these challenges, illustrating principles through a published case study. When conducted with rigor, generalizability and transportability studies can play an integral role in regulatory decisions by providing key real-world evidence.


Subject(s)
Research Design , Humans , Causality
6.
PLoS One ; 17(12): e0278842, 2022.
Article in English | MEDLINE | ID: mdl-36520950

ABSTRACT

Inverse odds of participation weighting (IOPW) has been proposed to transport clinical trial findings to target populations of interest when the distribution of treatment effect modifiers differs between the trial and target populations. We set out to apply IOPW to transport results from an observational study to a target population of interest. We demonstrated the feasibility of this idea with a real-world example using a nationwide electronic health record-derived de-identified database from Flatiron Health. First, we conducted an observational study that carefully adjusted for confounding to estimate the treatment effect of fulvestrant plus palbociclib relative to letrozole plus palbociclib as a second-line therapy among patients with estrogen receptor (ER)-positive, human epidermal growth factor receptor 2 (HER2)-negative metastatic breast cancer. Second, we transported these findings to the broader cohort of patients who were eligible for a first-line therapy. The interpretation of the findings and the validity of such studies, however, rely on the extent to which causal inference assumptions are met.
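IOPW itself can be sketched with a single binary effect modifier, using simple stratum proportions to estimate participation. All data below are synthetic, and the data-generating effects (1.0 vs 2.0 depending on the modifier) are assumptions for illustration; the study used a real-world oncology database and a richer participation model.

```python
# Hedged sketch of inverse odds of participation weighting (IOPW):
# weight each study subject by (1 - p) / p, where p = P(in study | X),
# so the weighted study resembles the target population.
import random

random.seed(2)
# The effect modifier is more common in the target than in the study.
study = [{"mod": random.random() < 0.3} for _ in range(1000)]
target = [{"mod": random.random() < 0.6} for _ in range(1000)]
for r in study:
    r["treated"] = random.random() < 0.5
    effect = 2.0 if r["mod"] else 1.0   # assumed modifier-dependent effect
    r["y"] = (effect if r["treated"] else 0.0) + random.gauss(0, 0.1)

def participation_prob(mod):
    """P(in study | modifier), estimated within the stratum."""
    n_s = sum(r["mod"] == mod for r in study)
    n_t = sum(r["mod"] == mod for r in target)
    return n_s / (n_s + n_t)

def weighted_effect():
    num = {True: 0.0, False: 0.0}
    den = {True: 0.0, False: 0.0}
    for r in study:
        p = participation_prob(r["mod"])
        w = (1 - p) / p                  # inverse odds of participation
        num[r["treated"]] += w * r["y"]
        den[r["treated"]] += w
    return num[True] / den[True] - num[False] / den[False]

treated = [r["y"] for r in study if r["treated"]]
control = [r["y"] for r in study if not r["treated"]]
naive = sum(treated) / len(treated) - sum(control) / len(control)
transported = weighted_effect()
```

Because the modifier (which amplifies the effect) is more prevalent in the target, the transported estimate moves above the unweighted study estimate, which is exactly the behavior IOPW is meant to capture.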


Subject(s)
Breast Neoplasms , Receptor, ErbB-2 , Humans , Female , Receptor, ErbB-2/metabolism , Letrozole/therapeutic use , Receptors, Estrogen/metabolism , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Piperazines/therapeutic use , Pyridines/therapeutic use , Breast Neoplasms/pathology
8.
Trials ; 22(1): 537, 2021 Aug 16.
Article in English | MEDLINE | ID: mdl-34399832

ABSTRACT

BACKGROUND: Interest in the application of machine learning (ML) to the design, conduct, and analysis of clinical trials has grown, but the evidence base for such applications has not been surveyed. This manuscript reviews the proceedings of a multi-stakeholder conference to discuss the current and future state of ML for clinical research. Key areas of clinical trial methodology in which ML holds particular promise and priority areas for further investigation are presented alongside a narrative review of evidence supporting the use of ML across the clinical trial spectrum. RESULTS: Conference attendees included stakeholders such as biomedical and ML researchers, representatives from the US Food and Drug Administration (FDA), artificial intelligence technology and data analytics companies, non-profit organizations, patient advocacy groups, and pharmaceutical companies. ML contributions to clinical research were highlighted in the pre-trial phase, cohort selection and participant management, and data collection and analysis. Particular attention was paid to the operational and philosophical barriers to ML in clinical research. Peer-reviewed evidence was noted to be lacking in several areas. CONCLUSIONS: ML holds great promise for improving the efficiency and quality of clinical research, but substantial barriers remain, the surmounting of which will require addressing significant gaps in evidence.


Subject(s)
Artificial Intelligence , Machine Learning , Humans , United States , United States Food and Drug Administration
9.
Diabetes Ther ; 11(6): 1293-1302, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32304086

ABSTRACT

INTRODUCTION: We examined differences in hypoglycaemia risk between insulin glargine 300 U/mL (Gla-300) and insulin glargine 100 U/mL (Gla-100) in individuals with type 2 diabetes (T2DM) using the low blood glucose index (LBGI). METHODS: Daily profiles of self-monitored plasma glucose (SMPG) from the EDITION 2, EDITION 3 and SENIOR treat-to-target trials of Gla-300 versus Gla-100 were used to compute the LBGI, which is an established metric of hypoglycaemia risk. The analysis also examined documented (blood glucose readings < 3.0 mmol/L [54 mg/dL]) symptomatic hypoglycaemia (DSH). RESULTS: Overall LBGI in EDITION 2 and SENIOR and night-time LBGI in all three trials were significantly (p < 0.05) lower with Gla-300 versus Gla-100. The largest differences between Gla-300 and Gla-100 were observed during the night. In all three trials, individual LBGI results correlated with the observed number of DSH episodes per participant (EDITION 2 [r = 0.35, p < 0.001]; EDITION 3 [r = 0.26, p < 0.001]; SENIOR [r = 0.30, p < 0.001]). Participants at moderate risk of experiencing hypoglycaemia (defined as LBGI > 1.1) reported 4- to 8-fold more frequent DSH events than those at minimal risk (LBGI ≤ 1.1) (p ≤ 0.009). CONCLUSIONS: The LBGI identified individuals with T2DM at risk for hypoglycaemia using SMPG data and correlated with the number of DSH events. Using the LBGI metric, a lower risk of hypoglycaemia with Gla-300 than Gla-100 was observed in all three trials. The finding that differences in LBGI are greater at night is consistent with previously published differences in the pharmacokinetic profiles of Gla-300 and Gla-100, which provides the physiological foundation for the presented results.
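The LBGI used throughout the abstract above can be sketched from its published symmetrization formula. A minimal sketch, assuming the standard risk-function constants for glucose measured in mg/dL; this is not the trials' analysis code.

```python
# Hedged sketch of the low blood glucose index (LBGI). The constants
# (1.509, 1.084, 5.381, 10) are the commonly cited values for the
# symmetrizing transform with glucose in mg/dL; verify against the
# primary reference before any real use.
import math

def lbgi(readings_mgdl):
    """Low blood glucose index from SMPG readings (glucose in mg/dL)."""
    total = 0.0
    for bg in readings_mgdl:
        f = 1.509 * (math.log(bg) ** 1.084 - 5.381)  # symmetrizing transform
        if f < 0:                                    # only the hypo side counts
            total += 10 * f * f
    return total / len(readings_mgdl)
```

Readings in the euglycemic or hyperglycemic range contribute nothing, so LBGI isolates hypoglycaemia exposure: a profile containing a 54 mg/dL reading scores higher than one whose lowest reading is 70 mg/dL.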

10.
N Engl J Med ; 381(11): e22, 2019 09 12.
Article in English | MEDLINE | ID: mdl-31509686
11.
Stat Med ; 31(17): 1791-803, 2012 Jul 30.
Article in English | MEDLINE | ID: mdl-22715129

ABSTRACT

Different analytic approaches for modeling baseline data in crossover trials were compared based on their efficiency in estimating treatment effects. Jointly modeling baseline and post-baseline data is recommended to best utilize baseline data. It yields the largest gain in efficiency when data are strongly correlated within the same period but weakly correlated between different periods, and its performance remains comparable to the best of the other modeling methods under small within-period correlation or large between-period correlation. We also examined the use of baseline data in modeling the carryover effect, and noted that modeling the carryover effect in a crossover trial generally leads to a much less efficient estimator and much more sensitive inference.


Subject(s)
Clinical Trials as Topic/methods , Cross-Over Studies , Data Interpretation, Statistical , Models, Statistical , Computer Simulation , Electrocardiography/drug effects , Humans
12.
J Biopharm Stat ; 22(3): 438-62, 2012.
Article in English | MEDLINE | ID: mdl-22416834

ABSTRACT

The ICH E14 guidance recommends the use of a time-matched baseline, while others recommend alternative baseline definitions, including a day-averaged baseline. In this article we consider six models adjusting for baseline. We derive the explicit covariances and compare the models' power under various conditions; simulation results are provided. We conclude that type I error rates are controlled; however, one model outperforms the others in statistical power under certain conditions. In general, the analysis of covariance (ANCOVA) model using a day-averaged baseline is preferred. If the time-matched baseline must be used per requests from regulatory agencies, ANCOVA by time point is recommended.
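The preference for a day-averaged baseline has a simple variance intuition that can be sketched numerically: averaging T baseline readings shrinks the measurement-noise contribution to a change score. This toy compares change-score variances only, on synthetic QTc-like data with assumed noise levels; the paper compares full ANCOVA models.

```python
# Hedged sketch: variance of post - baseline under a single time-matched
# baseline vs. a day-averaged baseline. With measurement noise sd = 5 ms,
# the single-baseline change score has variance ~2*25 = 50, while the
# averaged-baseline version has ~25*(1 + 1/T).
import random
import statistics

random.seed(3)
T = 10                # baseline time points per subject
subjects = 4000
single, averaged = [], []
for _ in range(subjects):
    true_level = random.gauss(400, 8)                 # subject's QTc level (ms)
    baselines = [true_level + random.gauss(0, 5) for _ in range(T)]
    post = true_level + random.gauss(0, 5)            # no drug effect assumed
    single.append(post - baselines[0])                # time-matched baseline
    averaged.append(post - sum(baselines) / T)        # day-averaged baseline

var_single = statistics.variance(single)
var_avg = statistics.variance(averaged)
```

The subject-level component cancels in both change scores, so the entire efficiency difference comes from how much baseline measurement noise survives the subtraction.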


Subject(s)
Models, Statistical , Randomized Controlled Trials as Topic/statistics & numerical data , Analysis of Variance , Guidelines as Topic , Humans
13.
J Chromatogr B Analyt Technol Biomed Life Sci ; 879(21): 1899-904, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21621488

ABSTRACT

A method for the extraction and preparative separation of tanshinones from Salvia miltiorrhiza Bunge was established. Tanshinones were extracted using ethyl acetate as the extractant under reflux. The extracts were then purified by high-speed counter-current chromatography (HSCCC) with light petroleum-ethyl acetate-methanol-water (6:4:6.5:3.5, v/v) as the two-phase solvent system; the upper phase was used as the stationary phase and the lower phase as the mobile phase. From 400 mg of extract, a one-step HSCCC separation yielded 8.2 mg of dihydrotanshinone I, 5.8 mg of 1,2,15,16-tetrahydrotanshiquinone, 26.3 mg of cryptotanshinone, 16.2 mg of tanshinone I, 25.6 mg of neo-przewaquinone A, 68.8 mg of tanshinone IIA, and 9.3 mg of miltirone, with purities of 97.6%, 95.1%, 99.0%, 99.1%, 93.2%, 99.3%, and 98.7%, respectively, as determined by the HPLC area normalization method. The chemical structures were identified by ¹H NMR.


Subject(s)
Abietanes/isolation & purification , Countercurrent Distribution/methods , Drugs, Chinese Herbal/chemistry , Salvia miltiorrhiza/chemistry , Abietanes/chemistry , Nuclear Magnetic Resonance, Biomolecular
14.
J Biopharm Stat ; 20(3): 563-77, 2010 May.
Article in English | MEDLINE | ID: mdl-20358436

ABSTRACT

The sample size requirement for a thorough QT/QTc study is discussed under a balanced parallel or crossover design. First, we explore the impact of various factors on study power, including the mean effect profile across time and the correlation among time points. We then estimate the required variability parameters from multiple historical studies. Different baseline choices are shown to have a significant impact on analysis variability in parallel studies. Finally, sample size calculations and recommendations are given for demonstrating a "negative" drug effect and study assay sensitivity, respectively.
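A parallel-group TQT sample size calculation of the kind described above can be sketched under a normal approximation: choose the smallest n per arm so that, with the desired power, the upper one-sided 95% confidence bound of the placebo-corrected QTc change stays below the 10 ms regulatory threshold. The sigma and assumed true effect below are illustrative placeholders, not the paper's historical estimates.

```python
# Hedged sketch: normal-approximation sample size per arm for showing a
# "negative" QTc effect (upper one-sided CI < margin) in a parallel design.
import math
from statistics import NormalDist

def n_per_arm(sigma, true_effect, margin=10.0, alpha=0.05, power=0.90):
    """n per arm for a two-group comparison at a single time point (ms units)."""
    z = NormalDist().inv_cdf
    n = (2 * sigma ** 2 * (z(1 - alpha) + z(power)) ** 2
         / (margin - true_effect) ** 2)
    return math.ceil(n)

n = n_per_arm(sigma=8.0, true_effect=3.0)
```

Larger variability, or a true effect closer to the 10 ms margin, drives the requirement up quadratically, which is why the baseline choice (through its effect on sigma) matters so much for study size.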


Subject(s)
Arrhythmias, Cardiac/chemically induced , Clinical Trials as Topic/statistics & numerical data , Heart Rate/drug effects , Models, Statistical , Sample Size , Arrhythmias, Cardiac/diagnosis , Arrhythmias, Cardiac/physiopathology , Circadian Rhythm , Cross-Over Studies , Data Interpretation, Statistical , Electrocardiography/statistics & numerical data , Humans , Time Factors
15.
J Biopharm Stat ; 20(3): 665-82, 2010 May.
Article in English | MEDLINE | ID: mdl-20358444

ABSTRACT

Using historical studies, we compared the impact of using an average baseline versus a time-matched baseline on diurnal effect correction, treatment effect estimation, and analysis of variance/covariance (ANOVA/ANCOVA) efficiency in a parallel thorough QT/QTc (TQT) study. Under a multivariate normal distribution assumption, we derived conditions for achieving unbiasedness and better efficiency when using the average baseline, and confirmed these conditions using historical TQT studies. Furthermore, simulations were conducted for randomized trials with and without observed baseline imbalance. We conclude that analyses using the average baseline yield better efficiency and unbiased or less biased results under our TQT study conditions.


Subject(s)
Arrhythmias, Cardiac/chemically induced , Heart Rate/drug effects , Models, Statistical , Randomized Controlled Trials as Topic/statistics & numerical data , Analysis of Variance , Arrhythmias, Cardiac/diagnosis , Arrhythmias, Cardiac/physiopathology , Bias , Circadian Rhythm , Computer Simulation , Cross-Over Studies , Data Interpretation, Statistical , Electrocardiography/statistics & numerical data , Humans , Time Factors
16.
J Chromatogr A ; 1140(1-2): 219-24, 2007 Jan 26.
Article in English | MEDLINE | ID: mdl-17174318

ABSTRACT

A method for the isolation and purification of flavonoid and isoflavonoid compounds from extracts of the pericarp of Sophora japonica L. was established by adsorption chromatography on the 12% cross-linked agarose gel Superose 12. The crude extracts were pre-separated into two fractions, sample A and sample B, on a D-101 macroporous resin column by elution with 20% ethanol and 40% ethanol, respectively. Samples A and B were then separated by adsorption chromatography on Superose 12 with 40% methanol as the mobile phase. Eight compounds, four flavonoids and four isoflavonoids, were obtained by the proposed method. The adsorption mechanisms of flavonoids and isoflavonoids on Superose 12 are also discussed.


Subject(s)
Chromatography, Gel/methods , Flavonoids/isolation & purification , Sophora/chemistry , Adsorption , Chromatography, High Pressure Liquid
17.
Bioinformatics ; 22(21): 2635-42, 2006 Nov 01.
Article in English | MEDLINE | ID: mdl-16926220

ABSTRACT

MOTIVATION: The nearest shrunken centroids classifier has become a popular algorithm in tumor classification problems using gene expression microarray data. Feature selection is an embedded part of the method, selecting top-ranking genes based on a univariate distance statistic calculated for each gene individually. The univariate statistics summarize gene expression profiles outside of the gene co-regulation network context, leading to redundant information being included in the selection procedure. RESULTS: We propose an Eigengene-based Linear Discriminant Analysis (ELDA) to address gene selection in a multivariate framework. The algorithm uses a modified rotated Spectral Decomposition (SpD) technique to select 'hub' genes that associate with the most important eigenvectors. Using three benchmark cancer microarray datasets, we show that ELDA selects the most characteristic genes, leading to substantially smaller classifiers than univariate feature selection-based analogues. The resulting de-correlated expression profiles make the gene-wise independence assumption more realistic and applicable for the shrunken centroids classifier and other diagonal linear discriminant type models. Our algorithm further incorporates a misclassification cost matrix, allowing differential penalization of one type of error over another. In the breast cancer data, we show that false negative prognoses can be controlled via a cost-adjusted discriminant function. AVAILABILITY: R code for the ELDA algorithm is available from the authors upon request.
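The 'hub gene' idea above can be sketched in miniature: find the leading eigenvector of the gene-gene covariance matrix (here by power iteration) and keep the genes with the largest absolute loadings. ELDA itself uses a modified rotated spectral decomposition plus a discriminant step, which this sketch omits; the co-regulated block and noise levels are assumptions.

```python
# Hedged sketch: eigenvector-driven selection of co-regulated "hub"
# genes from a synthetic expression matrix. Genes 0-3 share a latent
# factor; the remaining genes are independent noise.
import random

random.seed(4)
n_samples, n_genes = 60, 12
latent = [random.gauss(0, 1) for _ in range(n_samples)]
X = [[(latent[i] if g < 4 else 0) + random.gauss(0, 0.3)
      for g in range(n_genes)] for i in range(n_samples)]

def covariance(X):
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[j] - means[j]) * (row[k] - means[k]) for row in X) / (n - 1)
             for k in range(p)] for j in range(p)]

def top_eigenvector(C, iters=200):
    """Leading eigenvector by power iteration."""
    v = [1.0] * len(C)
    for _ in range(iters):
        w = [sum(C[j][k] * v[k] for k in range(len(C))) for j in range(len(C))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

v = top_eigenvector(covariance(X))
hubs = sorted(range(n_genes), key=lambda g: -abs(v[g]))[:4]
```

Because the co-regulated block dominates the covariance structure, the leading eigenvector loads almost entirely on those four genes, which is the multivariate signal a univariate per-gene statistic cannot see.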


Subject(s)
Artificial Intelligence , Biomarkers, Tumor/analysis , Diagnosis, Computer-Assisted/methods , Gene Expression Profiling/methods , Neoplasm Proteins/analysis , Neoplasms/diagnosis , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Computer Simulation , Discriminant Analysis , Humans , Linear Models , Models, Genetic , Multivariate Analysis , Reproducibility of Results , Sensitivity and Specificity
18.
Am J Hum Genet ; 78(5): 737-746, 2006 May.
Article in English | MEDLINE | ID: mdl-16642430

ABSTRACT

Identification and description of genetic variation underlying disease susceptibility, efficacy, and adverse reactions to drugs remain a difficult problem. One of the important steps in the analysis of variation in a candidate region is the characterization of linkage disequilibrium (LD). In a region of genetic association, the extent of LD varies between the case and control groups. Separate plots of pairwise standardized measures of LD (e.g., D') for cases and controls are often presented for a candidate region to graphically convey case-control differences in LD. However, the observed graphic differences lack statistical support. Therefore, we suggest the "LD contrast" test to compare whole matrices of disequilibrium between two samples. A common technique for assessing LD when the haplotype phase is unobserved is the expectation-maximization algorithm, with the likelihood incorporating the assumption of Hardy-Weinberg equilibrium (HWE). This approach presents a potential problem in that, in the region of genetic association, the HWE assumption may not hold when samples are selected on the basis of phenotypes. Here, we present a computationally feasible approach that does not assume HWE, along with graphic displays and a statistical comparison of pairwise matrices of LD between case and control samples. LD-contrast tests provide a useful addition to existing tools for finding and characterizing genetic associations. Although haplotype association tests are expected to provide superior power when susceptibilities are primarily determined by haplotypes, LD-contrast tests demonstrate substantially higher power under certain haplotype-driven disease models.
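The idea of contrasting whole LD matrices can be sketched with a permutation test. Here LD is the plain inter-marker correlation of genotype dosages (a composite-LD stand-in that needs no phase or HWE assumption) and the statistic is the summed squared difference of the off-diagonal entries; the paper's test statistics and their null distributions are more refined, so treat this purely as an illustration on synthetic data.

```python
# Hedged sketch of an LD-contrast permutation test on 0/1/2 dosages.
import random

random.seed(5)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def ld_matrix(genos):
    """Pairwise dosage correlations; genos = list of per-person dosage lists."""
    p = len(genos[0])
    cols = [[g[j] for g in genos] for j in range(p)]
    return [[corr(cols[j], cols[k]) for k in range(p)] for j in range(p)]

def contrast_stat(cases, controls):
    A, B = ld_matrix(cases), ld_matrix(controls)
    return sum((A[j][k] - B[j][k]) ** 2
               for j in range(len(A)) for k in range(j + 1, len(A)))

def permutation_pvalue(cases, controls, n_perm=200):
    observed = contrast_stat(cases, controls)
    pooled = cases + controls
    count = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        if contrast_stat(pooled[:len(cases)], pooled[len(cases):]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

def draw(n, rho):
    """n people, 3 markers; each marker copies a shared dosage w.p. rho."""
    people = []
    for _ in range(n):
        a = random.randint(0, 2)
        people.append([a if random.random() < rho else random.randint(0, 2)
                       for _ in range(3)])
    return people

cases = draw(100, 0.8)      # strong LD among the three markers
controls = draw(100, 0.0)   # no LD
p = permutation_pvalue(cases, controls)
```

Permuting case/control labels gives the reference distribution directly from the data, which is what makes the matrix-level contrast statistically supported rather than a visual impression.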


Subject(s)
Case-Control Studies , Chromosome Mapping/methods , Cytochrome P-450 CYP2D6/genetics , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Computational Biology/methods , Computational Biology/statistics & numerical data , Computer Simulation , Cytochrome P-450 CYP2D6/pharmacology , Genetic Markers , Genetic Predisposition to Disease , Genetic Variation , Haplotypes , Humans , Models, Statistical , Quantitative Trait, Heritable
19.
BMC Genet ; 5: 9, 2004 May 11.
Article in English | MEDLINE | ID: mdl-15137913

ABSTRACT

BACKGROUND: This article describes classical and Bayesian interval estimation of genetic susceptibility based on random samples with pre-specified numbers of unrelated cases and controls. RESULTS: Frequencies of genotypes in cases and controls can be estimated directly from retrospective case-control data. On the other hand, genetic susceptibility, defined as the expected proportion of cases among individuals with a particular genotype, depends on the population proportion of cases (prevalence). Given this design, prevalence is an external parameter, and hence the susceptibility cannot be estimated based only on the observed data. Interval estimation of susceptibility that can incorporate uncertainty in prevalence values is explored from both classical and Bayesian perspectives. Similarity between classical and Bayesian interval estimates in terms of frequentist coverage probabilities for this problem allows an appealing interpretation of classical intervals as bounds for genetic susceptibility. In addition, both the asymptotic classical and Bayesian interval estimates have comparable average length. These interval estimates serve as a very good approximation to the "exact" (finite sample) Bayesian interval estimates. Extension from genotypic to allelic susceptibility intervals shows dependency on phenotype-induced deviations from Hardy-Weinberg equilibrium. CONCLUSIONS: The suggested classical and Bayesian interval estimates appear to perform reasonably well. Generally, the exact Bayesian interval estimation method is recommended for genetic susceptibility; however, the asymptotic classical and approximate Bayesian methods are adequate for sample sizes of at least 50 cases and controls.
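The susceptibility construction above, genotype frequencies from the retrospective data combined with an external prevalence, can be sketched via Monte Carlo. A minimal sketch assuming Beta(1, 1) priors, a known prevalence, and illustrative counts; the paper's exact and asymptotic intervals are derived differently.

```python
# Hedged sketch of a Bayesian interval for genetic susceptibility
# P(case | genotype) = prev * P(g|case) /
#                      (prev * P(g|case) + (1 - prev) * P(g|control)).
import random

random.seed(6)
# Illustrative genotype-carrier counts.
cases_with, cases_total = 40, 100
controls_with, controls_total = 20, 100
prevalence = 0.05   # external parameter; assumed known here for simplicity

def susceptibility_interval(n_draws=5000, alpha=0.05):
    draws = []
    for _ in range(n_draws):
        # Beta(1,1) prior => Beta(successes + 1, failures + 1) posterior
        p_g_case = random.betavariate(cases_with + 1,
                                      cases_total - cases_with + 1)
        p_g_ctrl = random.betavariate(controls_with + 1,
                                      controls_total - controls_with + 1)
        num = prevalence * p_g_case
        draws.append(num / (num + (1 - prevalence) * p_g_ctrl))
    draws.sort()
    return (draws[int(n_draws * alpha / 2)],
            draws[int(n_draws * (1 - alpha / 2))])

lo, hi = susceptibility_interval()
```

Prevalence uncertainty, which the paper emphasizes, could be folded in by drawing the prevalence from its own distribution inside the loop instead of fixing it.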


Subject(s)
Genetic Predisposition to Disease/epidemiology , Genetic Predisposition to Disease/genetics , Molecular Epidemiology , Alleles , Bayes Theorem , Case-Control Studies , Haplotypes/genetics , Humans , Models, Statistical , Molecular Epidemiology/methods , Molecular Epidemiology/statistics & numerical data , Pharmacogenetics/methods , Pharmacogenetics/statistics & numerical data , Prevalence , Retrospective Studies , Sample Size
20.
Am J Hum Genet ; 73(1): 115-30, 2003 Jul.
Article in English | MEDLINE | ID: mdl-12796855

ABSTRACT

The genotyping of closely spaced single-nucleotide polymorphism (SNP) markers frequently yields highly correlated data, owing to extensive linkage disequilibrium (LD) between markers. The extent of LD varies widely across the genome and drives the number of frequent haplotypes observed in small regions. Several studies have illustrated the possibility that LD or haplotype data could be used to select a subset of SNPs that optimizes the information retained in a genomic region while reducing the genotyping effort and simplifying the analysis. We propose a method based on the spectral decomposition of the matrices of pairwise LD between markers, and we select markers on the basis of their contributions to the total genetic variation. We also modify Clayton's "haplotype tagging SNP" selection method, which utilizes haplotype information. For both methods, we propose sliding window-based algorithms that allow the methods to be applied to large chromosomal regions. Our procedures require genotype information from a small number of individuals for an initial set of SNPs and select an optimal subset of SNPs that can be efficiently genotyped on larger numbers of samples while retaining most of the genetic variation in the samples. We identify suitable parameter combinations for the procedures and show that a sample size of 50-100 individuals achieves consistent results in studies of simulated data sets in linkage equilibrium and LD. When applied to experimental data sets, both procedures were similarly effective at reducing the genotyping requirement while maintaining the genetic information content throughout the regions. We also show that the haplotype-association results that Hosking et al. obtained near CYP2D6 were almost identical before and after marker selection.
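The objective of retaining information with fewer markers can be illustrated with a simpler greedy r²-based tagger. Note this is not the paper's spectral-decomposition method, just a sketch of the goal on synthetic dosage data with assumed LD blocks: the greedy pass keeps picking the SNP that covers the most remaining SNPs at r² >= 0.5.

```python
# Hedged sketch: greedy r^2 tag-SNP selection on synthetic data with
# three LD blocks of four SNPs each; one tag per block should suffice.
import random

random.seed(7)
n, blocks, per_block = 120, 3, 4
genos = []
for _ in range(n):
    row = []
    for _ in range(blocks):
        a = random.randint(0, 2)   # shared dosage within the block
        row += [a if random.random() < 0.95 else random.randint(0, 2)
                for _ in range(per_block)]
    genos.append(row)

def r2(j, k):
    """Squared dosage correlation between markers j and k."""
    xs = [g[j] for g in genos]
    ys = [g[k] for g in genos]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov * cov / (vx * vy)

def greedy_tags(threshold=0.5):
    p = blocks * per_block
    untagged, tags = set(range(p)), []
    while untagged:
        best = max(untagged,
                   key=lambda j: sum(r2(j, k) >= threshold for k in untagged))
        tags.append(best)
        untagged -= {k for k in untagged if r2(best, k) >= threshold}
    return tags

tags = greedy_tags()
```

With strong within-block LD and none between blocks, three tags recover what twelve markers carry, which is the genotyping saving the paper quantifies with its spectral criterion.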


Subject(s)
Genetic Markers , Haplotypes , Linkage Disequilibrium , Alleles , Humans , Polymorphism, Single Nucleotide