Results 1 - 20 of 41
1.
Nature ; 615(7951): 285-291, 2023 03.
Article in English | MEDLINE | ID: mdl-36859541

ABSTRACT

The germline mutation rate determines the pace of genome evolution and is itself an evolving parameter¹. However, little is known about what determines its evolution, as most studies of mutation rates have focused on single species with different methodologies². Here we quantify germline mutation rates across vertebrates by sequencing and comparing the high-coverage genomes of 151 parent-offspring trios from 68 species of mammals, fishes, birds and reptiles. We show that the per-generation mutation rate varies among species by a factor of 40, with mutation rates being higher for males than for females in mammals and birds, but not in reptiles and fishes. The generation time, age at maturity and species-level fecundity are the key life-history traits affecting this variation among species. Furthermore, species with higher long-term effective population sizes tend to have lower mutation rates per generation, providing support for the drift barrier hypothesis³. The exceptionally high yearly mutation rates of domesticated animals, which have been continually selected on fecundity traits including shorter generation times, further support the importance of generation time in the evolution of mutation rates. Overall, our comparative analysis of pedigree-based mutation rates provides ecological insights into mutation rate evolution in vertebrates.


Subject(s)
Evolution, Molecular , Germ-Line Mutation , Mutation Rate , Vertebrates , Animals , Female , Male , Birds/genetics , Fishes/genetics , Germ-Line Mutation/genetics , Mammals/genetics , Reptiles/genetics , Vertebrates/genetics
2.
Genome Res ; 27(9): 1597-1607, 2017 09.
Article in English | MEDLINE | ID: mdl-28774965

ABSTRACT

Genes in the major histocompatibility complex (MHC, also known as HLA) play a critical role in the immune response, and variation within the extended 4-Mb region is associated with major risk for many diseases. Yet, deciphering the underlying causes of these associations is difficult because the MHC is the most polymorphic region of the genome and has a complex linkage disequilibrium structure. Here, we reconstruct full MHC haplotypes from de novo assembled trios without relying on a reference genome and perform evolutionary analyses. We report 100 full MHC haplotypes and call a large set of structural variants in the region for future use in imputation with GWAS data. We also present the first complete analysis of the recombination landscape in the entire region and show how balancing selection at classical genes has linked effects on the frequency of variants throughout the region.


Subject(s)
Genetic Variation/genetics , Genetics, Population , Linkage Disequilibrium/genetics , Major Histocompatibility Complex/genetics , Alleles , Chromosome Mapping , Denmark , Haplotypes/genetics , Humans , Polymorphism, Single Nucleotide/genetics
3.
Blood ; 131(7): 759-770, 2018 02 15.
Article in English | MEDLINE | ID: mdl-29208599

ABSTRACT

Mycosis fungoides (MF) is the most frequent form of cutaneous T-cell lymphoma. The disease often takes an indolent course, but in approximately one-third of the patients, the disease progresses to an aggressive malignancy with a poor prognosis. At the time of diagnosis, it is impossible to predict which patients will develop severe disease and need aggressive treatment. Accordingly, we investigated the prognostic potential of microRNAs (miRNAs) at the time of diagnosis in MF. Using a quantitative reverse transcription polymerase chain reaction platform, we analyzed miRNA expression in diagnostic skin biopsies from 154 Danish patients with early-stage MF. The patients were subdivided into a discovery cohort (n = 82) and an independent validation cohort (n = 72). The miRNA classifier was built using a LASSO (least absolute shrinkage and selection operator) Cox regression to predict progression-free survival (PFS). We developed a 3-miRNA classifier, based on miR-106b-5p, miR-148a-3p, and miR-338-3p, which successfully separated patients into high-risk and low-risk groups of disease progression. PFS was significantly different between these groups in both the discovery cohort and the validation cohort. The classifier was stronger than existing clinical prognostic factors and remained a strong independent prognostic tool after stratification and adjustment for these factors. Importantly, patients in the high-risk group had a significantly reduced overall survival. The 3-miRNA classifier is an effective tool to predict disease progression of early-stage MF at the time of diagnosis. The classifier adds significant prognostic value to existing clinical prognostic factors and may facilitate more individualized treatment of these patients.
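A LASSO Cox model produces a linear risk score, and a classifier like this one assigns patients to risk groups by thresholding that score. The sketch below is a minimal illustration, not the published classifier: it assumes the score is the Cox linear predictor (sum of coefficient × expression over the three miRNAs) and that groups are split at the cohort median. The values in `HYPOTHETICAL_COEFS` are invented placeholders, since the abstract does not report the fitted coefficients.

```python
import statistics

# HYPOTHETICAL coefficients for illustration only; the abstract does not
# report the fitted LASSO Cox coefficients of the published classifier.
HYPOTHETICAL_COEFS = {"miR-106b-5p": 0.8, "miR-148a-3p": -0.5, "miR-338-3p": 0.6}


def risk_score(expression, coefs):
    """Cox linear predictor: sum of coefficient * expression value per miRNA."""
    return sum(coefs[mirna] * expression[mirna] for mirna in coefs)


def stratify(patients, coefs, cutoff=None):
    """Label each patient 'high' or 'low' risk.

    If no cutoff is given, the median score of this cohort is used (an
    assumption; the abstract does not say how the groups were defined).
    Returns the labels and the cutoff so it can be reused on another cohort.
    """
    scores = [risk_score(p, coefs) for p in patients]
    if cutoff is None:
        cutoff = statistics.median(scores)
    labels = ["high" if s > cutoff else "low" for s in scores]
    return labels, cutoff
```

Passing the cutoff returned for one cohort into `stratify` for a second cohort keeps the threshold fixed, mirroring a discovery/validation design.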


Subject(s)
MicroRNAs/genetics , Mycosis Fungoides/diagnosis , Mycosis Fungoides/genetics , Skin Neoplasms/diagnosis , Skin Neoplasms/genetics , Transcriptome , Biomarkers, Tumor/genetics , Denmark/epidemiology , Disease Progression , Gene Expression Regulation, Neoplastic , Humans , Mycosis Fungoides/pathology , Mycosis Fungoides/therapy , Neoplasm Staging , Prognosis , Progression-Free Survival , Skin Neoplasms/pathology , Skin Neoplasms/therapy
4.
Int J Cancer ; 145(12): 3445-3452, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31125115

ABSTRACT

Improved prognostic biomarkers are needed to guide personalized prostate cancer (PC) treatment decisions. Due to the prominent molecular heterogeneity of PC, multimarker panels may be more robust. Here, 25 selected top-candidate miRNA and methylation markers for PC were profiled by qPCR in malignant radical prostatectomy (RP) tissue specimens from 198 PC patients (Cohort 1, training). Using GLMnet, we trained a novel multimarker model (miMe) comprising nine miRNAs and three methylation markers that predicted postoperative biochemical recurrence (BCR) independently of the established clinicopathological CAPRA-S nomogram in Cox multivariate regression analysis in Cohort 1 (HR [95% CI]: 1.53 [1.26-1.84], p < 0.001). This result was successfully validated in two independent RP cohorts (Cohort 2, n = 159: HR [95% CI]: 1.35 [1.06-1.73], p = 0.015; TCGA, n = 350: HR [95% CI]: 1.34 [1.01-1.77], p = 0.04). Notably, in CAPRA-S low-risk patients, a high miMe score was associated with a more than sixfold higher risk of BCR, suggesting that miMe may help identify PC patients at high risk of progression despite favorable clinicopathological factors postsurgery. Finally, miMe was a significant predictor of cancer-specific survival (p = 0.019, log-rank test) in a merged analysis of 357 RP patients. In conclusion, we trained, tested and validated a novel 12-marker panel (miMe) that showed significant independent prognostic value in three RP cohorts. In the future, combining miMe score with existing clinical nomograms may improve PC risk stratification and thus help guide treatment decisions.


Subject(s)
Biomarkers, Tumor/genetics , MicroRNAs/genetics , Prostatic Neoplasms/genetics , Adult , Aged , Cohort Studies , Disease Progression , Humans , Kaplan-Meier Estimate , Male , Methylation , Middle Aged , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/pathology , Nomograms , Prognosis , Prostate/pathology , Prostate-Specific Antigen/genetics , Prostatectomy/methods , Prostatic Neoplasms/pathology , Risk Factors
5.
PLoS Genet ; 12(11): e1006315, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27846220

ABSTRACT

Mutation of the DNA molecule is one of the most fundamental processes in biology. In this study, we use 283 parent-offspring trios to estimate the rate of mutation for both single nucleotide variants (SNVs) and short length variants (indels) in humans and examine the mutation process. We found 17812 SNVs, corresponding to a mutation rate of 1.29 × 10(-8) per position per generation (PPPG), and 1282 indels, corresponding to a rate of 9.29 × 10(-10) PPPG. We estimate that around 3% of human de novo SNVs are part of a multi-nucleotide mutation (MNM), with 558 (3.1%) of mutations positioned less than 20 kb from another mutation in the same individual (median distance of 525 bp). The rate of de novo mutations is greater in late replicating regions (p = 8.29 × 10(-19)) and nearer recombination events (p = 0.0038) than elsewhere in the genome.


Subject(s)
Genome, Human , INDEL Mutation/genetics , Mutation Rate , DNA Mutational Analysis , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide/genetics , Recombination, Genetic/genetics
6.
BMC Bioinformatics ; 19(1): 147, 2018 04 19.
Article in English | MEDLINE | ID: mdl-29673314

ABSTRACT

BACKGROUND: Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. RESULTS: To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both region-based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than region-based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. CONCLUSION: We find that our site-specific multinomial regression model outperforms the region-based models. The possibility of including genomic variables on different scales and patient-specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
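Under a multinomial logistic model, each site's probability of each mutation type is a softmax over linear predictors of that site's covariates. The sketch below assumes "no mutation" as the reference category and uses three covariates named after the predictors the abstract highlights; the coefficients in `HYPOTHETICAL_MODEL` are invented for illustration and are not fitted values from the study. Summing a type's probability over the positions of an element gives that element's expected count of that type for one genome.

```python
import math

# HYPOTHETICAL intercepts and coefficients for two mutation types. The
# reference category "none" (no mutation) has its linear predictor fixed at 0.
HYPOTHETICAL_MODEL = {
    "C>T": {"intercept": -9.0, "replication_timing": 0.5, "phylop": -0.2, "expression": -0.1},
    "C>A": {"intercept": -10.0, "replication_timing": 0.3, "phylop": -0.1, "expression": 0.0},
}


def site_probabilities(features, model):
    """P(category | site covariates) under a multinomial logit with 'none' as reference."""
    eta = {"none": 0.0}
    for category, coefs in model.items():
        eta[category] = coefs["intercept"] + sum(
            coefs[name] * value for name, value in features.items()
        )
    # Subtract the maximum linear predictor before exponentiating for numerical stability.
    top = max(eta.values())
    weights = {k: math.exp(v - top) for k, v in eta.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def expected_counts(sites, model):
    """Expected number of each mutation type in an element: sum of per-site probabilities."""
    totals = {}
    for features in sites:
        for category, p in site_probabilities(features, model).items():
            totals[category] = totals.get(category, 0.0) + p
    return totals
```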


Subject(s)
Genome, Human , Models, Genetic , Mutation Rate , Mutation/genetics , Neoplasms/genetics , Databases, Genetic , Epigenomics , Humans , Polymorphism, Single Nucleotide/genetics , Regression Analysis
7.
Nature ; 488(7412): 471-5, 2012 Aug 23.
Article in English | MEDLINE | ID: mdl-22914163

ABSTRACT

Mutations generate sequence diversity and provide a substrate for selection. The rate of de novo mutations is therefore of major importance to evolution. Here we conduct a study of genome-wide mutation rates by sequencing the entire genomes of 78 Icelandic parent-offspring trios at high coverage. We show that in our samples, with an average paternal age of 29.7 years, the average de novo mutation rate is 1.20 × 10(-8) per nucleotide per generation. Most notably, the diversity in mutation rate of single nucleotide polymorphisms is dominated by the age of the father at conception of the child. The effect is an increase of about two mutations per year. An exponential model estimates that paternal mutations double every 16.5 years. After accounting for random Poisson variation, father's age is estimated to explain nearly all of the remaining variation in the de novo mutation counts. These observations shed light on the importance of the father's age for the risk of diseases such as schizophrenia and autism.
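The two descriptions of the paternal age effect in this abstract, about two extra mutations per year and a doubling every 16.5 years, can each be written as a simple function of the father's age. In this sketch only the 2-per-year slope and the 16.5-year doubling time come from the text; the reference count at a reference age is a hypothetical input, since the abstract reports a per-nucleotide rate rather than a count at a given age.

```python
def linear_paternal_dnms(age, ref_count, ref_age, per_year=2.0):
    """Linear model: expected paternal de novo mutations rise by `per_year` per year of age.

    `ref_count` at `ref_age` is a HYPOTHETICAL anchor supplied by the caller.
    """
    return ref_count + per_year * (age - ref_age)


def exponential_paternal_dnms(age, ref_count, ref_age, doubling_years=16.5):
    """Exponential model: expected paternal de novo mutations double every `doubling_years`.

    `ref_count` at `ref_age` is a HYPOTHETICAL anchor supplied by the caller.
    """
    return ref_count * 2 ** ((age - ref_age) / doubling_years)
```

Both functions return `ref_count` at `ref_age`; away from it the exponential curve eventually rises faster than the linear one.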


Subject(s)
Autistic Disorder/genetics , Genetic Predisposition to Disease , Mutation Rate , Paternal Age , Schizophrenia/genetics , Adult , Autistic Disorder/epidemiology , Autistic Disorder/etiology , Chromosomes, Human/genetics , Female , Genome, Human/genetics , Humans , Iceland/epidemiology , Male , Middle Aged , Mothers , Ovum/metabolism , Pedigree , Polymorphism, Single Nucleotide/genetics , Risk Factors , Schizophrenia/epidemiology , Schizophrenia/etiology , Selection, Genetic/genetics , Sequence Analysis, DNA , Spermatozoa/metabolism , Young Adult
8.
Nature ; 462(7275): 868-74, 2009 Dec 17.
Article in English | MEDLINE | ID: mdl-20016592

ABSTRACT

Effects of susceptibility variants may depend on the parent from which they are inherited. Although many associations between sequence variants and human traits have been discovered through genome-wide association studies, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five of them (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.


Subject(s)
Fathers , Genetic Predisposition to Disease/genetics , Mothers , Polymorphism, Single Nucleotide/genetics , Alleles , Binding Sites , Breast Neoplasms/genetics , CCCTC-Binding Factor , Carcinoma, Basal Cell/genetics , Chromosomes, Human, Pair 11/genetics , Chromosomes, Human, Pair 7/genetics , DNA Methylation/genetics , Diabetes Mellitus, Type 2/genetics , Female , Genome, Human/genetics , Genomic Imprinting/genetics , Haplotypes , Humans , Iceland , Male , Pedigree , Repressor Proteins/metabolism
9.
Hum Mol Genet ; 20(21): 4268-81, 2011 Nov 01.
Article in English | MEDLINE | ID: mdl-21750109

ABSTRACT

Three genome-wide association studies in Europe and the USA have reported eight urinary bladder cancer (UBC) susceptibility loci. Using extended case and control series and 1000 Genomes imputations of 5 340 737 single-nucleotide polymorphisms (SNPs), we searched for additional loci in the European GWAS. The discovery sample set consisted of 1631 cases and 3822 controls from the Netherlands and 603 cases and 37 781 controls from Iceland. For follow-up, we used 3790 cases and 7507 controls from 13 sample sets of European and Iranian ancestry. Based on the discovery analysis, we followed up signals in the urea transporter (UT) gene SLC14A1. The strongest signal at this locus was represented by a SNP in intron 3, rs17674580, that reached genome-wide significance in the overall analysis of the discovery and follow-up groups: odds ratio = 1.17, P = 7.6 × 10(-11). SLC14A1 codes for UTs that define the Kidd blood group and are crucial for the maintenance of a constant urea concentration gradient in the renal medulla and, through this, the kidney's ability to concentrate urine. It is speculated that rs17674580, or other sequence variants in linkage disequilibrium (LD) with it, indirectly modifies UBC risk by affecting urine production. If confirmed, this would support the 'urogenous contact hypothesis' that urine production and voiding frequency modify the risk of UBC.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Membrane Transport Proteins/genetics , Urinary Bladder Neoplasms/genetics , White People/genetics , Adult , Aged , Aged, 80 and over , Chromosomes, Human, Pair 18/genetics , Disease Progression , Female , Genetic Loci/genetics , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results , Risk Factors , Young Adult , Urea Transporters
10.
PLoS Genet ; 6(7): e1001029, 2010 Jul 22.
Article in English | MEDLINE | ID: mdl-20661439

ABSTRACT

We used an approach that we term ancestry-shift refinement mapping to investigate an association, originally discovered in a GWAS of a Chinese population, between rs2046210[T] and breast cancer susceptibility. The locus is on 6q25.1 in proximity to the C6orf97 and estrogen receptor alpha (ESR1) genes. We identified a panel of SNPs that are correlated with rs2046210 in Chinese, but not necessarily so in other ancestral populations, and genotyped them in breast cancer case:control samples of Asian, European, and African origin, a total of 10,176 cases and 13,286 controls. We found that rs2046210[T] does not confer substantial risk of breast cancer in Europeans and Africans (OR = 1.04, P = 0.099, and OR = 0.98, P = 0.77, respectively). Rather, in those ancestries, an association signal arises from a group of less common SNPs typified by rs9397435. The rs9397435[G] allele was found to confer risk of breast cancer in European (OR = 1.15, P = 1.2 x 10(-3)), African (OR = 1.35, P = 0.014), and Asian (OR = 1.23, P = 2.9 x 10(-4)) population samples. Combined over all ancestries, the OR was 1.19 (P = 3.9 x 10(-7)), with no significant heterogeneity between ancestries (P(het) = 0.36), and the SNP fully accounted for the association signal in each ancestry. Haplotypes bearing rs9397435[G] are well tagged by rs2046210[T] only in Asians. The rs9397435[G] allele showed associations with both estrogen receptor positive and estrogen receptor negative breast cancer. Using early-draft data from the 1000 Genomes Project, we found that the risk allele of a novel SNP (rs77275268), which is closely correlated with rs9397435, disrupts a partially methylated CpG sequence within a known CTCF binding site. These studies demonstrate that shifting the analysis among ancestral populations can provide valuable resolution in association mapping.


Subject(s)
Breast Neoplasms/genetics , Estrogen Receptor alpha/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/statistics & numerical data , Racial Groups/genetics , Breast Neoplasms/epidemiology , Chromosomes, Human, Pair 6 , Female , Genetic Loci , Genetic Predisposition to Disease/epidemiology , Humans , Polymorphism, Single Nucleotide
11.
Nat Commun ; 13(1): 7884, 2022 12 22.
Article in English | MEDLINE | ID: mdl-36550134

ABSTRACT

The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k, since the data become too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies. As the first method of its kind, it predicts rates not only for point mutations but also for insertions and deletions. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong selective constraint.
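For reference, the baseline this abstract contrasts with, one rate per k-mer, can be estimated by counting how often each centred k-mer occurs and how often its central base is mutated. The sketch below is that naive estimator, not kmerPaPa's pattern partition; for brevity it ignores mutation type and reverse-complement collapsing, and the toy sequence and mutated positions in the usage are hypothetical. With 4^k possible contexts, most k-mers receive few or no observations as k grows, which is the sparsity problem the abstract describes.

```python
from collections import Counter


def kmer_rates(sequence, mutated_positions, k=3):
    """Naive per-k-mer mutation rate.

    For every position that has a full k-mer centred on it, count the k-mer;
    if that position is mutated, also count a mutation for that k-mer.
    Rate = mutations / occurrences. Mutation type and strand are ignored.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the k-mer is centred on the site")
    flank = k // 2
    mutated = set(mutated_positions)
    occurrences = Counter()
    mutations = Counter()
    for i in range(flank, len(sequence) - flank):
        kmer = sequence[i - flank:i + flank + 1]
        occurrences[kmer] += 1
        if i in mutated:
            mutations[kmer] += 1
    return {kmer: mutations[kmer] / n for kmer, n in occurrences.items()}


# Hypothetical toy input: the 'G' at index 2 is mutated.
print(kmer_rates("ACGTACGT", [2], k=3))
```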


Subject(s)
Point Mutation , Software , Humans , Sequence Analysis, DNA/methods , Genome, Human/genetics , Mutation , Algorithms
12.
Elife ; 11, 2022 07 27.
Article in English | MEDLINE | ID: mdl-35894300

ABSTRACT

Sequencing of cell-free DNA (cfDNA) is currently being used to detect cancer by searching both for mutational and non-mutational alterations. Recent work has shown that the length distribution of cfDNA fragments from a cancer patient can inform tumor load and type. Here, we propose non-negative matrix factorization (NMF) of fragment length distributions as a novel and completely unsupervised method for studying fragment length patterns in cfDNA. Using shallow whole-genome sequencing (sWGS) of cfDNA from a cohort of patients with metastatic castration-resistant prostate cancer (mCRPC), we demonstrate how NMF accurately infers the true tumor fragment length distribution as an NMF component, and that the sample weights of this component correlate with ctDNA levels (r = 0.75). We further demonstrate how using several NMF components enables accurate cancer detection on data from various early stage cancers (AUC = 0.96). Finally, we show that NMF, when applied across genomic regions, can be used to discover fragment length signatures associated with open chromatin.


Subject(s)
Cell-Free Nucleic Acids , Circulating Tumor DNA , Biomarkers, Tumor/genetics , Circulating Tumor DNA/genetics , Genomics/methods , Humans , Male , Mutation
13.
Leukemia ; 36(1): 177-188, 2022 01.
Article in English | MEDLINE | ID: mdl-34244612

ABSTRACT

Mantle cell lymphoma (MCL) is characterized by marked differences in outcome, emphasizing the need for strong prognostic biomarkers. Here, we explore expression patterns and prognostic relevance of circular RNAs (circRNAs), a group of endogenous non-coding RNA molecules, in MCL. We profiled the circRNA expression landscape using RNA-sequencing and explored the prognostic potential of 40 abundant circRNAs in samples from the Nordic MCL2 and MCL3 clinical trials, using NanoString nCounter Technology. We report a circRNA-based signature (circSCORE) developed in the training cohort MCL2 that is highly predictive of time to progression (TTP) and lymphoma-specific survival (LSS). The dismal outcome observed in the large proportion of patients assigned to the circSCORE high-risk group was confirmed in the independent validation cohort MCL3, both in terms of TTP (HR 3.0; P = 0.0004) and LSS (HR 3.6; P = 0.001). In Cox multiple regression analysis incorporating MIPI, Ki67 index, blastoid morphology and presence of TP53 mutations, circSCORE retained prognostic significance for TTP (HR 3.2; P = 0.01) and LSS (HR 4.6; P = 0.01). In conclusion, circRNAs are promising prognostic biomarkers in MCL and circSCORE improves identification of high-risk disease among younger patients treated with cytarabine-containing chemoimmunotherapy and autologous stem cell transplant.


Subject(s)
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Biomarkers, Tumor/genetics , Hematopoietic Stem Cell Transplantation/mortality , Lymphoma, Mantle-Cell/pathology , RNA, Circular/genetics , Case-Control Studies , Combined Modality Therapy , Female , Follow-Up Studies , Humans , Lymphoma, Mantle-Cell/genetics , Lymphoma, Mantle-Cell/therapy , Male , Middle Aged , Prognosis , RNA-Seq , Survival Rate , Transplantation, Autologous
14.
Elife ; 11, 2022 01 12.
Article in English | MEDLINE | ID: mdl-35018888

ABSTRACT

In the past decade, several studies have estimated the human per-generation germline mutation rate using large pedigrees. More recently, estimates for various nonhuman species have been published. However, methodological differences among studies in detecting germline mutations and estimating mutation rates make direct comparisons difficult. Here, we describe the many different steps involved in estimating pedigree-based mutation rates, including sampling, sequencing, mapping, variant calling, filtering, and appropriately accounting for false-positive and false-negative rates. For each step, we review the different methods and parameter choices that have been used in the recent literature. Additionally, we present the results from a 'Mutationathon,' a competition organized among five research labs to compare germline mutation rate estimates for a single pedigree of rhesus macaques. We report almost a twofold variation in the final estimated rate among groups using different post-alignment processing, calling, and filtering criteria, and provide details on the sources of variation across studies. Though the difference among estimates is not statistically significant, this discrepancy emphasizes the need for standardized methods in mutation rate estimations and the difficulty in comparing rates from different studies. Finally, this work aims to provide guidelines for computational and statistical benchmarks for future studies interested in identifying germline mutations from pedigrees.
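One common way to combine the quantities listed in this abstract into a per-site, per-generation rate for a single trio is to discount the filtered de novo candidates by the estimated false-discovery rate and divide by twice the callable genome size, corrected for the false-negative rate. This is a minimal sketch of that general form, not the exact formula of any group in the Mutationathon; studies differ in how they define callable sites and estimate both error rates. All numbers in the usage are hypothetical.

```python
def per_site_rate(candidate_dnms, callable_sites, fdr=0.0, fnr=0.0):
    """Per-site, per-generation germline mutation rate for one trio (sketch).

    candidate_dnms: de novo candidates in the offspring that pass all filters
    callable_sites: haploid number of sites where a de novo mutation could be detected
    fdr: estimated fraction of candidates that are false positives
    fnr: estimated fraction of true de novo mutations missed at callable sites
    """
    if not (0 <= fdr < 1 and 0 <= fnr < 1):
        raise ValueError("fdr and fnr must be in [0, 1)")
    true_dnms = candidate_dnms * (1 - fdr)
    # Factor 2: the offspring is diploid, so each callable site carries two
    # transmitted copies (one from each parent) in which a mutation could arise.
    return true_dnms / (2 * callable_sites * (1 - fnr))


# Hypothetical trio: 60 candidates over 2.5e9 callable sites, 10% FDR and 10% FNR.
print(per_site_rate(60, 2.5e9, fdr=0.1, fnr=0.1))
```

In this toy case the two corrections cancel; in general they do not, which is one way pipeline choices shift the final estimate.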


Subject(s)
Genetic Techniques , Germ-Line Mutation , Macaca mulatta/genetics , Mutation Rate , Animals , Genetic Techniques/instrumentation , Germ Cells , Laboratories , Pedigree , Reference Standards
15.
Gigascience ; 10(10), 2021 10 21.
Article in English | MEDLINE | ID: mdl-34673928

ABSTRACT

The lack of consensus methods to estimate germline mutation rates from pedigrees has led to substantial differences in computational pipelines in the published literature. Here, we answer Susanne Pfeifer's opinion piece discussing the pipeline choices of our recent article estimating the germline mutation rate of rhesus macaques (Macaca mulatta). We acknowledge the differences between the method that we applied and the one preferred by Pfeifer. Yet, as long as no rigorous comparison of pipelines exists, we advocate for full transparency and justification of choices, because this is the only way to establish best practices for the field.


Subject(s)
Germ-Line Mutation , Mutation Rate , Animals , Macaca mulatta/genetics
16.
Gigascience ; 10(5), 2021 05 05.
Article in English | MEDLINE | ID: mdl-33954793

ABSTRACT

BACKGROUND: Understanding the rate and pattern of germline mutations is of fundamental importance for understanding evolutionary processes. RESULTS: Here we analyzed 19 parent-offspring trios of rhesus macaques (Macaca mulatta) at high sequencing coverage of ∼76× per individual and estimated a mean rate of 0.77 × 10(-8) de novo mutations per site per generation (95% CI: 0.69 × 10(-8) to 0.85 × 10(-8)). By phasing 50% of the mutations to parental origins, we found that the mutation rate is positively correlated with the paternal age. The paternal lineage contributed a mean of 81% of the de novo mutations, with a trend of an increasing male contribution for older fathers. Approximately 3.5% of de novo mutations were shared between siblings, with no parental bias, suggesting that they arose from early development (postzygotic) stages. Finally, the divergence times between closely related primates calculated on the basis of the yearly mutation rate of rhesus macaque generally agree with divergences estimated with molecular clock methods, except for the Cercopithecoidea/Hominoidea molecular divergence dated at 58 Mya using our new estimate of the yearly mutation rate. CONCLUSIONS: When compared to the traditional molecular clock methods, new estimated rates from pedigree samples can provide insights into the evolution of well-studied groups such as primates.
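Converting a pedigree-based rate into a divergence time takes two steps: divide the per-generation rate by the generation time to get a yearly rate, then apply a simple molecular clock in which two lineages accumulate differences at twice that rate. In this sketch only the 0.77 × 10(-8) per-generation rate comes from the abstract; the 10-year generation time and the 1.54% pairwise divergence are hypothetical round numbers, and the clock ignores ancestral polymorphism.

```python
PER_GENERATION_RATE = 0.77e-8          # from the abstract (per site per generation)
HYPOTHETICAL_GENERATION_TIME = 10.0    # years; illustrative only
HYPOTHETICAL_DIVERGENCE = 0.0154       # pairwise differences per site; illustrative only


def yearly_rate(per_generation_rate, generation_time_years):
    """Per-site mutation rate per year."""
    return per_generation_rate / generation_time_years


def divergence_time(pairwise_divergence, rate_per_year):
    """Simple clock: both lineages accumulate mutations, so d = 2 * rate * T."""
    return pairwise_divergence / (2 * rate_per_year)


rate = yearly_rate(PER_GENERATION_RATE, HYPOTHETICAL_GENERATION_TIME)
print(rate, divergence_time(HYPOTHETICAL_DIVERGENCE, rate))
```

With these placeholder inputs the split comes out at about 10 million years; because the yearly rate is inversely proportional to generation time, the assumed generation time directly scales every date.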


Subject(s)
Germ-Line Mutation , Mutation Rate , Animals , Germ Cells , Macaca mulatta/genetics , Male , Phylogeny
17.
Genetics ; 181(2): 747-53, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19064712

ABSTRACT

We present a new method, termed QBlossoc, for linkage disequilibrium (LD) mapping of genetic variants underlying a quantitative trait. The method uses principles similar to a previously published method, Blossoc, for LD mapping of case/control studies. The method builds local genealogies along the genome and looks for a significant clustering of quantitative trait values in these trees. We analyze its efficiency in terms of localization and ranking of true positives among a large number of negatives and compare the results with single-marker approaches. Simulation results of markers at densities comparable to contemporary genotype chips show that QBlossoc is more accurate in localization of true positives, as expected, since it simultaneously uses the additional information of LD between markers. More importantly, however, for genomewide surveys, QBlossoc places regions with true positives higher on a ranked list than single-marker approaches, again suggesting that a true signal displays itself more strongly in a set of adjacent markers than a spurious (false) signal. The method is both memory and central processing unit (CPU) efficient. It has been tested on a real data set of height data for 5000 individuals measured at approximately 317,000 markers and completed analysis within 5 CPU days.


Subject(s)
Chromosome Mapping/methods , Phylogeny , Quantitative Trait Loci , Computer Simulation , Databases, Genetic , Genetic Markers , Genome-Wide Association Study/statistics & numerical data , Linkage Disequilibrium , Models, Genetic , Polymorphism, Single Nucleotide , Software
18.
BMC Bioinformatics ; 10 Suppl 1: S74, 2009 Jan 30.
Article in English | MEDLINE | ID: mdl-19208179

ABSTRACT

BACKGROUND: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individuals, and analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach to do exactly this. RESULTS: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals typed in 491208 markers using default parameters, and by more if the pattern length is increased. CONCLUSION: The new algorithm speeds up the HPM method, and we show that it is feasible to apply HPM to whole genome association mapping with thousands of individuals and hundreds of thousands of markers.


Subject(s)
Algorithms , Computational Biology/methods , Genome, Human , Haplotypes/genetics , Databases, Genetic , Genetic Markers , Genetic Predisposition to Disease , Genetic Variation , Humans , Polymorphism, Single Nucleotide
19.
Cancer Inform ; 18: 1176935119872163, 2019.
Article in English | MEDLINE | ID: mdl-31516310

ABSTRACT

A cancer of unknown primary (CUP) is a metastatic cancer for which standard diagnostic tests fail to locate the primary cancer. As standard treatments are based on the cancer type, such cases are hard to treat and have very poor prognosis. Using molecular data from the metastatic cancer to predict the primary site can make treatment choice easier and enable targeted therapy. In this article, we first examine the ability to predict cancer type using different types of omics data. Methylation data lead to slightly better prediction than gene expression, and both are superior to classification using somatic mutations. After using the 3 data types independently, we notice some differences in which classes tend to be misclassified, suggesting that integrating the data might improve accuracy. In light of the different levels of information provided by different omics types, and to be able to handle missing data, we perform multi-omics classification by hierarchically combining the classifiers. The proposed hierarchical method first classifies based on the most informative type of omics data and then uses the other types of omics data to classify samples that did not get a high-confidence classification in the first step. The resulting hierarchical classifier has higher accuracy than any of the single-omics classifiers and thus shows that the combination of different data types is beneficial. Our results show that using multi-omics data can improve the classification of cancer types. We confirm this by testing our method on metastatic cancers from the MET500 dataset.
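The hierarchical step described here can be read as a cascade: ask the classifier for the most informative data type first, and consult the next type only when its prediction is not confident or its data are missing. The sketch below assumes each classifier returns a (label, confidence) pair, or `None` when its omics type is unavailable for a sample; the 0.8 threshold and the fallback to the most confident prediction when no step is confident are assumptions, not details given in the abstract.

```python
from typing import Callable, Optional, Sequence, Tuple

# A classifier maps one sample to (label, confidence), or to None when the
# omics type it needs is missing for that sample.
Prediction = Tuple[str, float]
Classifier = Callable[[dict], Optional[Prediction]]


def hierarchical_classify(
    sample: dict,
    classifiers: Sequence[Classifier],
    threshold: float = 0.8,  # HYPOTHETICAL confidence cut-off
) -> Optional[Prediction]:
    """Consult classifiers in order (most informative omics type first) and
    return the first prediction whose confidence reaches `threshold`.

    If none does, fall back to the most confident prediction seen (an
    assumption; the abstract does not specify this case). Returns None only
    if every omics type is missing for the sample.
    """
    best: Optional[Prediction] = None
    for classify in classifiers:
        result = classify(sample)
        if result is None:  # this omics type is missing; try the next one
            continue
        label, confidence = result
        if confidence >= threshold:
            return label, confidence
        if best is None or confidence > best[1]:
            best = (label, confidence)
    return best
```

Ordering the list as methylation, expression, then mutations would follow the ranking of single-omics performance reported above.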

20.
Methods Mol Biol ; 1910: 533-553, 2019.
Article in English | MEDLINE | ID: mdl-31278676

ABSTRACT

In this chapter, we give a short introduction to the genetics of complex diseases emphasizing evolutionary models for disease genes and the effect of different models on the genetic architecture, and we give a survey of the state-of-the-art of genome-wide association studies (GWASs).


Subject(s)
Chromosome Mapping , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study , Alleles , Computational Biology/methods , Confounding Factors, Epidemiologic , Evolution, Molecular , Gene Frequency , Humans , Models, Genetic , Models, Statistical