Results 1 - 16 of 16
1.
Nat Genet ; 55(3): 423-436, 2023 03.
Article in English | MEDLINE | ID: mdl-36914876

ABSTRACT

Endometriosis is a common condition associated with debilitating pelvic pain and infertility. A genome-wide association study meta-analysis, including 60,674 cases and 701,926 controls of European and East Asian descent, identified 42 genome-wide significant loci comprising 49 distinct association signals. Effect sizes were largest for stage 3/4 disease, driven by ovarian endometriosis. Identified signals explained up to 5.01% of disease variance and regulated expression or methylation of genes in endometrium and blood, many of which were associated with pain perception/maintenance (SRP14/BMF, GDAP1, MLLT10, BSN and NGF). We observed significant genetic correlations between endometriosis and 11 pain conditions, including migraine, back and multisite chronic pain (MCP), as well as inflammatory conditions, including asthma and osteoarthritis. Multitrait genetic analyses identified substantial sharing of variants associated with endometriosis and MCP/migraine. Targeted investigations of genetically regulated mechanisms shared between endometriosis and other pain conditions are needed to aid the development of new treatments and facilitate early symptomatic intervention.


Subject(s)
Endometriosis , Female , Humans , Endometriosis/genetics , Endometriosis/metabolism , Genetic Predisposition to Disease , Genome-Wide Association Study , Pain , Comorbidity
2.
HGG Adv ; 3(4): 100133, 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36035246

ABSTRACT

Copy-number variations (CNVs) are believed to play an important role in a wide range of complex traits, but discovering such associations remains challenging. While whole-genome sequencing (WGS) is the gold-standard approach for CNV detection, there are several orders of magnitude more samples with available genotyping microarray data. Such array data can be exploited for CNV detection using dedicated software (e.g., PennCNV); however, these calls suffer from elevated false-positive and -negative rates. In this study, we developed a CNV quality score that weights PennCNV calls (pCNVs) based on their likelihood of being true positives. First, we established a measure of pCNV reliability by leveraging evidence from multiple omics data (WGS, transcriptomics, and methylomics) obtained from the same samples. Next, we built a predictor of omics-confirmed pCNVs, termed omics-informed quality score (OQS), using only PennCNV software output parameters. Promisingly, OQS assigned to pCNVs detected in close family members was up to 35% higher than the OQS of pCNVs not carried by other relatives (p < 3.0 × 10^-90), outperforming other scores. Finally, in an association study of four anthropometric traits in 89,516 Estonian Biobank samples, the use of OQS led to a relative increase in the trait variance explained by CNVs of up to 56% compared with published quality filtering methods or scores. Overall, we put forward a flexible framework to improve any CNV detection method leveraging multi-omics evidence, applied it to improve PennCNV calls, and demonstrated its utility by improving the statistical power for downstream association analyses.

3.
Cell ; 185(16): 3041-3055.e25, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.


Subject(s)
DNA Copy Number Variations , Genome, Human , DNA Copy Number Variations/genetics , Gene Dosage , Haploinsufficiency/genetics , Humans
4.
Nat Commun ; 13(1): 3584, 2022 06 23.
Article in English | MEDLINE | ID: mdl-35739095

ABSTRACT

Pelvic organ prolapse is a common gynecological condition with limited understanding of its genetic background. In this work, we perform a genome-wide association meta-analysis comprising 28,086 cases and 546,291 controls of European ancestry. We identify 19 novel genome-wide significant loci, highlighting connective tissue, urogenital and cardiometabolic as likely affected systems. Here, we prioritize many genes of potential interest and assess shared genetic and phenotypic links. Additionally, we present the first polygenic risk score, which shows similar predictive ability (Harrell C-statistic (C-stat) 0.583, standard deviation (sd) = 0.007) as five established clinical risk factors combined (number of children, body mass index, ever smoked, constipation and asthma) (C-stat = 0.588, sd = 0.007) and demonstrates a substantial incremental value in combination with these (C-stat = 0.630, sd = 0.007). These findings improve our understanding of genetic factors underlying pelvic organ prolapse and provide a solid start for evaluating polygenic risk scores as a potential tool to enhance individual risk prediction.


Subject(s)
Genome-Wide Association Study , Pelvic Organ Prolapse , Body Mass Index , Child , Humans , Pelvic Organ Prolapse/genetics , Risk Factors
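
The abstract of entry 4 reports predictive ability as Harrell's C-statistic. As a rough illustration of what that number measures, the sketch below computes the concordance index with a simple pairwise loop. The function name `harrell_c` and the toy follow-up times, event flags and risk scores are invented for illustration and are not taken from the study.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data (O(n^2) pair loop)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had the event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0   # higher risk score, earlier event
                elif scores[i] == scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up time, event indicator, risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]
c_stat = harrell_c(times, events, scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why C-statistics near 0.58 to 0.63 indicate modest discrimination.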
5.
Am J Hum Genet ; 109(4): 647-668, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35240056

ABSTRACT

The impact of copy-number variations (CNVs) on complex human traits remains understudied. We called CNVs in 331,522 UK Biobank participants and performed genome-wide association studies (GWASs) between the copy number of CNV-proxy probes and 57 continuous traits, revealing 131 signals spanning 47 phenotypes. Our analysis recapitulated well-known associations (e.g., 1q21 and height), revealed the pleiotropy of recurrent CNVs (e.g., 26 and 16 traits for 16p11.2-BP4-BP5 and 22q11.21, respectively), and suggested gene functionalities (e.g., MARF1 in female reproduction). Forty-eight CNV signals (38%) overlapped with single-nucleotide polymorphism (SNP)-GWASs signals for the same trait. For instance, deletion of PDZK1, which encodes a urate transporter scaffold protein, decreased serum urate levels, while deletion of RHD, which encodes the Rhesus blood group D antigen, associated with hematological traits. Other signals overlapped Mendelian disorder regions, suggesting variable expressivity and broad impact of these loci, as illustrated by signals mapping to Rotor syndrome (SLCO1B1/3), renal cysts and diabetes syndrome (HNF1B), or Charcot-Marie-Tooth (PMP22) loci. Total CNV burden negatively impacted 35 traits, leading to increased adiposity, liver/kidney damage, and decreased intelligence and physical capacity. Thirty traits remained burden associated after correcting for CNV-GWAS signals, pointing to a polygenic CNV architecture. The burden negatively correlated with socio-economic indicators, parental lifespan, and age (survivorship proxy), suggesting a contribution to decreased longevity. Together, our results showcase how studying CNVs can expand biological insights, emphasizing the critical role of this mutational class in shaping human traits and arguing in favor of a continuum between Mendelian and complex diseases.


Subject(s)
DNA Copy Number Variations , Genome-Wide Association Study , DNA Copy Number Variations/genetics , Female , Humans , Liver-Specific Organic Anion Transporter 1 , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide/genetics
6.
Nat Commun ; 12(1): 3761, 2021 06 18.
Article in English | MEDLINE | ID: mdl-34145262

ABSTRACT

Pernicious anemia is a rare condition characterized by vitamin B12 deficiency anemia due to lack of intrinsic factor, often caused by autoimmune gastritis. Patients with pernicious anemia have a higher incidence of other autoimmune disorders, such as type 1 diabetes, vitiligo, and autoimmune thyroid issues. Therefore, the disease has a clear autoimmune basis, although the genetic susceptibility factors have thus far remained poorly studied. We conduct a genome-wide association study meta-analysis in 2166 cases and 659,516 European controls from population-based biobanks and identify genome-wide significant signals in or near the PTPN22 (rs6679677, p = 1.91 × 10^-24, OR = 1.63), PNPT1 (rs12616502, p = 3.14 × 10^-8, OR = 1.70), HLA-DQB1 (rs28414666, p = 1.40 × 10^-16, OR = 1.38), IL2RA (rs2476491, p = 1.90 × 10^-8, OR = 1.22) and AIRE (rs74203920, p = 2.33 × 10^-9, OR = 1.83) genes, thus providing robust associations between pernicious anemia and genetic risk factors.


Subject(s)
Anemia, Pernicious/genetics , Autoimmune Diseases/genetics , Genetic Predisposition to Disease/genetics , Adult , Aged , Exoribonucleases/genetics , Female , Gastritis/pathology , Genetic Variation/genetics , Genome-Wide Association Study , HLA-DQ beta-Chains/genetics , Humans , Interleukin-2 Receptor alpha Subunit/genetics , Male , Middle Aged , Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics , Risk Factors , Transcription Factors/genetics
7.
Nat Commun ; 11(1): 5980, 2020 11 25.
Article in English | MEDLINE | ID: mdl-33239672

ABSTRACT

Miscarriage is a common, complex trait affecting ~15% of clinically confirmed pregnancies. Here we present the results of large-scale genetic association analyses with 69,054 cases from five different ancestries for sporadic miscarriage, 750 cases of European ancestry for multiple (≥3) consecutive miscarriage, and up to 359,469 female controls. We identify one genome-wide significant association (rs146350366, minor allele frequency (MAF) 1.2%, P = 3.2 × 10^-8, odds ratio (OR) = 1.4) for sporadic miscarriage in our European ancestry meta-analysis and three genome-wide significant associations for multiple consecutive miscarriage (rs7859844, MAF = 6.4%, P = 1.3 × 10^-8, OR = 1.7; rs143445068, MAF = 0.8%, P = 5.2 × 10^-9, OR = 3.4; rs183453668, MAF = 0.5%, P = 2.8 × 10^-8, OR = 3.8). We further investigate the genetic architecture of miscarriage with biobank-scale Mendelian randomization, heritability, and genetic correlation analyses. Our results show that miscarriage etiopathogenesis is partly driven by genetic variation potentially related to placental biology, and illustrate the utility of large-scale biobank data for understanding this pregnancy complication.


Subject(s)
Abortion, Habitual/genetics , Abortion, Spontaneous/genetics , Genetic Predisposition to Disease , Placenta/physiopathology , Abortion, Habitual/epidemiology , Abortion, Habitual/physiopathology , Abortion, Spontaneous/epidemiology , Abortion, Spontaneous/physiopathology , Adult , Aged , Case-Control Studies , Datasets as Topic , Female , Gene Frequency , Genome-Wide Association Study , Humans , Inheritance Patterns , Medical History Taking , Middle Aged , Polymorphism, Single Nucleotide , Pregnancy , /genetics , Young Adult
8.
Am J Hum Genet ; 107(4): 612-621, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32888428

ABSTRACT

Hypersensitivity reactions to drugs are often unpredictable and can be life threatening, underscoring a need for understanding their underlying mechanisms and risk factors. The extent to which germline genetic variation influences the risk of commonly reported drug allergies such as penicillin allergy remains largely unknown. We extracted data from the electronic health records of more than 600,000 participants from the UK, Estonian, and Vanderbilt University Medical Center's BioVU biobanks to study the role of genetic variation in the occurrence of self-reported penicillin hypersensitivity reactions. We used imputed SNP to HLA typing data from these cohorts to further fine map the human leukocyte antigen (HLA) association and replicated our results in 23andMe's research cohort involving a total of 1.12 million individuals. Genome-wide meta-analysis of penicillin allergy revealed two loci, including one located in the HLA region on chromosome 6. This signal was further fine-mapped to the HLA-B*55:01 allele (OR 1.41, 95% CI 1.33-1.49, p value 2.04 × 10^-31) and confirmed by independent replication in 23andMe's research cohort (OR 1.30, 95% CI 1.25-1.34, p value 1.00 × 10^-47). The lead SNP was also associated with lower lymphocyte counts and in silico follow-up suggests a potential effect on T-lymphocytes at HLA-B*55:01. We also observed a significant hit in PTPN22 and the GWAS results correlated with the genetics of rheumatoid arthritis and psoriasis. We present robust evidence for the role of an allele of the major histocompatibility complex (MHC) I gene HLA-B in the occurrence of penicillin allergy.


Subject(s)
Arthritis, Rheumatoid/genetics , Drug Hypersensitivity/genetics , HLA-B Antigens/genetics , Polymorphism, Single Nucleotide , Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics , Psoriasis/genetics , Adult , Alleles , Arthritis, Rheumatoid/complications , Arthritis, Rheumatoid/immunology , Chromosomes, Human, Pair 6/chemistry , Drug Hypersensitivity/complications , Drug Hypersensitivity/etiology , Drug Hypersensitivity/immunology , Electronic Health Records , Europe , Female , Gene Expression , Genetic Loci , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , HLA-B Antigens/immunology , Histocompatibility Testing , Humans , Male , Penicillins/adverse effects , Protein Tyrosine Phosphatase, Non-Receptor Type 22/immunology , Psoriasis/complications , Psoriasis/immunology , Self Report , T-Lymphocytes/immunology , T-Lymphocytes/pathology , United States
9.
Cell ; 181(6): 1246-1262.e22, 2020 06 11.
Article in English | MEDLINE | ID: mdl-32442405

ABSTRACT

There is considerable inter-individual variability in susceptibility to weight gain despite an equally obesogenic environment in large parts of the world. Whereas many studies have focused on identifying the genetic susceptibility to obesity, we performed a GWAS on metabolically healthy thin individuals (lowest 6th percentile of the population-wide BMI spectrum) in a uniquely phenotyped Estonian cohort. We discovered anaplastic lymphoma kinase (ALK) as a candidate thinness gene. In Drosophila, RNAi-mediated knockdown of Alk led to decreased triglyceride levels. In mice, genetic deletion of Alk resulted in thin animals with marked resistance to diet- and leptin-mutation-induced obesity. Mechanistically, we found that ALK expression in hypothalamic neurons controls energy expenditure via sympathetic control of adipose tissue lipolysis. Our genetic and mechanistic experiments identify ALK as a thinness gene, which is involved in the resistance to weight gain.


Subject(s)
Anaplastic Lymphoma Kinase/genetics , Thinness/genetics , Adipose Tissue/metabolism , Adult , Animals , Cell Line , Cohort Studies , Drosophila/genetics , Estonia , Female , Humans , Leptin/genetics , Lipolysis/genetics , Male , Mice , Mice, Inbred C57BL , Mice, Knockout , Obesity/genetics , RNA Interference/physiology , Young Adult
10.
Nat Commun ; 10(1): 4626, 2019 10 11.
Article in English | MEDLINE | ID: mdl-31604923

ABSTRACT

Infertility in men and women is a complex genetic trait with shared biological bases between the sexes. Here, we perform a series of rare variant analyses across 73,185 women and men to identify genes that contribute to primary gonadal dysfunction. We report CSMD1, a complement regulatory protein on chromosome 8p23, as a strong candidate locus in both sexes. We show that CSMD1 is enriched at the germ-cell/somatic-cell interface in both male and female gonads. Csmd1-knockout males show increased rates of infertility with significantly increased complement C3 protein deposition in the testes, accompanied by severe histological degeneration. Knockout females show significant reduction in ovarian quality and breeding success, as well as mammary branching impairment. Double knockout of Csmd1 and C3 causes non-additive reduction in breeding success, suggesting that CSMD1 and the complement pathway play an important role in the normal postnatal development of the gonads in both sexes.


Subject(s)
Infertility/genetics , Membrane Proteins/genetics , Tumor Suppressor Proteins/genetics , Age Factors , Animals , Complement C3/metabolism , Female , Genetic Association Studies , Genotype , Humans , Male , Mammary Glands, Animal/growth & development , Mammary Glands, Animal/pathology , Menopause/genetics , Mice, Knockout , Mutation , Ovary/pathology , Sexual Maturation , Testis/metabolism
11.
BMC Cancer ; 19(1): 557, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182048

ABSTRACT

BACKGROUND: Published genetic risk scores for breast cancer (BC) so far have been based on a relatively small number of markers and are not necessarily using the full potential of large-scale Genome-Wide Association Studies. This study aimed to identify an efficient polygenic predictor for BC based on best available evidence and to assess its potential for personalized risk prediction and screening strategies. METHODS: Four different genetic risk scores (two already published and two newly developed) and their combinations (metaGRS) were compared in the subsets of two population-based biobank cohorts: the UK Biobank (UKBB, 3157 BC cases, 43,827 controls) and Estonian Biobank (EstBB, 317 prevalent and 308 incident BC cases in 32,557 women). In addition, correlations between different genetic risk scores and their associations with BC risk factors were studied in both cohorts. RESULTS: The metaGRS that combines two genetic risk scores (metaGRS2 - based on 75 and 898 Single Nucleotide Polymorphisms, respectively) had the strongest association with prevalent BC status in both cohorts. One standard deviation difference in the metaGRS2 corresponded to an Odds Ratio = 1.6 (95% CI 1.54 to 1.66, p = 9.7 × 10^-135) in the UK Biobank and accounting for family history marginally attenuated the effect (Odds Ratio = 1.58, 95% CI 1.53 to 1.64, p = 7.8 × 10^-129). In the EstBB cohort, the hazard ratio of incident BC for the women in the top 5% of the metaGRS2 compared to women in the lowest 50% was 4.2 (95% CI 2.8 to 6.2, p = 8.1 × 10^-13). The different GRSs were only moderately correlated with each other and were associated with different known predictors of BC. The classification of genetic risk for the same individual varied considerably depending on the chosen GRS. CONCLUSIONS: We have shown that metaGRS2, which combines the effects of more than 900 SNPs, provided the best predictive ability for breast cancer in two different population-based cohorts. The strength of the effect of metaGRS2 indicates that the GRS could potentially be used to develop more efficient strategies for breast cancer screening for genotyped women.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Genotype , Multifactorial Inheritance , Adult , Aged , Case-Control Studies , Cohort Studies , Early Detection of Cancer , Female , Genetic Predisposition to Disease , Genetic Testing , Genome-Wide Association Study , Humans , Middle Aged , Polymorphism, Single Nucleotide , Population Groups , Predictive Value of Tests , Prognosis , Risk
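
Entry 11 describes genetic risk scores built from SNP effects and reports an odds ratio per standard deviation of the score. The sketch below shows one common way such a score is formed: a weighted sum of risk-allele dosages, then standardized. The weights, genotypes and helper names are hypothetical, and this is not the metaGRS2 construction itself, which the abstract does not detail.

```python
import math
from statistics import mean, pstdev


def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP)."""
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(values):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical per-SNP weights (log odds ratios) and genotypes for three people.
weights = [math.log(1.10), math.log(1.05), math.log(1.20)]
cohort = [[0, 1, 2], [1, 1, 0], [2, 2, 2]]
raw_scores = [genetic_risk_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)

# Under a log-linear model, an odds ratio of 1.6 per SD implies 1.6**2 for 2 SD.
or_per_sd = 1.6
or_two_sd = or_per_sd ** 2
```

Standardizing the score is what makes a "per standard deviation" odds ratio comparable across scores built from different numbers of SNPs.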
12.
Genet Med ; 21(6): 1345-1354, 2019 06.
Article in English | MEDLINE | ID: mdl-30327539

ABSTRACT

PURPOSE: Biomedical databases combining electronic medical records and phenotypic and genomic data constitute a powerful resource for the personalization of treatment. To leverage the wealth of information provided, algorithms are required that systematically translate the contained information into treatment recommendations based on existing genotype-phenotype associations. METHODS: We developed and tested algorithms for translation of preexisting genotype data of over 44,000 participants of the Estonian biobank into pharmacogenetic recommendations. We compared the results obtained by genome sequencing, exome sequencing, and genotyping using microarrays, and evaluated the impact of pharmacogenetic reporting based on drug prescription statistics in the Nordic countries and Estonia. RESULTS: Our most striking result was that the performance of genotyping arrays is similar to that of genome sequencing, whereas exome sequencing is not suitable for pharmacogenetic predictions. Interestingly, 99.8% of all assessed individuals had a genotype associated with increased risks to at least one medication, and thereby the implementation of pharmacogenetic recommendations based on genotyping affects at least 50 daily drug doses per 1000 inhabitants. CONCLUSION: We find that microarrays are a cost-effective solution for creating preemptive pharmacogenetic reports, and with slight modifications, existing databases can be applied for automated pharmacogenetic decision support for clinicians.


Subject(s)
Pharmacogenetics/methods , Pharmacogenomic Variants/genetics , Sequence Analysis, DNA/methods , Algorithms , Biological Specimen Banks , Databases, Factual , Electronic Health Records , Estonia , Genetic Testing/standards , Genotype , Humans , Oligonucleotide Array Sequence Analysis/methods , Pharmacogenomic Testing/methods , Phenotype , Precision Medicine/methods
13.
Bioinformatics ; 34(11): 1937-1938, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29360956

ABSTRACT

Summary: Designing PCR primers for amplifying regions of eukaryotic genomes is a complicated task because the genomes contain a large number of repeat sequences and other regions unsuitable for amplification by PCR. We have developed a novel k-mer based masking method that uses a statistical model to detect and mask failure-prone regions on the DNA template prior to primer design. We implemented the method as standalone software, primer3_masker, and integrated it into the primer design program Primer3. Availability and implementation: The standalone version of primer3_masker is implemented in C. The source code is freely available at https://github.com/bioinfo-ut/primer3_masker/ (standalone version for Linux and macOS) and at https://github.com/primer3-org/primer3/ (integrated version). The Primer3 web application, which allows masking sequences of 196 animal and plant genomes, is available at http://primer3.ut.ee/. Contact: maido.remm@ut.ee. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Primers , Polymerase Chain Reaction/methods , Repetitive Sequences, Nucleic Acid , Software , Animals , Humans , Plants/genetics
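
The idea of k-mer based masking in entry 13 can be illustrated with a much simpler stand-in for the statistical model the authors describe: count how often each k-mer occurs in a genome and mask template positions covered by over-represented k-mers. The function `mask_repeats`, the fixed count threshold, the lowercase masking convention and the toy sequences are all assumptions of this sketch, not the primer3_masker implementation.

```python
from collections import Counter


def mask_repeats(template, genome, k=4, max_count=2):
    """Lowercase template positions covered by a k-mer seen more than max_count times in genome."""
    genome_counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    masked = [False] * len(template)
    for i in range(len(template) - k + 1):
        if genome_counts[template[i:i + k]] > max_count:
            # Flag every base covered by the over-represented k-mer.
            for j in range(i, i + k):
                masked[j] = True
    return "".join(base.lower() if flag else base
                   for base, flag in zip(template, masked))


# Toy genome with an "AC" repeat; real inputs would be whole-genome k-mer lists.
genome = "ACACACACACGTTGCA"
template = "ACACGTTG"
masked_template = mask_repeats(template, genome, k=4, max_count=2)
```

In this toy case only the repeat-derived prefix of the template is masked, leaving the unique part available for primer placement.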
14.
Sci Rep ; 7(1): 2537, 2017 05 31.
Article in English | MEDLINE | ID: mdl-28566690

ABSTRACT

We have developed a computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina "Platinum" genomes is 99.96%, and the concordance with the genotypes of the Illumina HumanOmniExpress is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including >23,000 SNVs from the Y chromosome. The source code of FastGT software is available at GitHub (https://github.com/bioinfo-ut/GenomeTester4/).


Subject(s)
Algorithms , Genome, Human , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Bayes Theorem , Benchmarking , Genotype , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results , Sequence Analysis, DNA/statistics & numerical data
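
The approach summarized in entry 14, counting allele-specific k-mers in raw reads and inferring genotypes from the counts, can be sketched in a few lines. This is a deliberately naive illustration: the fixed-fraction rule in `call_genotype` stands in for the statistical model used by FastGT, and the reads, k-mers and threshold are made up.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across a collection of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def call_genotype(counts, ref_kmer, alt_kmer, min_fraction=0.2):
    """Naive genotype call from the counts of two allele-specific k-mers."""
    ref, alt = counts[ref_kmer], counts[alt_kmer]
    total = ref + alt
    if total == 0:
        return "no call"
    alt_fraction = alt / total
    if alt_fraction < min_fraction:
        return "ref/ref"
    if alt_fraction > 1 - min_fraction:
        return "alt/alt"
    return "ref/alt"


# Toy reads: half carry the reference k-mer ACGTA, half the alternative ACCTA.
reads = ["TTACGTAA", "TTACGTAA", "TTACCTAA", "TTACCTAA"]
counts = count_kmers(reads, k=5)
genotype = call_genotype(counts, ref_kmer="ACGTA", alt_kmer="ACCTA")
```

Because only counts of pre-selected k-mers are needed, this style of genotyping avoids read alignment entirely, which is what makes it fast.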
15.
PeerJ ; 5: e3353, 2017.
Article in English | MEDLINE | ID: mdl-28533988

ABSTRACT

BACKGROUND: Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees. RESULTS: A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm, which analyses the observed and expected fractions of node-specific k-mers to test the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and assign it to a clade whereas other tools assign each read to a reference genome. Using a dataset of 100 Escherichia coli isolates, we demonstrate that StrainSeeker can predict the clades of E. coli with 92% accuracy and correct tree branch assignment with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain. CONCLUSION: StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.
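
To make the guide-tree idea in entry 15 concrete, the sketch below walks down a small tree and follows a child node whenever a large enough fraction of that node's specific k-mers is observed in the sample. It uses a fixed threshold on the observed fraction only; the comparison of observed and expected fractions that StrainSeeker performs is not reproduced here, and the tree, k-mers, threshold and function names are hypothetical.

```python
def observed_fraction(node_kmers, sample_kmers):
    """Fraction of a node's specific k-mers that occur in the sample."""
    if not node_kmers:
        return 0.0
    return len(node_kmers & sample_kmers) / len(node_kmers)


def assign_clade(tree, node_kmers, sample_kmers, root="root", threshold=0.5):
    """Walk down the guide tree while exactly one child node is supported."""
    current = root
    while True:
        supported = [
            child for child in tree.get(current, [])
            if observed_fraction(node_kmers[child], sample_kmers) >= threshold
        ]
        if len(supported) != 1:
            return current  # stop at the deepest unambiguous node
        current = supported[0]


# Toy guide tree (parent -> children) with node-specific k-mer sets.
tree = {"root": ["cladeA", "cladeB"], "cladeA": ["strain1", "strain2"]}
node_kmers = {
    "cladeA": {"AAAC", "AACG", "ACGT"},
    "cladeB": {"GGGT", "GGTC", "GTCA"},
    "strain1": {"TTTA", "TTAC"},
    "strain2": {"CCCA", "CCAG"},
}
sample_kmers = {"AAAC", "AACG", "ACGT", "TTTA", "TTAC", "GGGT"}
assigned = assign_clade(tree, node_kmers, sample_kmers)
```

Stopping at an internal node when no single child is supported mirrors the abstract's point that an isolate is assigned to a clade rather than forced onto one reference genome.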

16.
Gigascience ; 4: 58, 2015.
Article in English | MEDLINE | ID: mdl-26640690

ABSTRACT

BACKGROUND: K-mer-based methods of genome analysis have attracted great interest because they do not require genome assembly and can be performed directly on sequencing reads. Many analysis tasks require one to compare k-mer lists from different sequences to find words that are either unique to a specific sequence or common to many sequences. However, no stand-alone k-mer analysis tool currently allows one to perform these algebraic set operations. FINDINGS: We have developed the GenomeTester4 toolkit, which contains a novel tool GListCompare for performing union, intersection and complement (difference) set operations on k-mer lists. We provide examples of how these general operations can be combined to solve a variety of biological analysis tasks. CONCLUSIONS: GenomeTester4 can be used to simplify k-mer list manipulation for many biological analysis tasks.


Subject(s)
Genome , Genomics/methods , Sequence Analysis, DNA/methods , Software , Animals , Humans
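
The union, intersection and difference operations on k-mer lists mentioned in entry 16 map directly onto set operations. The sketch below uses plain Python sets on two toy sequences to show the idea; it does not use or imitate the GListCompare command-line interface, and the helper `kmers` is an illustrative name.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of distinct k-mers (substrings of length k) in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


# Two toy sequences; real k-mer lists are built from whole genomes or read sets.
list_a = kmers("ACGTACGA", k=4)
list_b = kmers("TACGATTT", k=4)

union = list_a | list_b          # k-mers present in either list
intersection = list_a & list_b   # k-mers shared by both lists
difference = list_a - list_b     # k-mers unique to the first list
```

For example, taking the difference between one genome's k-mers and those of its relatives yields sequence words specific to that genome, the kind of task the abstract alludes to.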