ABSTRACT
Despite significant progress in unraveling the genetic causes of neurodevelopmental disorders (NDDs), a substantial proportion of individuals with NDDs remain without a genetic diagnosis after microarray and/or exome sequencing. Here, we aimed to assess the power of short-read genome sequencing (GS), complemented with long-read GS, to identify causal variants in participants with NDD from the National Institute for Health and Care Research (NIHR) BioResource project. Short-read GS was conducted on 692 individuals (489 affected and 203 unaffected relatives) from 465 families. Additionally, long-read GS was performed on five affected individuals who had structural variants (SVs) in technically challenging regions, had complex SVs, or required distal variant phasing. Causal variants were identified in 36% of affected individuals (177/489), and a further 23% (112/489) had a variant of uncertain significance after multiple rounds of re-analysis. Among all reported variants, 88% (333/380) were coding nuclear SNVs or insertions and deletions (indels), and the remainder were SVs, non-coding variants, and mitochondrial variants. Furthermore, long-read GS facilitated the resolution of challenging SVs and invalidated variants of difficult interpretation from short-read GS. This study demonstrates the value of short-read GS, complemented with long-read GS, in investigating the genetic causes of NDDs. GS provides a comprehensive and unbiased method of identifying all types of variants throughout the nuclear and mitochondrial genomes in individuals with NDD.
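The percentages above follow directly from the counts given in the abstract. A short Python check reproduces them; the dictionary labels and the helper `percent` are our own illustrative names, not part of the study's pipeline.

```python
from typing import Dict, Tuple

# Counts quoted in the abstract: (numerator, denominator).
counts: Dict[str, Tuple[int, int]] = {
    "causal variant identified": (177, 489),
    "variant of uncertain significance": (112, 489),
    "coding nuclear SNVs/indels among reported variants": (333, 380),
}

def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator

for label, (num, den) in counts.items():
    print(f"{label}: {num}/{den} = {percent(num, den):.1f}%")
```

Rounded to whole numbers, these give the 36%, 23%, and 88% reported in the text.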
Subject(s)
Genome, Human , Neurodevelopmental Disorders , Humans , Genome, Human/genetics , Chromosome Mapping , Base Sequence , INDEL Mutation , Neurodevelopmental Disorders/genetics
ABSTRACT
Each year, blood transfusions save millions of lives. However, under current blood-matching practices, sensitization to non-self-antigens is an unavoidable adverse side effect of transfusion. We describe a universal donor typing platform that could be adopted by blood services worldwide to facilitate a universal extended blood-matching policy and reduce sensitization rates. This DNA-based test is capable of simultaneously typing most clinically relevant red blood cell (RBC), human platelet (HPA), and human leukocyte (HLA) antigens. Validation was performed using samples from 7927 European, 27 South Asian, 21 East Asian, and 9 African blood donors enrolled in 2 national biobanks. We illustrated the usefulness of the platform by analyzing antibody data from patients sensitized with multiple RBC alloantibodies. Genotyping results demonstrated concordance of 99.91%, 99.97%, and 99.03% with RBC, HPA, and HLA clinically validated typing results in 89 371, 3016, and 9289 comparisons, respectively. Genotyping increased the total number of antigen typing results available from 110 980 to >1 200 000. Dense donor typing allowed identification of 2 to 6 times more compatible donors to serve 3146 patients with multiple RBC alloantibodies, providing at least 1 match for 176 individuals for whom previously no blood could be found among the same donors. This genotyping technology is already being used to type thousands of donors taking part in national genotyping studies. Extraction of dense antigen-typing data from these cohorts provides blood supply organizations with the opportunity to implement a policy of genomics-based precision matching of blood.
Subject(s)
Blood Donors , Blood Transfusion , Genotype , Humans , Isoantibodies , Prospective Studies
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Most patients with rare diseases do not receive a molecular diagnosis and the aetiological variants and causative genes for more than half such disorders remain to be discovered1. Here we used whole-genome sequencing (WGS) in a national health system to streamline diagnosis and to discover unknown aetiological variants in the coding and non-coding regions of the genome. We generated WGS data for 13,037 participants, of whom 9,802 had a rare disease, and provided a genetic diagnosis to 1,138 of the 7,065 extensively phenotyped participants. We identified 95 Mendelian associations between genes and rare diseases, of which 11 have been discovered since 2015 and at least 79 are confirmed to be aetiological. By generating WGS data of UK Biobank participants2, we found that rare alleles can explain the presence of some individuals in the tails of a quantitative trait for red blood cells. Finally, we identified four novel non-coding variants that cause disease through the disruption of transcription of ARPC1B, GATA1, LRBA and MPL. Our study demonstrates a synergy by using WGS for diagnosis and aetiological discovery in routine healthcare.
Subject(s)
Internationality , National Health Programs , Rare Diseases/diagnosis , Rare Diseases/genetics , Whole Genome Sequencing , Actin-Related Protein 2-3 Complex/genetics , Adaptor Proteins, Signal Transducing/genetics , Alleles , Databases, Factual , Erythrocytes/metabolism , GATA1 Transcription Factor/genetics , Humans , Phenotype , Quantitative Trait Loci , Receptors, Thrombopoietin/genetics , State Medicine , United Kingdom
ABSTRACT
Several strands of evidence question the dogma that human mitochondrial DNA (mtDNA) is inherited exclusively down the maternal line, most recently in three families where several individuals harbored a 'heteroplasmic haplotype' consistent with biparental transmission. Here we report a similar genetic signature in 7 of 11,035 trios, with allelic fractions of 5-25%, implying biparental inheritance of mtDNA in 0.06% of offspring. However, on analysing the nuclear whole-genome sequences, we observe likely large, rare or unique nuclear-mitochondrial DNA segments (mega-NUMTs) transmitted from the father in all 7 families. Independently detecting mega-NUMTs in 0.13% of fathers, we see autosomal transmission of the haplotype. Finally, we show that the haplotype allele fraction can be explained by complex concatenated mtDNA-derived sequences rearranged within the nuclear genome. We conclude that rare cryptic mega-NUMTs can resemble paternally inherited mtDNA heteroplasmy, but we find no evidence of paternal transmission of mtDNA in humans.
Subject(s)
Cell Nucleus/genetics , DNA, Mitochondrial/genetics , Paternal Inheritance/genetics , Family , Female , Haplotypes/genetics , Humans , Male , Models, Genetic , Pedigree , Reproducibility of Results
ABSTRACT
BACKGROUND: Primary membranoproliferative GN, including complement 3 (C3) glomerulopathy, is a rare, untreatable kidney disease characterized by glomerular complement deposition. Complement gene mutations can cause familial C3 glomerulopathy, and studies have reported rare variants in complement genes in nonfamilial primary membranoproliferative GN. METHODS: We analyzed whole-genome sequence data from 165 primary membranoproliferative GN cases and 10,250 individuals without the condition (controls) as part of the National Institute for Health Research BioResource-Rare Diseases Study. We examined copy number, rare, and common variants. RESULTS: Our analysis included 146 primary membranoproliferative GN cases and 6442 controls who were unrelated and of European ancestry. We observed no significant enrichment of rare variants in candidate genes (genes encoding components of the complement alternative pathway and other genes associated with the related disease atypical hemolytic uremic syndrome; 6.8% in cases versus 5.9% in controls) or exome-wide. However, a significant common variant locus was identified at 6p21.32 (rs35406322) (P=3.29×10⁻⁸; odds ratio [OR], 1.93; 95% confidence interval [95% CI], 1.53 to 2.44), overlapping the HLA locus. Imputation of HLA types mapped this signal to a haplotype incorporating DQA1*05:01, DQB1*02:01, and DRB1*03:01 (P=1.21×10⁻⁸; OR, 2.19; 95% CI, 1.66 to 2.89). This finding was replicated by analysis of HLA serotypes in 338 individuals with membranoproliferative GN and 15,614 individuals with nonimmune renal failure. CONCLUSIONS: We found that HLA type, but not rare complement gene variation, is associated with primary membranoproliferative GN. These findings challenge the paradigm of complement gene mutations typically causing primary membranoproliferative GN and implicate an underlying autoimmune mechanism in most cases.
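The abstract does not say how the confidence intervals were computed. Assuming they are Wald-type intervals, symmetric on the log-odds scale (a common convention, but an assumption here), the reported OR should sit at the geometric midpoint of each interval, and the interval width implies a standard error for ln(OR). The helper `log_scale_summary` below is a hypothetical name used only for this check.

```python
import math
from typing import Dict, Tuple

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal

def log_scale_summary(lower: float, upper: float) -> Tuple[float, float]:
    """Assuming a Wald-type CI that is symmetric on the log scale, return
    (geometric midpoint of the CI, implied standard error of ln OR)."""
    midpoint = math.sqrt(lower * upper)
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return midpoint, se_log_or

# Odds ratios and 95% CIs as quoted in the abstract: (OR, lower, upper).
reported: Dict[str, Tuple[float, float, float]] = {
    "rs35406322": (1.93, 1.53, 2.44),
    "DQA1*05:01 / DQB1*02:01 / DRB1*03:01 haplotype": (2.19, 1.66, 2.89),
}

for label, (odds_ratio, lower, upper) in reported.items():
    midpoint, se = log_scale_summary(lower, upper)
    print(f"{label}: OR {odds_ratio}, CI midpoint {midpoint:.2f}, SE(ln OR) {se:.3f}")
```

Both geometric midpoints round to the reported ORs (1.93 and 2.19), which is consistent with intervals constructed on the log scale.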
Subject(s)
Complement C3/immunology , Glomerulonephritis, Membranoproliferative/genetics , Whole Genome Sequencing , Complement C3 Nephritic Factor/analysis , Female , Glomerulonephritis, Membranoproliferative/etiology , HLA-DQ Antigens/genetics , HLA-DR Antigens/genetics , Humans , Male , Serogroup
ABSTRACT
IL-6 excess is central to the pathogenesis of multiple inflammatory conditions and is targeted in clinical practice by immunotherapy that blocks the IL-6 receptor encoded by IL6R. We describe two patients with homozygous mutations in IL6R who presented with recurrent infections, abnormal acute-phase responses, elevated IgE, eczema, and eosinophilia. This study identifies a novel primary immunodeficiency, clarifying the contribution of IL-6 to the phenotype of patients with mutations in IL6ST, STAT3, and ZNF341, genes encoding different components of the IL-6 signaling pathway, and alerts us to the potential toxicity of drugs targeting the IL-6R.
Subject(s)
Immunologic Deficiency Syndromes/pathology , Inflammation/pathology , Receptors, Interleukin-6/deficiency , Adolescent , Adult , Child , Child, Preschool , Female , HEK293 Cells , Humans , Infant, Newborn , Male , Receptors, Interleukin-6/metabolism
ABSTRACT
Approximately 2.4% of the human mitochondrial DNA (mtDNA) genome exhibits common homoplasmic genetic variation. We analyzed 12,975 whole-genome sequences to show that 45.1% of individuals from 1526 mother-offspring pairs harbor a mixed population of mtDNA (heteroplasmy), but the propensity for maternal transmission differs across the mitochondrial genome. Over one generation, we observed selection both for and against variants in specific genomic regions; known variants were more likely to be transmitted than previously unknown variants. However, new heteroplasmies were more likely to match the nuclear genetic ancestry as opposed to the ancestry of the mitochondrial genome on which the mutations occurred, validating our findings in 40,325 individuals. Thus, human mtDNA at the population level is shaped by selective forces within the female germ line under nuclear genetic control, which ensures consistency between the two independent genetic lineages.
Subject(s)
DNA, Mitochondrial/genetics , Genome, Mitochondrial , Maternal Inheritance , Ovum/growth & development , Selection, Genetic , Female , Genetic Variation , Humans
ABSTRACT
A targeted high-throughput sequencing (HTS) panel test for clinical diagnostics requires careful consideration of the inclusion of appropriate diagnostic-grade genes, the ability to detect multiple types of genomic variation with high levels of analytic sensitivity and reproducibility, and variant interpretation by a multidisciplinary team (MDT) in the context of the clinical phenotype. We have sequenced 2396 index patients using the ThromboGenomics HTS panel test of diagnostic-grade genes known to harbor variants associated with rare bleeding, thrombotic, or platelet disorders (BTPDs). The molecular diagnostic rate was determined by the clinical phenotype, with an overall rate of 49.2% for all thrombotic, coagulation, platelet count, and function disorder patients and a rate of 3.2% for patients with unexplained bleeding disorders characterized by normal hemostasis test results. The MDT classified 745 unique variants, including copy number variants (CNVs) and intronic variants, as pathogenic, likely pathogenic, or variants of uncertain significance. Half of these variants (50.9%) are novel and 41 unique variants were identified in 7 genes recently found to be implicated in BTPDs. Inspection of canonical hemostasis pathways identified 29 patients with evidence of oligogenic inheritance. A molecular diagnosis has been reported for 894 index patients providing evidence that introducing an HTS genetic test is a valuable addition to laboratory diagnostics in patients with a high likelihood of having an inherited BTPD.
Subject(s)
Blood Platelet Disorders , Hemorrhage , High-Throughput Nucleotide Sequencing , Thrombosis , Blood Platelet Disorders/diagnosis , Blood Platelet Disorders/genetics , Female , Gene Dosage , Hemorrhage/diagnosis , Hemorrhage/genetics , Hemostasis/genetics , Humans , Male , Thrombosis/diagnosis , Thrombosis/genetics
ABSTRACT
Basilar invagination, platybasia, an increased tentorium angle, and posterior fossa hypoplasia are anomalies associated with Chiari malformation. In symptomatic Chiari malformation, tonsillar ectopia appears to be the definitive criterion for diagnosis and treatment, but the detection of additional anomalies may alter the surgical outcome. The aim of this study was to investigate the relationship between tonsillar ectopia and these other anomalies. The authors retrospectively reviewed 31 cases of Chiari malformation at their hospital. There were 8 men (25.8%) and 23 women (74.2%), with a mean age of 37.93 ± 12.93 years. Seventeen patients (54.8%) had tonsillar ectopia of 0 to 5 mm, and 14 patients had tonsillar ectopia of more than 5 mm. Seven patients (22.6%) had syrinx and 2 patients (6.5%) had mild hydrocephalus. Six patients underwent surgical treatment. The mean clivus length was 39.3 mm, supraocciput length 40.4 mm, cerebellar hemisphere length 61.08 mm, McRae line 33.14 mm, and Twining line 79.4 mm, and the mean tentorium-Twining line angle was 40.35°. There was no significant difference between tonsillar ectopia, syrinx, and hydrocephalus. Basilar invagination was associated with platybasia (6 patients had platybasia according to the 2 mm criterion and 2 patients according to the 5 mm criterion; P < 0.05). Syrinx was associated with the Chamberlain line (P < 0.05). In the authors' study, although there was no statistically significant difference between tonsillar ectopia and the criteria for these anomalies, the relationship between basilar invagination and platybasia was significant.
Subject(s)
Arnold-Chiari Malformation/diagnostic imaging , Cephalometry/methods , Magnetic Resonance Imaging/methods , Skull Base/diagnostic imaging , Adult , Female , Humans , Male , Middle Aged , Retrospective Studies
ABSTRACT
BACKGROUND: The genetic cause of primary immunodeficiency disease (PID) carries prognostic information. OBJECTIVE: We conducted a whole-genome sequencing study assessing a large proportion of the NIHR BioResource-Rare Diseases cohort. METHODS: In the predominantly European study population of principally sporadic unrelated PID cases (n = 846), a novel Bayesian method identified nuclear factor κB subunit 1 (NFKB1) as one of the genes most strongly associated with PID, and the association was explained by 16 novel heterozygous truncating, missense, and gene deletion variants. This accounted for 4% of common variable immunodeficiency (CVID) cases (n = 390) in the cohort. Amino acid substitutions predicted to be pathogenic were assessed by means of analysis of structural protein data. Immunophenotyping, immunoblotting, and ex vivo stimulation of lymphocytes determined the functional effects of these variants. Detailed clinical and pedigree information was collected for genotype-phenotype cosegregation analyses. RESULTS: Both sporadic and familial cases demonstrated evidence of the noninfective complications of CVID, including massive lymphadenopathy (24%), unexplained splenomegaly (48%), and autoimmune disease (48%), features prior studies correlated with worse clinical prognosis. Although partial penetrance of clinical symptoms was noted in certain pedigrees, all carriers have a deficiency in B-lymphocyte differentiation. Detailed assessment of B-lymphocyte numbers, phenotype, and function identifies the presence of an increased CD21low B-cell population. Combined with identification of the disease-causing variant, this distinguishes between healthy subjects, asymptomatic carriers, and clinically affected cases. CONCLUSION: We show that heterozygous loss-of-function variants in NFKB1 are the most common known monogenic cause of CVID, which results in a temporally progressive defect in the formation of immunoglobulin-producing B cells.
Subject(s)
B-Lymphocytes/immunology , Common Variable Immunodeficiency/genetics , NF-kappa B p50 Subunit/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Europe , Female , Humans , Infant , Infant, Newborn , Loss of Function Mutation , Male , Middle Aged , Phenotype , T-Lymphocytes/immunology , Young Adult
ABSTRACT
Linking non-coding genetic variants associated with the risk of diseases or disease-relevant traits to target genes is a crucial step toward realizing the potential of GWAS for precision medicine. Here we set out to determine the mechanisms underpinning variant association with platelet quantitative traits using cell type-matched epigenomic data and promoter long-range interactions. We identify potential regulatory functions for 423 of 565 (75%) non-coding variants associated with platelet traits, and we demonstrate, through ex vivo and proof-of-principle genome-editing validation, that variants in super-enhancers play an important role in controlling archetypical platelet functions.
Subject(s)
Blood Platelets/physiology , Enhancer Elements, Genetic , Erythroblasts/chemistry , Genetic Variation , Megakaryocytes/chemistry , Chromatin , Humans , Promoter Regions, Genetic
ABSTRACT
Heritable platelet function disorders (PFDs) are genetically heterogeneous and poorly characterized. Pathogenic variants in RASGRP2, which encodes calcium and diacylglycerol-regulated guanine exchange factor I (CalDAG-GEFI), have been reported previously in 3 pedigrees with bleeding and reduced platelet aggregation responses. To better define the phenotype associated with pathogenic RASGRP2 variants, we compared high-throughput sequencing and phenotype data from 2042 cases in pedigrees with unexplained bleeding or platelet disorders to data from 5422 controls. Eleven cases harbored 11 different, previously unreported RASGRP2 variants that were biallelic and likely pathogenic. The variants included 5 high-impact variants predicted to prevent CalDAG-GEFI expression and 6 missense variants affecting the CalDAG-GEFI CDC25 domain, which mediates Rap1 activation during platelet inside-out αIIbβ3 signaling. Cases with biallelic RASGRP2 variants had abnormal mucocutaneous, surgical, and dental bleeding from childhood, requiring ≥1 blood or platelet transfusion in 78% of cases. Platelets displayed reduced aggregation in response to adenosine 5'-diphosphate and epinephrine, but variable aggregation defects with other agonists. There were no other consistent clinical or laboratory features. These data enable definition of human CalDAG-GEFI deficiency as a nonsyndromic, recessive PFD associated with a moderate or severe bleeding phenotype and complex defects in platelet aggregation.
Subject(s)
Blood Platelets/pathology , Guanine Nucleotide Exchange Factors/genetics , Hemorrhage/genetics , Mutation/genetics , Alleles , Base Sequence , Female , Humans , Male , Pedigree
ABSTRACT
Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease and evidence suggesting previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.
Subject(s)
Genetic Variation , Genome-Wide Association Study , Hematopoietic Stem Cells/metabolism , Immune System Diseases/genetics , Alleles , Cell Differentiation , Genetic Predisposition to Disease , Hematopoietic Stem Cells/pathology , Humans , Immune System Diseases/pathology , Polymorphism, Single Nucleotide , Quantitative Trait Loci , White People/genetics
ABSTRACT
Macrothrombocytopenia (MTP) is a heterogeneous group of disorders characterized by enlarged and reduced numbers of circulating platelets, sometimes resulting in abnormal bleeding. In most MTP, this phenotype arises because of altered regulation of platelet formation from megakaryocytes (MKs). We report the identification of DIAPH1, which encodes the Rho-effector diaphanous-related formin 1 (DIAPH1), as a candidate gene for MTP using exome sequencing, ontological phenotyping, and similarity regression. We describe 2 unrelated pedigrees with MTP and sensorineural hearing loss that segregate with a DIAPH1 R1213* variant predicting partial truncation of the DIAPH1 diaphanous autoregulatory domain. The R1213* variant was linked to reduced proplatelet formation from cultured MKs, cell clustering, and abnormal cortical filamentous actin. Similarly, in platelets, there was increased filamentous actin and stable microtubules, indicating constitutive activation of DIAPH1. Overexpression of DIAPH1 R1213* in cells reproduced the cytoskeletal alterations found in platelets. Our description of a novel disorder of platelet formation and hearing loss extends the repertoire of DIAPH1-related disease and provides new insight into the autoregulation of DIAPH1 activity.
Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Hearing Loss/genetics , Mutation , Thrombocytopenia/genetics , A549 Cells , Adolescent , Adult , Aged , Case-Control Studies , Cells, Cultured , Child , Female , Formins , Genetic Association Studies , Genetic Predisposition to Disease , HEK293 Cells , Hearing Loss/complications , Humans , Male , Middle Aged , Pedigree , Polymorphism, Single Nucleotide , Syndrome , Thrombocytopenia/complications , Young Adult
ABSTRACT
A critical aspect of mammalian gametogenesis is the reprogramming of genomic DNA methylation. The catalytically inactive adaptor Dnmt3L is essential to ensuring this occurs correctly, but the mechanism by which it functions is unclear. Using gene targeting to engineer a single-amino-acid mutation, we show that the Dnmt3L histone H3 binding domain (ADD) is necessary for spermatogenesis. Genome-wide single-base-resolution DNA methylome analysis of mutant germ cells revealed overall reductions in CG methylation at repetitive sequences and non-promoter CpG islands. Strikingly, we also observe an even more severe loss of non-CG methylation, suggesting an unexpected role for the ADD in this process. These epigenetic deficiencies were coupled with defects in spermatogonia, with mutant cells displaying marked changes in gene expression and reactivation of retrotransposons. Our results demonstrate that the Dnmt3L ADD is necessary for Dnmt3L function and full reproductive fitness.
ABSTRACT
BACKGROUND: Familial hypercholesterolemia (FH) is an autosomal-dominant disease leading to markedly elevated low-density lipoprotein (LDL) cholesterol levels and an increased risk of premature myocardial infarction (MI). Mutation carriers display variable LDL cholesterol levels, which may obscure the diagnosis. We used whole-exome sequencing to examine a family in which multiple myocardial infarctions of unclear etiology occurred at a young age. METHODS: Whole-exome sequencing of three affected family members, validation of the identified variant by Sanger sequencing, and subsequent co-segregation analysis in the family. RESULTS: The index patient (LDL cholesterol 188 mg/dL) was referred for molecular-genetic investigation. He had undergone coronary artery bypass grafting (CABG) at the age of 59 years; 12 of 15 first-, second-, and third-degree relatives were affected by coronary artery disease (CAD) and/or premature MI. We sequenced the whole exome of the patient and of two cousins with premature MI. After filtering, we were left with a potentially disease-causing variant in the LDL receptor (LDLR) gene, which we validated by Sanger sequencing (a nucleotide substitution in the acceptor splice site of exon 10, c.1359-1G > A). Sequencing of all family members available for genetic analysis revealed co-segregation of the variant with CAD (LOD 3.0) and with increased LDL cholesterol (>190 mg/dL) after correction for statin treatment (LOD 4.3). Interestingly, mutation carriers presented with highly variable corrected (183-354 mg/dL) and on-treatment (116-274 mg/dL) LDL cholesterol levels, such that the diagnosis of FH in this family was made only after the molecular-genetic analysis. CONCLUSION: Even in families with unusual clustering of CAD, FH remains underdiagnosed, which underscores the need for the implementation of systematic screening programs. Whole-exome sequencing may facilitate identification of disease-causing variants in families with MI of unclear etiology and enable more timely preventive treatment of mutation carriers.
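The LOD scores above compare the likelihood of linkage at recombination fraction θ with the likelihood of no linkage (θ = 0.5). The abstract does not describe the linkage model used, so the sketch below is only an idealised illustration: it assumes phase-known, fully informative meioses, and the function name `lod_fully_informative` is hypothetical. It is not a reconstruction of the study's calculation.

```python
import math

def lod_fully_informative(n_meioses: int, recombinants: int, theta: float) -> float:
    """Idealised LOD score for phase-known, fully informative meioses:
    log10[L(theta) / L(0.5)], where L(theta) = theta^r * (1 - theta)^(n - r)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if not 0 <= recombinants <= n_meioses:
        raise ValueError("recombinants must be between 0 and n_meioses")
    if recombinants > 0 and theta == 0.0:
        return float("-inf")  # any recombinant is impossible under complete linkage
    nonrecombinants = n_meioses - recombinants
    likelihood_linked = theta**recombinants * (1.0 - theta) ** nonrecombinants
    likelihood_unlinked = 0.5**n_meioses
    return math.log10(likelihood_linked / likelihood_unlinked)

print(round(lod_fully_informative(10, 0, 0.0), 2))
```

Under this idealisation, each non-recombinant meiosis contributes log10(2) ≈ 0.30, so about ten such meioses reach the conventional LOD 3 threshold for linkage.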
Subject(s)
Exome , Genetic Testing/methods , Hyperlipoproteinemia Type II/genetics , Mutation , Myocardial Infarction/genetics , Receptors, LDL/genetics , Adult , Age of Onset , Aged , Biomarkers/blood , Cholesterol, LDL/blood , Coronary Artery Bypass , DNA Mutational Analysis , Female , Genetic Predisposition to Disease , Humans , Hyperlipoproteinemia Type II/blood , Hyperlipoproteinemia Type II/diagnosis , Hyperlipoproteinemia Type II/epidemiology , Male , Middle Aged , Myocardial Infarction/blood , Myocardial Infarction/diagnosis , Myocardial Infarction/epidemiology , Myocardial Infarction/surgery , Pedigree , Phenotype , Predictive Value of Tests
ABSTRACT
MOTIVATION: High-throughput measurements of mRNA abundance from microarrays involve several stages of preprocessing. At each stage, a user has access to a large number of algorithms, with no universally agreed guidance on which to use. We show that binary representations of gene expression, retaining only information on whether a gene is expressed or not, reduce the variability in results caused by algorithmic choice, while also improving the quality of inference drawn from microarray studies. RESULTS: Binary representation of transcriptome data has the desirable property of reducing the variability introduced at the preprocessing stages by algorithmic choice. We compare the effect of the choice of algorithms on different problems and suggest that using a binary representation of microarray data with a Tanimoto kernel for support vector machines reduces the effect of the choice of algorithm and simultaneously improves the performance of phenotype classification.
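The binary representation and the Tanimoto kernel described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the abstract does not say how a gene is called "expressed", so the fixed-threshold rule in `binarize` and the threshold value are hypothetical, as are all function names. The Tanimoto similarity of two binary vectors is the number of genes expressed in both samples divided by the number expressed in either.

```python
from typing import List, Sequence

def binarize(expression: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical expressed/not-expressed call: 1 if the value exceeds
    the threshold, else 0."""
    return [1 if value > threshold else 0 for value in expression]

def tanimoto(x: Sequence[int], y: Sequence[int]) -> float:
    """Tanimoto similarity of binary vectors: |x AND y| / (|x| + |y| - |x AND y|)."""
    both = sum(a & b for a, b in zip(x, y))
    either = sum(x) + sum(y) - both
    # Convention (our choice): two all-zero profiles are treated as identical.
    return 1.0 if either == 0 else both / either

def gram_matrix(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Pairwise kernel matrix over binarized samples."""
    return [[tanimoto(a, b) for b in samples] for a in samples]

s1 = binarize([8.2, 2.1, 7.5, 0.3], threshold=5.0)  # [1, 0, 1, 0]
s2 = binarize([9.0, 6.3, 1.2, 0.8], threshold=5.0)  # [1, 1, 0, 0]
print(tanimoto(s1, s2))  # one shared gene out of three expressed in either
```

A precomputed matrix like this can be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Because only the on/off calls enter the kernel, differences in how preprocessing algorithms scale the raw intensities matter only where they move a value across the threshold.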