ABSTRACT
Colorectal cancer is one of the most prevalent tumour types worldwide and, despite the emergence of targeted and biologic therapies, has one of the highest mortality rates. The Personalized OncoGenomics (POG) program at BC Cancer performs whole genome and transcriptome analysis (WGTA) to identify specific alterations in an individual's cancer that may be most effectively targeted. Guided by WGTA, a patient with advanced mismatch repair-deficient colorectal cancer was treated with the antihypertensive drug irbesartan and experienced a profound and durable response. We describe this patient's subsequent relapse and potential mechanisms of response. To do so, we used WGTA and multiplex immunohistochemistry (m-IHC) profiling of biopsies taken before and after treatment from the same metastatic site in the L3 spine. We did not observe marked differences in the genomic landscape before and after treatment. Analyses revealed an increase in immune signalling and infiltrating immune cells, particularly CD8+ T cells, in the relapsed tumour. These results indicate that the observed anti-tumour response to irbesartan may have been due to an activated immune response. Additional studies will be required to determine whether irbesartan may be similarly valuable in other cancer contexts.
Subject(s)
Antihypertensive Agents , Colorectal Neoplasms , Humans , Irbesartan/therapeutic use , Antihypertensive Agents/therapeutic use , CD8-Positive T-Lymphocytes/pathology , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology
ABSTRACT
BACKGROUND: Recent advances are enabling delivery of precision genomic medicine to cancer clinics. Most approaches profile panels of selected genes or hotspot regions. By contrast, the comprehensive data provided by whole-genome and transcriptome sequencing and analysis (WGTA) present an opportunity to align a much larger proportion of patients to therapies. PATIENTS AND METHODS: Samples from 570 patients with advanced or metastatic cancer of diverse types enrolled in the Personalized OncoGenomics (POG) program underwent WGTA. DNA-based data, including mutations, copy number and mutation signatures, were combined with RNA-based data, including gene expression and fusions, to generate comprehensive WGTA profiles. A multidisciplinary molecular tumour board used WGTA profiles to identify and prioritize clinically actionable alterations and inform therapy. Patient responses to WGTA-informed therapies were collected. RESULTS: Clinically actionable targets were identified for 83% of patients, of which 37% of patients received WGTA-informed treatments. RNA expression data were particularly informative, contributing to 67% of WGTA-informed treatments; 25% of treatments were informed by RNA expression alone. Of a total of 248 WGTA-informed treatments, 46% resulted in clinical benefit. RNA expression data were comparable to DNA-based mutation and copy number data in aligning to clinically beneficial treatments. Genome signatures also guided therapeutics including platinum, poly-ADP ribose polymerase inhibitors and immunotherapies. Patients accessed WGTA-informed treatments through clinical trials (19%), off-label use (35%) and as standard therapies (46%). Standard therapies included some that would not otherwise have been the next choice of therapy, demonstrating the utility of genomic information to direct use of chemotherapies as well as targeted therapies.
CONCLUSIONS: Integrating RNA expression and genome data illuminated treatment options that resulted in 46% of treated patients experiencing positive clinical benefit, supporting the use of comprehensive WGTA profiling in clinical cancer care.
Subject(s)
Neoplasms , Gene Expression Profiling , Genomics/methods , Humans , Mutation , Neoplasms/drug therapy , Neoplasms/genetics , Precision Medicine/methods , RNA , Transcriptome
ABSTRACT
BACKGROUND: NRG1 gene fusions have emerged as potentially actionable events in lung cancer, but clinical support is currently limited and the efficacy of this approach has not been shown in cancers beyond the lung. PATIENTS AND METHODS: Here, we describe two patients with advanced cancers refractory to standard therapies. Patient 1 had lung adenocarcinoma and patient 2 had cholangiocarcinoma. Whole-genome and transcriptome sequencing were carried out for these cases, with select findings validated by fluorescence in situ hybridization. RESULTS: Both tumors were found to be positive for NRG1 gene fusions. In patient 1, an SDC4-NRG1 gene fusion was detected; similar gene fusions have been described previously in lung cancers. In patient 2, a novel ATP1B1-NRG1 gene fusion was detected. NRG1 fusions had not previously been described in cholangiocarcinoma. Integrative genome analysis was used to assess the potential functional significance of the detected genomic events, including the gene fusions. This analysis prioritized therapeutic strategies targeting the HER family of growth factor receptors. Both patients were treated with the pan-HER-family kinase inhibitor afatinib, and both displayed significant and durable responses to treatment. Upon progression, sites of disease were sequenced. No obvious genomic events explained the disease progression. This suggested that broad transcriptomic or epigenetic mechanisms could account for the lack of a more prolonged response to afatinib. CONCLUSION: These observations lend further support to the use of pan-HER tyrosine kinase inhibitors for the treatment of NRG1 fusion-positive cancers of both lung and hepatocellular origin. More broadly, they indicate that cancers found to be NRG1 fusion-positive may benefit from such a clinical approach regardless of their site of origin.
CLINICAL TRIAL INFORMATION: Personalized Oncogenomics (POG) Program of British Columbia: Utilization of Genomic Analysis to Better Understand Tumour Heterogeneity and Evolution (NCT02155621).
Subject(s)
Adenocarcinoma/drug therapy , Bile Duct Neoplasms/drug therapy , Cholangiocarcinoma/drug therapy , Lung Neoplasms/drug therapy , Neuregulin-1/genetics , Neuregulin-1/metabolism , Quinazolines/therapeutic use , Adenocarcinoma/genetics , Adenocarcinoma/metabolism , Adenocarcinoma of Lung , Adult , Afatinib , Bile Duct Neoplasms/genetics , Bile Duct Neoplasms/metabolism , Cholangiocarcinoma/genetics , Cholangiocarcinoma/metabolism , Female , Gene Expression Profiling , Humans , In Situ Hybridization, Fluorescence , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Oncogene Proteins, Fusion/genetics , Oncogene Proteins, Fusion/metabolism , Protein Kinase Inhibitors/therapeutic use , Syndecan-4/genetics
ABSTRACT
Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated, and it has a close phylogenetic relationship to Plasmodium vivax, the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome and other sequenced Plasmodium genomes. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs, contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.
Subject(s)
Genome, Protozoan/genetics , Genomics , Macaca mulatta/parasitology , Malaria/parasitology , Plasmodium knowlesi/genetics , Amino Acid Sequence , Animals , Antigens, CD/chemistry , Antigens, CD/genetics , Chromosomes/genetics , Conserved Sequence , Genes, Protozoan/genetics , Humans , Molecular Sequence Data , Plasmodium knowlesi/classification , Plasmodium knowlesi/physiology , Protein Structure, Tertiary , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Sequence Analysis, DNA , Telomere/genetics
ABSTRACT
AIMS/HYPOTHESIS: Little is known about the epigenetic barriers that block reprogramming protocols, or about what makes a beta cell unique, and this has hampered efforts to develop novel beta cell sources. Here, we aimed to identify enhancers in pancreatic islets, to understand their developmental ontologies, and to identify enhancers unique to islets to increase our understanding of islet-specific gene expression. METHODS: We combined H3K4me1-based nucleosome predictions with pancreatic and duodenal homeobox 1 (PDX1), neurogenic differentiation 1 (NEUROD1), v-Maf musculoaponeurotic fibrosarcoma oncogene family, protein A (MAFA) and forkhead box A2 (FOXA2) occupancy data to identify enhancers in mouse islets. RESULTS: We identified 22,223 putative enhancer loci in mouse islets in vivo. Our validation experiments suggest that nearly half of these loci are active in regulating islet gene expression, with the remaining regions probably poised for activity. We showed that these loci have at least nine developmental ontologies, and that islet enhancers predominantly acquire H3K4me1 during differentiation. We next discriminated 1,799 enhancers unique to islets and showed that these islet-specific enhancers have reduced association with annotated genes. We also identified a subset that is instead associated with novel islet-specific long non-coding RNAs (lncRNAs). CONCLUSIONS/INTERPRETATIONS: Our results indicate that, in embryonic stem cells and liver, genes with islet-specific expression and function tend to have enhancers that are devoid of histone methylation marks or, less often, that are bivalent or repressed. Further, we identify a subset of enhancers unique to islets that are associated with novel islet-specific genes and lncRNAs. We anticipate that these data will facilitate the development of novel sources of functional beta cell mass.
Subject(s)
Islets of Langerhans/metabolism , Animals , Basic Helix-Loop-Helix Transcription Factors/metabolism , Chromatin Immunoprecipitation , Enhancer Elements, Genetic/genetics , Hepatocyte Nuclear Factor 3-beta/metabolism , Homeodomain Proteins/metabolism , Mice , Nerve Tissue Proteins/metabolism , Trans-Activators/metabolism
ABSTRACT
The lipopolysaccharide (LPS) from eight strains of Yersinia pestis cultured at 28 degrees C appeared to be devoid of an O-antigen when analysed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. LPS isolated from three of these strains cultured at 37 degrees C also appeared to be devoid of an O-antigen. The LPS from Y. pestis strain CO92 was purified and analysed by matrix-assisted laser desorption-ionisation time-of-flight mass spectrometry. The observed signals were in the mass range predicted for molecules containing lipid A plus the core oligosaccharide but lacking an O-antigen. The nucleotide sequence of Y. pestis strain CO92 revealed the presence of a putative O-antigen gene cluster. However, frame-shift mutations in the ddhB, gmd, fcl and ushA genes are likely to prevent expression of the O-antigen, thus explaining its absence.
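The abstract attributes the missing O-antigen to frame-shift mutations. The minimal sketch below shows why a single-base insertion is so disruptive to a coding sequence. It uses a hypothetical toy reading frame, not a real ddhB, gmd, fcl or ushA sequence, and the standard genetic code, translating the frame before and after a one-base insertion. The names `translate`, `wild_type` and `frameshifted` are illustrative only. The translation stops at the first stop codon, which is a simplifying assumption.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate DNA from its first base, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Hypothetical toy reading frame (not a real Y. pestis gene).
wild_type = "ATGAAACCCGGGTTTTAA"
# Insert a single T after the start codon, shifting every downstream codon.
frameshifted = wild_type[:3] + "T" + wild_type[3:]

print(translate(wild_type))     # MKPGF*
print(translate(frameshifted))  # M*
```

In this toy case the inserted base immediately creates a TAA stop codon. In a real gene, a frame shift more often yields a stretch of unrelated amino acids before a premature stop. Either way, a full-length functional enzyme is not produced.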
Subject(s)
Genome, Bacterial , O Antigens/genetics , Yersinia pestis/chemistry , Mass Spectrometry , Multigene Family/genetics , Mutation , Temperature , Yersinia pestis/genetics , Yersinia pestis/growth & development
ABSTRACT
Everything that we need to know about Mycobacterium leprae, a close relative of the tubercle bacillus, is encrypted in its genome. Inspection of the 3.27 Mb genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus identified 1,605 genes encoding proteins and 50 genes for stable RNA species. Comparison with the genome sequence of Mycobacterium tuberculosis revealed an extreme case of reductive evolution: less than half of the genome contains functional genes, while inactivated genes and pseudogenes are highly abundant. The level of gene duplication was approximately 34%. On classification of the proteins into families, the largest functional groups were found to be involved in four areas: the metabolism and modification of fatty acids and polyketides, transport of metabolites, cell envelope synthesis, and gene regulation. Reductive evolution, gene decay and genome downsizing have eliminated entire metabolic pathways, together with their regulatory circuits and accessory functions, particularly those involved in catabolism. This may explain the unusually long generation time and account for our inability to culture the leprosy bacillus.
Subject(s)
Genes, Bacterial/genetics , Genome, Bacterial , Leprosy/microbiology , Mycobacterium leprae/genetics , Evolution, Molecular , Humans
ABSTRACT
Campylobacter jejuni, from the delta-epsilon group of proteobacteria, is a microaerophilic, Gram-negative, flagellate, spiral bacterium, properties it shares with the related gastric pathogen Helicobacter pylori. It is the leading cause of bacterial food-borne diarrhoeal disease throughout the world. In addition, infection with C. jejuni is the most frequent antecedent to a form of neuromuscular paralysis known as Guillain-Barré syndrome. Here we report the genome sequence of C. jejuni NCTC11168. C. jejuni has a circular chromosome of 1,641,481 base pairs (30.6% G+C), which is predicted to encode 1,654 proteins and 54 stable RNA species. The genome is unusual in that there are virtually no insertion sequences or phage-associated sequences and very few repeat sequences. One of the most striking findings in the genome was the presence of hypervariable sequences. These short homopolymeric runs of nucleotides were commonly found in genes involved in the biosynthesis or modification of surface structures, or in closely linked genes of unknown function. The apparently high rate of variation of these homopolymeric tracts may be important in the survival strategy of C. jejuni.
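The hypervariable sequences described above are homopolymeric tracts: runs of a single repeated nucleotide. The minimal sketch below shows how such runs can be located in a DNA sequence with a regular-expression backreference. The function name `find_homopolymer_runs` and the default cutoff of 8 nucleotides are hypothetical choices for illustration, not the criteria used in the C. jejuni NCTC11168 annotation. The function only finds runs in a single sequence. Estimating how often a tract actually varies would require comparing the same locus across clones or sequencing reads.

```python
import re
from typing import List, Tuple


def find_homopolymer_runs(seq: str, min_length: int = 8) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for each run of one nucleotide >= min_length.

    Positions are 0-based. min_length is an illustrative assumption.
    Characters other than A, C, G and T (e.g. N) break a run.
    """
    runs = []
    # "([ACGT])\1*" matches one base followed by any repeats of that same base,
    # so each match is a maximal single-nucleotide run.
    for match in re.finditer(r"([ACGT])\1*", seq.upper()):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.start(), match.group(1), length))
    return runs


example = "ATCA" + "G" * 9 + "TAA"
print(find_homopolymer_runs(example))  # [(4, 'G', 9)]
```

Lowering `min_length` reports shorter runs as well. For example, `find_homopolymer_runs(example, min_length=2)` also returns the terminal AA dinucleotide at position 14.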
Subject(s)
Bacterial Proteins , Campylobacter jejuni/genetics , Genetic Variation , Genome, Bacterial , Amino Acid Sequence , Bacterial Toxins/genetics , Campylobacter jejuni/classification , Campylobacter jejuni/metabolism , Chemotaxis , Food Contamination , Humans , Lipopolysaccharides/biosynthesis , Membrane Proteins/metabolism , Methyl-Accepting Chemotaxis Proteins , Molecular Sequence Data , Phylogeny
ABSTRACT
Population genetic studies suggest that Yersinia pestis, the cause of plague, is a clonal pathogen that has recently emerged from Yersinia pseudotuberculosis. Plasmid acquisition is likely to have been a key element in this evolutionary leap from an enteric to a flea-transmitted systemic pathogen. However, the origin of Y. pestis-specific plasmids remains obscure. We demonstrate specific plasmid rearrangements in different Y. pestis strains which distinguish Y. pestis bv. Orientalis strains from other biovars. We also present evidence for plasmid-associated DNA exchange between Y. pestis and the exclusively human pathogen Salmonella enterica serovar Typhi.
Subject(s)
Evolution, Molecular , Plasmids/genetics , Salmonella typhi/genetics , Yersinia pestis/classification , Yersinia pestis/genetics , Animals , DNA Transposable Elements/genetics , DNA, Bacterial/genetics , Gene Transfer, Horizontal/genetics , Humans , Molecular Sequence Data , Polymerase Chain Reaction , Sequence Analysis, DNA
ABSTRACT
Neisseria meningitidis causes bacterial meningitis and is therefore responsible for considerable morbidity and mortality in both the developed and the developing world. Meningococci are opportunistic pathogens that colonize the nasopharynges and oropharynges of asymptomatic carriers. For reasons that are still mostly unknown, they occasionally gain access to the blood, and subsequently to the cerebrospinal fluid, to cause septicaemia and meningitis. N. meningitidis strains are divided into a number of serogroups on the basis of the immunochemistry of their capsular polysaccharides; serogroup A strains are responsible for major epidemics and pandemics of meningococcal disease, and therefore most of the morbidity and mortality associated with this disease. Here we have determined the complete genome sequence of a serogroup A strain of Neisseria meningitidis, Z2491. The sequence is 2,184,406 base pairs in length, with an overall G+C content of 51.8%, and contains 2,121 predicted coding sequences. The most notable feature of the genome is the presence of many hundreds of repetitive elements, ranging from short repeats, positioned either singly or in large multiple arrays, to insertion sequences and gene duplications of one kilobase or more. Many of these repeats appear to be involved in genome fluidity and antigenic variation in this important human pathogen.
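Genome reports such as this one summarise base composition as an overall G+C content (51.8% for Z2491). The minimal sketch below shows how that percentage is computed for a DNA sequence. The function name `gc_content` is hypothetical. Excluding ambiguous bases (such as N) from the denominator is an assumption; conventions differ between tools. Reproducing the published figure would require running it on the full Z2491 sequence, which is not included here.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases in seq.

    Ambiguous symbols (e.g. N) are excluded from the denominator by assumption.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


print(round(gc_content("ATGCGC"), 1))  # 66.7
```

The same function applies unchanged to any other genome, such as the 30.6% G+C reported for C. jejuni NCTC11168. For DNA, the complementary A+T content is simply 100 minus this value.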
Subject(s)
DNA, Bacterial , Genome, Bacterial , Neisseria meningitidis/genetics , Antigenic Variation/genetics , Bacterial Proteins/genetics , Gene Rearrangement , Molecular Sequence Data , Neisseria meningitidis/classification , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA , Serotyping
ABSTRACT
Salmonella enterica serovar Typhi (S. typhi) is the aetiological agent of typhoid fever, a serious invasive bacterial disease of humans with an annual global burden of approximately 16 million cases, leading to 600,000 fatalities. Many S. enterica serovars actively invade the mucosal surface of the intestine but are normally contained in healthy individuals by the local immune defence mechanisms. However, S. typhi has evolved the ability to spread to the deeper tissues of humans, including liver, spleen and bone marrow. Here we have sequenced the 4,809,037-base pair (bp) genome of a multiple-drug-resistant S. typhi strain (CT18). Compared with the Escherichia coli genome, it contains hundreds of insertions and deletions, ranging in size from single genes to large islands. Notably, the genome sequence identifies over two hundred pseudogenes, several corresponding to genes that are known to contribute to virulence in Salmonella typhimurium. This genetic degradation may contribute to the human-restricted host range for S. typhi. CT18 harbours a 218,150-bp multiple-drug-resistance incH1 plasmid (pHCM1) and a 106,516-bp cryptic plasmid (pHCM2). pHCM2 shows recent common ancestry with a virulence plasmid of Yersinia pestis.
Subject(s)
Genome, Bacterial , Salmonella typhi/genetics , Chromosome Mapping , Chromosomes, Bacterial , DNA, Bacterial , Drug Resistance, Multiple, Bacterial/genetics , Escherichia coli/genetics , Gene Deletion , Humans , Molecular Sequence Data , Mutagenesis, Insertional , Plasmids/genetics , Recombination, Genetic , Salmonella typhimurium/genetics , Sequence Analysis, DNA , Serotyping
ABSTRACT
The bacterial family Enterobacteriaceae is notable for its well-studied human pathogens, including Salmonella, Yersinia, Shigella, and Escherichia spp. However, it also contains several plant pathogens. We report the genome sequence of a plant pathogenic enterobacterium, Erwinia carotovora subsp. atroseptica (Eca) strain SCRI1043, the causative agent of soft rot and blackleg potato diseases. Approximately 33% of Eca genes are not shared with sequenced enterobacterial human pathogens. Some of these genes are predicted to facilitate unexpected metabolic traits, such as nitrogen fixation and opine catabolism. This subset of genes also contains an overrepresentation of pathogenicity determinants, including possibly horizontally acquired gene clusters for putative type IV secretion and polyketide phytotoxin synthesis. To investigate whether these gene clusters play a role in the disease process, an arrayed set of insertional mutants was generated, and mutations were identified. Plant bioassays showed that these mutants were significantly reduced in virulence. This demonstrates both the presence of novel pathogenicity determinants in Eca and the impact of functional genomics in expanding our understanding of phytopathogenicity in the Enterobacteriaceae.
Subject(s)
Genome, Bacterial , Pectobacterium carotovorum/genetics , Pectobacterium carotovorum/pathogenicity , Plant Diseases/microbiology , Solanum tuberosum/microbiology , Virulence/genetics , Base Sequence , Biological Evolution , DNA Primers , Environment , Molecular Sequence Data , Polymerase Chain Reaction
ABSTRACT
Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.
Subject(s)
Genome, Bacterial , Mycobacterium leprae/genetics , Animals , Armadillos , DNA, Bacterial , Energy Metabolism , Evolution, Molecular , Gene Transfer, Horizontal , Humans , Leprosy/microbiology , Molecular Sequence Data , Multigene Family , Mycobacterium leprae/metabolism , Sequence Analysis, DNA
ABSTRACT
The Gram-negative bacterium Yersinia pestis is the causative agent of the systemic invasive infectious disease classically referred to as plague, and has been responsible for three human pandemics: the Justinian plague (sixth to eighth centuries), the Black Death (fourteenth to nineteenth centuries) and modern plague (nineteenth century to the present day). The recent identification of strains resistant to multiple drugs and the potential use of Y. pestis as an agent of biological warfare mean that plague still poses a threat to human health. Here we report the complete genome sequence of Y. pestis strain CO92, consisting of a 4.65-megabase (Mb) chromosome and three plasmids of 96.2 kilobases (kb), 70.3 kb and 9.6 kb. The genome is unusually rich in insertion sequences and displays anomalies in GC base-composition bias, indicating frequent intragenomic recombination. Many genes seem to have been acquired from other bacteria and viruses (including adhesins, secretion systems and insecticidal toxins). The genome contains around 150 pseudogenes, many of which are remnants of a redundant enteropathogenic lifestyle. The evidence of ongoing genome fluidity, expansion and decay suggests Y. pestis is a pathogen that has undergone large-scale genetic flux and provides a unique insight into the ways in which new and highly virulent pathogens evolve.
Subject(s)
Genome, Bacterial , Yersinia pestis/genetics , Animals , Antigens, Bacterial/genetics , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Chromosomes, Bacterial , DNA, Bacterial , Energy Metabolism , Evolution, Molecular , Gene Transfer, Horizontal , Humans , Insecta/microbiology , Lipopolysaccharides , Molecular Sequence Data , Mutation , Plague/microbiology , Pseudogenes , Sequence Analysis, DNA , Virulence/genetics , Yersinia pestis/immunology , Yersinia pestis/pathogenicity , Yersinia pseudotuberculosis/genetics
ABSTRACT
Analysis of Plasmodium falciparum chromosome 3, and comparison with chromosome 2, highlights novel features of chromosome organization and gene structure. The sub-telomeric regions of chromosome 3 show a conserved order of features. These include repetitive DNA sequences, members of multigene families involved in pathogenesis and antigenic variation, a number of conserved pseudogenes, and several genes of unknown function. A putative centromere has been identified that has a core region of about 2 kilobases with an extremely high (adenine + thymine) composition and arrays of tandem repeats. We have predicted 215 protein-coding genes and two transfer RNA genes in the 1,060,106-base-pair chromosome sequence. The predicted protein-coding genes can be divided into three main classes: 52.6% are not spliced, 45.1% have a large exon with short additional 5' or 3' exons, and 2.3% have a multiple exon structure more typical of higher eukaryotes.
Subject(s)
Genome, Protozoan , Plasmodium falciparum/genetics , Animals , Base Sequence , Centromere , Chromosome Mapping , Chromosomes , DNA, Protozoan , Molecular Sequence Data , Protozoan Proteins/genetics , Sequence Analysis, DNA , Telomere
ABSTRACT
Since the sequencing of the first two chromosomes of the malaria parasite, Plasmodium falciparum, there has been a concerted effort to sequence and assemble the entire genome of this organism. Here we report the sequence of chromosomes 1, 3-9 and 13 of P. falciparum clone 3D7; these chromosomes account for approximately 55% of the total genome. We describe the methods used to map, sequence and annotate these chromosomes. By comparing our assemblies with the optical map, we assess the completeness of the resulting sequence. During annotation, we assign Gene Ontology terms to the predicted gene products, and observe clustering of some malaria-specific terms to specific chromosomes. We identify a highly conserved sequence element in the intergenic region of internal var genes that is not associated with their telomeric counterparts.
Subject(s)
DNA, Protozoan , Plasmodium falciparum/genetics , Animals , Base Sequence , Chromosomes , Genes, Protozoan , Genome, Protozoan , Molecular Sequence Data , Multigene Family , Proteome , Protozoan Proteins/genetics , Sequence Analysis, DNA
ABSTRACT
We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats, including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more extended control regions. Some 43% of the genes contain introns; there are 4,730 introns in total. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization. These include genes required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified. This suggests that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.