ABSTRACT
Diabetic retinopathy (DR) is a common complication of diabetes. Approximately 20% of DR patients have diabetic macular edema (DME), characterized by fluid leakage into the retina. There is a genetic component to DR and DME risk, but few replicable loci have been identified. Because not all DR cases have DME, we focused on DME to increase power, and conducted a multi-ancestry GWAS to assess DME risk in a total of 1,502 DME patients and 5,603 non-DME controls in discovery and replication datasets. Two loci reached GWAS significance (p < 5 × 10⁻⁸). The strongest association was rs2239785 (K150E) in APOL1. The second finding was rs10402468, which co-localized to PLVAP and ANKLE1 in vascular/endothelial tissues. We conducted multiple sensitivity analyses to establish that the associations were specific to DME status and did not reflect diabetes status or other diabetic complications. Here we report two novel loci for risk of DME which replicated in multiple clinical trial and biobank derived datasets. One of these loci, containing the gene APOL1, is a risk factor in African American DME and DKD patients, indicating that this locus plays a broader role in diabetic complications for multiple ancestries. Trial Registration: NCT00473330, NCT00473382, NCT03622580, NCT03622593, NCT04108156.
Subject(s)
Diabetes Mellitus , Diabetic Retinopathy , Macular Edema , Humans , Macular Edema/genetics , Macular Edema/complications , Diabetic Retinopathy/genetics , Diabetic Retinopathy/complications , Genome-Wide Association Study , Apolipoprotein L1/genetics , Risk Factors
ABSTRACT
Cancer immunotherapy has emerged as an effective therapy in a variety of cancers. However, a key challenge in the field is that only a subset of patients who receive immunotherapy exhibit durable response. It has been hypothesized that host genetics influences the inherent immune profiles of patients and may underlie their differential response to immunotherapy. Herein, we systematically determined the association of common germline genetic variants with gene expression and immune cell infiltration of the tumor. We identified 64,094 expression quantitative trait loci (eQTLs) that associated with 18,210 genes (eGenes) across 24 human cancers. Overall, eGenes were enriched for involvement in immune processes, suggesting that expression of immune genes can be shaped by hereditary genetic variants. We identified the endoplasmic reticulum aminopeptidase 2 (ERAP2) gene as a pan-cancer type eGene whose expression levels stratified overall survival in a subset of patients with bladder cancer receiving anti-PD-L1 (atezolizumab) therapy. Finally, we identified 103 gene signature QTLs (gsQTLs) that were associated with predicted immune cell abundance within the tumor microenvironment. Our findings highlight the impact of germline SNPs on cancer-immune phenotypes and response to therapy, and these analyses provide a resource for integration of germline genetics as a component of personalized cancer immunotherapy.
Subject(s)
Genes, Neoplasm , Neoplasms/genetics , Neoplasms/immunology , Polymorphism, Genetic , Aminopeptidases/genetics , Female , Gene Expression Regulation, Neoplastic , Germ-Line Mutation , Humans , Immunity, Cellular/genetics , Immunotherapy , Inducible T-Cell Co-Stimulator Ligand/genetics , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/pathology , Male , Neoplasms/therapy , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/immunology , Urinary Bladder Neoplasms/therapy
ABSTRACT
In clinical trials, a placebo response refers to improvement in disease symptoms arising from the psychological effect of receiving a treatment rather than the actual treatment under investigation. Previous research has reported genomic variation associated with the likelihood of observing a placebo response, but these studies have been limited in scope and have not been validated. Here, we analyzed whole-genome sequencing data from 784 patients undergoing placebo treatment in Phase III Asthma or Rheumatoid Arthritis trials to assess the impact of previously reported variation on patient outcomes in the placebo arms and to identify novel variants associated with the placebo response. Contrary to expectations based on previous reports, we did not observe any statistically significant associations between genomic variants and placebo treatment outcome. Our findings suggest that the biological origin of the placebo response is complex and likely to be variable between disease areas.
Subject(s)
Clinical Trials, Phase III as Topic/standards , Placebo Effect , Polymorphism, Single Nucleotide , Adolescent , Adult , Aged , Aged, 80 and over , Arthritis, Rheumatoid/drug therapy , Arthritis, Rheumatoid/genetics , Asthma/drug therapy , Asthma/genetics , Female , Genome-Wide Association Study , Humans , Male , Middle Aged
ABSTRACT
Gut microbial communities represent one source of human genetic and metabolic diversity. To examine how gut microbiomes differ among human populations, here we characterize bacterial species in fecal samples from 531 individuals, plus the gene content of 110 of them. The cohort encompassed healthy children and adults from the Amazonas of Venezuela, rural Malawi and US metropolitan areas and included mono- and dizygotic twins. Shared features of the functional maturation of the gut microbiome were identified during the first three years of life in all three populations, including age-associated changes in the genes involved in vitamin biosynthesis and metabolism. Pronounced differences in bacterial assemblages and functional gene repertoires were noted between US residents and those in the other two countries. These distinctive features are evident in early infancy as well as adulthood. Our findings underscore the need to consider the microbiome when evaluating human development, nutritional needs, physiological variations and the impact of westernization.
Subject(s)
Bacteria/classification , Bacteria/genetics , Biodiversity , Intestines/microbiology , Metagenome , Adolescent , Adult , Age Factors , Aged , Child , Child, Preschool , Feces/microbiology , Female , Geography , Humans , Infant , Malawi , Male , Middle Aged , Phylogeny , RNA, Ribosomal, 16S/genetics , Twins, Dizygotic , Twins, Monozygotic , United States , Venezuela , Young Adult
ABSTRACT
BACKGROUND: Large sample sets of whole genome sequencing with deep coverage are being generated; however, assembling datasets from different sources inevitably introduces batch effects. These batch effects are not well understood and can be due to changes in the sequencing protocol or bioinformatics tools used to process the data. No systematic algorithms or heuristics exist to detect and filter batch effects or remove associations impacted by batch effects in whole genome sequencing data. RESULTS: We describe key quality metrics, provide a freely available software package to compute them, and demonstrate that identification of batch effects is aided by principal components analysis of these metrics. To mitigate batch effects, we developed new site-specific filters that identified and removed variants that falsely associated with the phenotype due to batch effect. These include filtering based on a haplotype-based genotype correction, a differential genotype quality test, and removing sites with a missing genotype rate greater than 30% after setting genotypes with quality scores less than 20 to missing. This method removed 96.1% of unconfirmed genome-wide significant SNP associations and 97.6% of unconfirmed genome-wide significant indel associations. We performed analyses to demonstrate that: 1) these filters impacted variants known to be disease associated, as 2 out of 16 confirmed associations in an AMD candidate SNP analysis were filtered, representing a reduction in power of 12.5%; 2) in the absence of batch effects, these filters removed only a small proportion of variants across the genome (type I error rate of 3%); and 3) in an independent dataset, the method removed 90.2% of unconfirmed genome-wide SNP associations and 89.8% of unconfirmed genome-wide indel associations. CONCLUSIONS: Researchers currently do not have effective tools to identify and mitigate batch effects in whole genome sequencing data. We developed and validated methods and filters to address this deficiency.
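The third filter described in this abstract (set genotypes with quality below 20 to missing, then drop sites whose missing rate exceeds 30%) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' software package: the function names and the per-site list representation are hypothetical, and only the two thresholds come from the abstract.

```python
from typing import List, Optional, Sequence

GQ_THRESHOLD = 20        # from the abstract: quality scores below 20 are set to missing
MAX_MISSING_RATE = 0.30  # from the abstract: sites with a missing rate above 30% are removed


def mask_low_quality(genotypes: Sequence[Optional[str]],
                     qualities: Sequence[Optional[float]],
                     gq_threshold: float = GQ_THRESHOLD) -> List[Optional[str]]:
    """Return genotypes with low-quality or unscored calls replaced by None (missing)."""
    return [
        g if (g is not None and q is not None and q >= gq_threshold) else None
        for g, q in zip(genotypes, qualities)
    ]


def missing_rate(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of samples with a missing genotype at one site."""
    if not genotypes:
        return 1.0  # hypothetical convention: a site with no samples counts as fully missing
    return sum(g is None for g in genotypes) / len(genotypes)


def keep_site(genotypes: Sequence[Optional[str]],
              qualities: Sequence[Optional[float]],
              gq_threshold: float = GQ_THRESHOLD,
              max_missing_rate: float = MAX_MISSING_RATE) -> bool:
    """Keep a site only if its missing rate after masking does not exceed the cutoff."""
    masked = mask_low_quality(genotypes, qualities, gq_threshold)
    return missing_rate(masked) <= max_missing_rate
```

For a site with ten samples, three low-quality calls leave the missing rate at exactly 30%, so the site is kept; a fourth low-quality call pushes the rate above the cutoff and the site is removed.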
Subject(s)
Genome-Wide Association Study/methods , Genotype , High-Throughput Nucleotide Sequencing , Humans , Macular Degeneration/genetics , Macular Degeneration/pathology , Phenotype , Polymorphism, Single Nucleotide , Principal Component Analysis , Sequence Analysis, DNA , Software
ABSTRACT
The intestinal microbiota consists of over 1000 species, which play key roles in gut physiology and homeostasis. Imbalances in the composition of this bacterial community can lead to transient intestinal dysfunctions and chronic disease states. Understanding how to manipulate this ecosystem is thus essential for treating many disorders. In this study, we took advantage of recently developed tools for deep sequencing and phylogenetic clustering to examine the long-term effects of exogenous microbiota transplantation with and without antibiotic pretreatment. In our rat model, deep sequencing revealed an intestinal bacterial diversity exceeding that of the human gut by a factor of two to three. The transplantation produced a marked increase in the microbial diversity of the recipients, which stemmed from both capture of new phylotypes and increase in abundance of others. However, when transplantation was performed after antibiotic intake, the resulting state simply combined the reshaping effects of the individual treatments (including the reduced diversity from antibiotic treatment alone). Therefore, lowering the recipient bacterial load by antibiotic intake prior to transplantation did not increase establishment of the donor phylotypes, although some dominant lineages still transferred successfully. Remarkably, all of these effects were observed after 1 mo of treatment and persisted after 3 mo. Overall, our results indicate that the indigenous gut microbial composition is more plastic than previously anticipated. However, since antibiotic pretreatment counterintuitively interferes with the establishment of an exogenous community, such plasticity is likely conditioned more by the altered microbiome gut homeostasis caused by antibiotics than by the primary bacterial loss.
Subject(s)
Anti-Bacterial Agents/administration & dosage , Bacteria/classification , Ecosystem , Gastrointestinal Tract/microbiology , Animals , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/genetics , Bacterial Load , Cecum/microbiology , Cecum/surgery , DNA, Bacterial/genetics , Female , Gastrointestinal Tract/drug effects , Humans , Intestines/drug effects , Intestines/microbiology , Male , Phylogeny , Rats , Rats, Inbred Lew , Rats, Sprague-Dawley , Rats, Wistar , Sequence Analysis, DNA
ABSTRACT
While the association between colorectal cancer (CRC) features and Fusobacterium has been extensively studied, less is known of other intratumoral bacteria. Here, we leverage whole transcriptomes from 807 CRC samples to dually characterize tumor gene expression and 74 intratumoral bacteria. Seventeen of these species, including 4 Fusobacterium spp., are classified as orally derived and are enriched among right-sided, microsatellite instability-high (MSI-H), and BRAF-mutant tumors. Across consensus molecular subtypes (CMSs), integration of Fusobacterium animalis (Fa) presence and tumor expression reveals that Fa has the most significant associations in mesenchymal CMS4 tumors despite a lower prevalence than in immune CMS1. Within CMS4, the prevalence of Fa is uniquely associated with collagen- and immune-related pathways. Additional Fa pangenome analysis reveals that stress response genes and the adhesion FadA are commonly expressed intratumorally. Overall, this study identifies oral-derived bacteria as enriched in inflamed tumors, and the associations of bacteria and tumor expression are context and species specific.
Subject(s)
Colorectal Neoplasms , Humans , Colorectal Neoplasms/genetics , Fusobacterium/genetics , Microsatellite Instability , Transcriptome
ABSTRACT
Conservation is often used to define essential sequences within RNA sites. However, conservation finds only invariant sequence elements that are necessary for function, rather than finding a set of sequence elements sufficient for function. Biochemical studies in several systems, including the hammerhead ribozyme and the purine riboswitch, find additional elements, such as loop-loop interactions, required for function yet not phylogenetically conserved. Here we define a critical test of sufficiency: We embed a minimal, apparently sufficient motif for binding the amino acid tryptophan in a random-sequence background and ask whether we obtain functional molecules. After a negative result, we use a combination of three-dimensional structural modeling, selection, designed mutations, high-throughput sequencing, and bioinformatics to explore functional insufficiency. This reveals an essential unpaired G in a diverse structural context, varied sequence, and flexible distance from the invariant internal loop binding site identified previously. Addition of the new element yields a sufficient binding site by the insertion criterion, binding tryptophan in 22 out of 23 tries. Random insertion testing for site sufficiency seems likely to be broadly revealing.
Subject(s)
RNA/chemistry , RNA/metabolism , Tryptophan/metabolism , Aptamers, Nucleotide/chemistry , Aptamers, Nucleotide/genetics , Aptamers, Nucleotide/metabolism , Base Sequence , Binding Sites/genetics , Computational Biology , Conserved Sequence , Evolution, Molecular , Models, Molecular , Molecular Dynamics Simulation , Molecular Sequence Data , Nucleic Acid Conformation , RNA/genetics , SELEX Aptamer Technique , Sequence Homology, Nucleic Acid
ABSTRACT
Genome-wide association studies (GWAS) have identified many common variant loci associated with asthma susceptibility, but few studies investigate the genetics underlying moderate-to-severe asthma risk. Here, we present a whole-genome sequencing study comparing 3181 moderate-to-severe asthma patients to 3590 non-asthma controls. We demonstrate that asthma risk is genetically correlated with lung function measures and that this component of asthma risk is orthogonal to the eosinophil genetics that also contribute to disease susceptibility. We find that polygenic scores for reduced lung function are associated with younger asthma age of onset. Genome-wide, seven previously reported common asthma variant loci and one previously reported lung function locus, near THSD4, reach significance. We replicate association of the lung function locus in a recently published GWAS of moderate-to-severe asthma patients. We additionally replicate the association of a previously reported rare (minor allele frequency < 1%) coding variant in IL33 and show significant enrichment of rare variant burden in genes from common variant allergic disease loci. Our findings highlight the contribution of lung function genetics to moderate-to-severe asthma risk, and provide initial rare variant support for associations with moderate-to-severe asthma risk at several candidate genes from common variant loci.
Subject(s)
Asthma , Genome-Wide Association Study , Asthma/genetics , Genetic Predisposition to Disease , Humans , Lung , Whole Genome Sequencing
ABSTRACT
Programmed -1 ribosomal frameshifting (-1 PRF) allows for alternative reading frames within one mRNA. First found in several viruses, it is now believed to exist in all kingdoms of life. Strong stimulators for -1 PRF are a heptameric slippery site and an RNA pseudoknot. Here, we present a new algorithm, KnotInFrame, for the automatic detection of -1 PRF signals from genomic sequences. It finds the frameshifting stimulators by means of a specialized RNA-pseudoknot folding program, fast enough for genome-wide analyses. Evaluations on known -1 PRF signals demonstrate a high sensitivity.
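As a rough illustration of the first of the two stimulators named in this abstract, the sketch below scans a sequence for heptameric slippery sites. It assumes the commonly cited consensus X XXY YYZ (X any nucleotide, Y either A or U, Z anything but G); whether KnotInFrame applies exactly this rule is not stated in the abstract, the function name is hypothetical, and the pseudoknot folding step is not attempted here.

```python
from typing import List, Tuple


def find_slippery_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, heptamer) for each window matching X XXY YYZ.

    Assumed consensus (not taken from the abstract): positions 1-3 are one
    identical nucleotide, positions 4-6 are all A or all U, and position 7
    is A, C or U. DNA input is accepted and converted to RNA.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - 6):
        heptamer = rna[i:i + 7]
        x, y, z = heptamer[0], heptamer[3], heptamer[6]
        if (
            x in "ACGU"
            and heptamer[0:3] == x * 3   # X XX: three identical nucleotides
            and y in "AU"
            and heptamer[3:6] == y * 3   # Y YY: AAA or UUU
            and z in "ACU"               # Z: any nucleotide except G
        ):
            hits.append((i, heptamer))
    return hits
```

A detector like this only proposes candidate positions; in the approach the abstract describes, a candidate counts as a signal only when a pseudoknot can also be folded downstream of it.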
Subject(s)
Algorithms , Frameshifting, Ribosomal , Software , Base Sequence , Computational Biology , Consensus Sequence , Databases, Nucleic Acid , Genomics , Saccharomyces cerevisiae/genetics
ABSTRACT
RNA pseudoknots are an important structural feature of RNAs, but are often neglected in computer predictions for reasons of efficiency. Here, we present the pknotsRG Web Server for single sequence RNA secondary structure prediction including pseudoknots. pknotsRG employs the newest Turner energy rules for finding the structure of minimal free energy. The algorithm has been improved in several ways recently. First, it has been reimplemented in the C programming language, resulting in a 60-fold increase in speed. Second, all suboptimal foldings up to a user-defined threshold can be enumerated. For large scale analysis, a fast sliding window mode is available. Further improvements to the Web Server include a new output visualization using the PseudoViewer Web Service, or RNAmovies for a movie-like animation of several suboptimal foldings. The tool is available as source code, binary executable, online tool or as Web Service. The latter alternative allows for an easy integration into bioinformatics pipelines. pknotsRG is available at the Bielefeld Bioinformatics Server (http://bibiserv.techfak.uni-bielefeld.de/pknotsrg).
Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Software , Algorithms , Computational Biology/methods , Computer Graphics , Computer Simulation , Internet , Models, Statistical , Programming Languages , User-Computer Interface
ABSTRACT
MOTIVATION: While large-scale whole genome sequencing is feasible, the high costs compel investigators to focus on disease subjects. As a result, large sequencing datasets of samples with different diseases are often readily available, but not healthy controls to contrast them with. While it is possible to perform an association study using only diseases, the associations could be driven by a disease acting as a control and not the focal disease. METHODS: We developed a genotype-on-phenotype reverse regression with a Bayesian spike and slab prior to enable association testing in datasets with multiple diseases. This method, referred to as revreg, flagged associations (both common and rare) that were driven by diseases that were not of primary interest. RESULTS: Based on simulations, revreg had 80% power to detect an odds ratio of 1.74 for common variants (3500 samples total) and 3.73 for rare variants (14,000 samples total), with minimal type I error. For common variants, we tested this method on 3657 whole genome sequenced samples aimed at discovering variants associated with disease risk of Chronic Obstructive Pulmonary Disease using three other diseases as controls. We demonstrated detection of six highly significant associations likely due to Age-Related Macular Degeneration (AMD). In an exome dataset of 8836 samples aimed at characterizing rare variants associated with disease risk of Asthma, using five other diseases as controls, we detected and removed genic regions due to AMD (C3, CFH, CFHR5, CFI, and DNMT3A) and RA (KRTAP13-4).
Subject(s)
Genome-Wide Association Study/methods , Sequence Analysis, DNA/methods , Whole Genome Sequencing/methods , Asthma/genetics , Bayes Theorem , Case-Control Studies , Computer Simulation , Genetic Predisposition to Disease , Humans , Macular Degeneration/genetics , Phenotype
ABSTRACT
BACKGROUND: Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam database contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time consuming. Filtering approaches that use sequence conservation to reduce the number of CM searches are fast, but it is unknown how much sensitivity they sacrifice. RESULTS: We present a new filtering approach, which exploits the family-specific secondary structure and significantly reduces the number of CM searches. The filter eliminates approximately 85% of the queries and discards only 2.6% of true positives when evaluating Rfam against itself. First results also capture previously undetected non-coding RNAs in a recent human RNAz screen. CONCLUSION: The RNA shape index filter (RNAsifter) is based on the following rationale: An RNA family is characterised by structure, much more succinctly than by sequence content. Structures of individual family members, which naturally have different length and sequence composition, may exhibit structural variation in detail, but overall, they have a common shape in a more abstract sense. Given a fixed release of the Rfam database, we can compute these abstract shapes for all families. This is called a shape index. If a query sequence belongs to a certain family, it must be able to fold into the family shape with reasonable free energy. Therefore, rather than matching the query against all families in the database, we can first (and quickly) compute its feasible shape(s), and use the shape index to access only those families where a good match is possible due to a common shape with the query.
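The shape index idea in this abstract can be sketched as a lookup table from abstract shapes to families. The sketch below is a toy under stated assumptions, not RNAsifter itself: it reduces dot-bracket structures to their branching pattern only (unpaired bases ignored, directly nested helices merged), which is one possible abstraction level and may not be the one the tool uses. The function names and family labels are hypothetical, and the query's feasible shapes are assumed to come from a separate folding step that is not shown.

```python
from typing import Dict, Iterable, List, Set


def abstract_shape(dot_bracket: str) -> str:
    """Collapse a dot-bracket structure to its branching pattern, e.g. '[[][]]'."""
    stack: List[List[str]] = [[]]  # each level collects the shapes of enclosed helices
    for ch in dot_bracket:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unexpected ')'")
            children = stack.pop()
            # A pair enclosing exactly one helix merges with it; otherwise it
            # opens a new bracket (a hairpin or a multiloop).
            shape = children[0] if len(children) == 1 else "[" + "".join(children) + "]"
            stack[-1].append(shape)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    return "".join(stack[0]) or "_"  # "_" marks an unstructured chain in this sketch


def build_shape_index(family_structures: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each abstract shape to the set of families that exhibit it."""
    index: Dict[str, Set[str]] = {}
    for family, structures in family_structures.items():
        for structure in structures:
            index.setdefault(abstract_shape(structure), set()).add(family)
    return index


def candidate_families(query_shapes: Iterable[str], index: Dict[str, Set[str]]) -> Set[str]:
    """Families sharing at least one shape with the query; only these need a CM search."""
    hits: Set[str] = set()
    for shape in query_shapes:
        hits |= index.get(shape, set())
    return hits
```

With an index built from a hairpin family and a two-hairpin family, a query whose only feasible shape is `[]` is passed to the hairpin family alone, and a query with no matching shape is filtered out before any covariance model search.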
Subject(s)
Algorithms , Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Sequence , Molecular Sequence Data
ABSTRACT
MOTIVATION AND RESULTS: Motivated by the recent rise of interest in small regulatory RNAs, we present Locomotif, a new approach for locating RNA motifs that goes beyond previous ones in three ways: (1) Motif search is based on efficient dynamic programming algorithms, incorporating the established thermodynamic model of RNA secondary structure formation. (2) Motifs are described graphically, using a Java-based editor, and search algorithms are derived from the graphics in a fully automatic way. The editor allows us to draw secondary structures, annotated with size and sequence information. They closely resemble the established, but informal way in which RNA motifs are communicated in the literature. Thus, the learning effort for Locomotif users is minimal. (3) Locomotif employs a client-server approach. Motifs are designed by the user locally. Search programs are generated and compiled on a bioinformatics server. They are made available both for execution on the server, and for download as C source code plus an appropriate makefile. AVAILABILITY: Locomotif is available at http://bibiserv.techfak.uni-bielefeld.de/locomotif.
Subject(s)
Algorithms , Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , RNA/genetics , Sequence Analysis, RNA/methods , Software , Base Sequence , Molecular Sequence Data , Sequence Alignment/methods , User-Computer Interface
ABSTRACT
BACKGROUND: Idiopathic pulmonary fibrosis (IPF) risk has a strong genetic component. Studies have implicated variations at several loci, including TERT, surfactant genes, and a single nucleotide polymorphism at chr11p15 (rs35705950) in the intergenic region between TOLLIP and MUC5B. Patients with IPF who have risk alleles at rs35705950 have longer survival from the time of IPF diagnosis than do patients homozygous for the non-risk allele, whereas patients with shorter telomeres have shorter survival times. We aimed to assess whether rare protein-altering variants in genes regulating telomere length are enriched in patients with IPF homozygous for the non-risk alleles at rs35705950. METHODS: Between Nov 1, 2014, and Nov 1, 2016, we assessed blood samples from patients aged 40 years or older and of European ancestry with sporadic IPF from three international phase 3 clinical trials (INSPIRE, CAPACITY, ASCEND), one phase 2 study (RIFF), and US-based observational studies (Vanderbilt Clinical Interstitial Lung Disease Registry and the UCSF Interstitial Lung Disease Clinic registry cohorts) at the Broad Institute (Cambridge, MA, USA) and Human Longevity (San Diego, CA, USA). We also assessed blood samples from non-IPF controls in several clinical trials. We did whole-genome sequencing to assess telomere length and identify rare protein-altering variants, stratified by rs35705950 genotype. We also assessed rare functional variation in TERT exons and compared telomere length and disease progression across genotypes. FINDINGS: We assessed samples from 1510 patients with IPF and 1874 non-IPF controls. 30 (3%) of 1046 patients with an rs35705950 risk allele had a rare protein-altering variant in TERT compared with 34 (7%) of 464 non-risk allele carriers (odds ratio 0·40 [95% CI 0·24-0·66], p=0·00039). Subsequent analyses identified enrichment of rare protein-altering variants in PARN and RTEL1, and rare variation in TERC in patients with IPF compared with controls. 
We expanded our study population to provide a more accurate estimation of rare variant frequency in these four loci, and to calculate telomere length. The proportion of patients with at least one rare variant in TERT, PARN, TERC, or RTEL1 was higher in patients with IPF than in controls (149 [9%] of 1739 patients vs 205 [2%] of 8645 controls, p=2·44 × 10⁻⁸). Patients with IPF who had a variant in any of the four identified telomerase component genes had telomeres that were 3·69-16·10% shorter than patients without a variant in any of the four genes and had an earlier mean age of disease onset than patients without one or more variants (65·1 years [SD 7·8] vs 67·1 years [7·9], p=0·004). In the placebo arms of clinical trials, shorter telomeres were significantly associated with faster disease progression (1·7% predicted forced vital capacity per kb per year, p=0·002). Pirfenidone had treatment benefit regardless of telomere length (p=4·24 × 10⁻⁸ for telomere length lower than the median, p=0·0044 for telomere length greater than the median). INTERPRETATION: Rare protein-altering variants in TERT, PARN, TERC, and RTEL1 are enriched in patients with IPF compared with controls, and, in the case of TERT, particularly in individuals without a risk allele at the rs35705950 locus. This suggests that multiple genetic factors contribute to sporadic IPF, which might implicate distinct mechanisms of pathogenesis and disease progression. FUNDING: Genentech, National Institutes of Health, Francis Family Foundation, Pulmonary Fibrosis Foundation, Nina Ireland Program for Lung Health, US Department of Veterans Affairs.
Subject(s)
Idiopathic Pulmonary Fibrosis/blood , Mucin-5B/blood , Telomere Homeostasis/genetics , Aged , Case-Control Studies , Clinical Trials as Topic , Female , Humans , Idiopathic Pulmonary Fibrosis/genetics , Male , Middle Aged , Whole Genome Sequencing
ABSTRACT
Computational analysis of RNA secondary structure is a classical field of biosequence analysis, which has recently gained momentum due to the manifold regulatory functions of RNA that have become apparent. We present five recent computational approaches that address the problems of synoptic folding space analysis, pseudoknot prediction, structure alignment, comparative structure prediction, and miRNA target prediction. All these programs are in current use and are available via the Bielefeld Bioinformatics Server at .
Subject(s)
Computational Biology/trends , RNA/chemistry , Animals , Base Sequence , Humans , Internet , Models, Molecular , Molecular Sequence Data , Nucleic Acid Conformation , Sequence Homology, Nucleic Acid , Software
ABSTRACT
The programs GMAP and GSNAP, for aligning RNA-Seq and DNA-Seq datasets to genomes, have evolved along with advances in biological methodology to handle longer reads, larger volumes of data, and new types of biological assays. The genomic representation has been improved to include linear genomes that can compare sequences using single-instruction multiple-data (SIMD) instructions, compressed genomic hash tables with fast access using SIMD instructions, handling of large genomes with more than four billion bp, and enhanced suffix arrays (ESAs) with novel data structures for fast access. Improvements to the algorithms have included a greedy match-and-extend algorithm using suffix arrays, segment chaining using genomic hash tables, diagonalization using segmental hash tables, and nucleotide-level dynamic programming procedures that use SIMD instructions and eliminate the need for F-loop calculations. Enhancements to the functionality of the programs include standardization of indel positions, handling of ambiguous splicing, clipping and merging of overlapping paired-end reads, and alignments to circular chromosomes and alternate scaffolds. The programs have been adapted for use in pipelines by integrating their usage into R/Bioconductor packages such as gmapR and HTSeqGenie, and these pipelines have facilitated the discovery of numerous biological phenomena.