ABSTRACT
BACKGROUND: Gene families are groups of homologous genes that often have similar biological functions. These families are formed by gene duplication events throughout evolution, resulting in multiple copies of an ancestral gene. Over time, these copies can acquire mutations and structural variations, resulting in members that may vary in size, motif ordering and sequence. Multigene families have been described in a broad range of organisms, from single-celled bacteria to complex multicellular organisms, and have been linked to an array of phenomena, such as host-pathogen interactions, immune evasion and embryonic development. Despite the importance of gene families, few approaches have been developed for estimating and graphically visualizing their diversity patterns and expression profiles in genome-wide studies. RESULTS: Here, we introduce an R package named dgfr, which estimates and enables the visualization of sequence divergence within gene families, as well as the visualization of secondary data such as gene expression. The package takes as input a multi-fasta file containing the coding sequences (CDS) or amino acid sequences from a multigene family, performs a pairwise alignment among all sequences, and estimates their distances, which are then subjected to dimension reduction, optimal cluster determination, and gene assignment to each cluster. The result is a dataset that allows for the visualization of sequence divergence and expression within the gene family, together with an approximation of the number of clusters present in the family. CONCLUSIONS: dgfr provides a way to estimate and study the diversity of gene families, as well as to visualize the dispersion and secondary profile of the sequences. The dgfr package is available at https://github.com/lailaviana/dgfr under the GPL-3 license.
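The workflow described in this abstract (pairwise distances, choice of a cluster number, assignment of each gene to a cluster) can be illustrated with a minimal Python sketch. This is not dgfr's code or API: dgfr is an R package, and everything below is an assumption made for illustration. Edit distance stands in for pairwise alignment, k-medoids with a mean silhouette score stands in for the optimal cluster determination, the dimension-reduction step is omitted for brevity, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (stand-in for a real aligner)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]

def distance_matrix(seqs):
    """Symmetric matrix of edit distances normalised by the longer sequence."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        longest = max(len(seqs[i]), len(seqs[j]))
        d[i][j] = d[j][i] = edit_distance(seqs[i], seqs[j]) / longest
    return d

def k_medoids(d, k, max_iter=50):
    """Deterministic k-medoids: farthest-point seeding, then assign/update rounds."""
    n = len(d)
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:                  # keep the old medoid for an empty cluster
                new.append(medoids[c])
                continue
            new.append(min(members, key=lambda m: sum(d[m][i] for i in members)))
        if new == medoids:
            break
        medoids = new
    return labels

def mean_silhouette(d, labels):
    """Mean silhouette width computed directly from the distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) == 1:
            continue                         # singleton (or single cluster) scores 0
        a = sum(d[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(d[i][j] for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(labels)

def best_clustering(seqs, k_max=4):
    """Try k = 2..k_max and keep the clustering with the highest mean silhouette."""
    d = distance_matrix(seqs)
    scored = []
    for k in range(2, min(k_max, len(seqs) - 1) + 1):
        labels = k_medoids(d, k)
        scored.append((mean_silhouette(d, labels), k, labels))
    _, k, labels = max(scored)
    return k, labels

# Toy peptide-like sequences, invented for illustration: two clearly distinct groups.
seqs = ["MKVLAAGIVG", "MKVLAAGIVA", "MKVLSAGIVG",
        "DEQRNPTWYF", "DEQRNPTWYH", "DEQRCPTWYF"]
k, labels = best_clustering(seqs)
print(k, labels)
```

On real gene families a proper aligner and substitution model would replace the edit distance, and the cluster count is only an approximation, as the abstract itself notes.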
Subjects
Genetic Variation , Multigene Family , Software , Genetic Variation/genetics , Sequence Alignment/methods
ABSTRACT
BACKGROUND: Trypanosoma cruzi, the etiologic agent of Chagas disease, is currently divided into six discrete typing units (DTUs), named TcI-TcVI. TcII is among the major DTUs involved in human infections in the Southern Cone of South America, where it is associated with severe cardiac and digestive symptoms. Despite the importance of TcII in Chagas disease epidemiology and pathology, no genome-wide comparisons of the mitochondrial and nuclear genomes of TcII field isolates have so far been performed to track the variability and evolution of this DTU in endemic regions. RESULTS: In the present work, we sequenced and compared the whole nuclear and mitochondrial genomes of seven TcII strains isolated from chagasic patients from the central and northeastern regions of Minas Gerais, Brazil, revealing extensive genetic variability within this DTU. A comparison of the phylogenies based on the nuclear and mitochondrial genomes revealed that the majority of branches were shared by both. The subtle divergences in the branches are probably a consequence of mitochondrial introgression events between TcII strains. Two T. cruzi strains isolated from patients living in the central region of Minas Gerais, S15 and S162a, clustered together in both the nuclear and mitochondrial phylogenies. These two strains are separated from the other five by the Espinhaço Mountains, a geographic barrier that could have restricted the movement of insect vectors during T. cruzi evolution in the state of Minas Gerais. Finally, the presence of aneuploidies was evaluated, revealing that each of the seven TcII strains has a different pattern of chromosomal duplication/loss. CONCLUSIONS: Analysis of genomic variability and aneuploidies suggests that there is significant genomic variability within Minas Gerais TcII strains, which could be exploited by the parasite to allow rapid selection of favorable phenotypes. Also, the aneuploidy patterns vary among T. cruzi strains and do not correlate with the nuclear phylogeny, suggesting that chromosomal duplications/losses are recent and frequent events in the parasite's evolution.
Subjects
Aneuploidy , Chagas Disease/parasitology , Genetic Variation , Protozoan Genome , Protozoan Proteins/genetics , Trypanosoma cruzi/genetics , Whole Genome Sequencing/methods , Animals , Chagas Disease/transmission , Protozoan DNA/genetics , Genotype , Humans , Insect Vectors/parasitology , Molecular Typing , Phylogeny , Trypanosoma cruzi/classification , Trypanosoma cruzi/isolation & purification
ABSTRACT
Repetitive elements cause assembly fragmentation in complex eukaryotic genomes, limiting the study of their variability. The genome of Trypanosoma cruzi, the parasite that causes Chagas disease, has a high repetitive content, including multigene families. Although many T. cruzi multigene families encode surface proteins that play pivotal roles in host-parasite interactions, their variability is currently underestimated, as their high repetitive content results in collapsed gene variants. To estimate sequence variability and copy number variation of multigene families, we developed a read-based approach that is independent of gene-specific read mapping and de novo assembly. This methodology was used to estimate the copy number and variability of MASP, TcMUC, and trans-sialidase (TS), the three largest T. cruzi multigene families, in 36 strains, including members of all six parasite discrete typing units (DTUs). We found that these three families present a specific pattern of variability and copy number among the distinct parasite DTUs. Inter-DTU hybrid strains presented a higher variability of these families, suggesting that maintaining a larger content of their members could be advantageous. In addition, in a chronic murine model and chronic Chagasic human patients, the immune response was focused on TS antigens, suggesting that targeting TS conserved sequences could be a potential avenue to improve diagnosis and vaccine design against Chagas disease. Finally, the proposed approach can be applied to study multicopy genes in any organism, opening new avenues to access sequence variability in complex genomes. IMPORTANCE Sequences that have several copies in a genome, such as multicopy-gene families, mobile elements, and microsatellites, are among the most challenging genomic segments to study. They are frequently underestimated in genome assemblies, hampering the correct assessment of these important players in genome evolution and adaptation. Here, we developed a new methodology to estimate variability and copy numbers of repetitive genomic regions and employed it to characterize the T. cruzi multigene families MASP, TcMUC, and trans-sialidase (TS), which are important virulence factors in this parasite. We showed that multigene families vary in sequence and content among the parasite's lineages, whereas hybrid strains have a higher sequence variability that could be advantageous to the parasite's survivability. By identifying conserved sequences within multigene families, we showed that the mammalian host immune response toward these multigene families is usually focused on the TS multigene family. These TS conserved and immunogenic peptides can be explored in future works as diagnostic targets or vaccine candidates for Chagas disease. Finally, this methodology can be easily applied to any organism of interest, which will aid in our understanding of complex genomic regions.
Subjects
Chagas Disease , Trypanosoma cruzi , Humans , Animals , Mice , Trypanosoma cruzi/genetics , DNA Copy Number Variations , Protozoan Genome , Mannose-Binding Protein-Associated Serine Proteases/genetics , Multigene Family , Chagas Disease/parasitology , High-Throughput Nucleotide Sequencing/methods , Mammals/genetics
ABSTRACT
Leishmaniasis encompasses a group of diverse clinical diseases caused by protozoan parasites of the Leishmania genus. This disease is a major public health problem in the New World, affecting people exposed in endemic regions. The city of Governador Valadares (Minas Gerais/Brazil) is a re-emerging area for visceral leishmaniasis, with 191 human cases reported from 2008 to 2017 and a lethality rate of 14.7%. Transmission of the parasite is intense in this region, with up to 22% of domestic dogs showing positive serology for the visceral form. Lutzomyia longipalpis is one of the most abundant sand fly species in this area. Despite this scenario, so far there is no information regarding the Leishmania species circulating in the insect vector Lu. longipalpis in this focus. We collected 616 female Lu. longipalpis sand flies between January and September 2015 in the Vila Parque Ibituruna neighborhood (Governador Valadares/MG), which is located in a transitional area between the sylvatic and urban environments, with residences built near a preserved area. After DNA extraction from individual sand flies, natural Leishmania infections in Lu. longipalpis were detected by conventional PCR, using primers derived from kDNA sequences and specific for the L. (Leishmania) or L. (Viannia) subgenus. The sensitivity of these PCR assays was 0.1 pg of DNA for each Leishmania subgenus, and the total infection rate was 16.2% (100 positive specimens). Species-specific PCR detected the presence of multiple Leishmania species in infected Lu. longipalpis specimens in Governador Valadares, including L. amazonensis (n = 3), L. infantum (n = 28), L. (Viannia) spp. (n = 20), coinfections with L. infantum and L. (Viannia) spp. (n = 5), and L. (Leishmania) spp. (n = 44). Our results demonstrate that multiple Leishmania species circulate in Lu. longipalpis in Governador Valadares and reveal a potentially increasing risk of transmission of the different circulating parasite species. This information reinforces the need for epidemiological and entomological surveillance in this endemic focus, and the development of effective control strategies against leishmaniasis.
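The infection figures reported in this abstract can be cross-checked with simple arithmetic. The short Python snippet below uses only the counts stated in the abstract (616 sand flies collected and the per-species positive counts); the variable names are illustrative.

```python
# Counts taken from the abstract: sand flies collected and PCR-positive specimens.
collected = 616
positives = {
    "L. amazonensis": 3,
    "L. infantum": 28,
    "L. (Viannia) spp.": 20,
    "L. infantum + L. (Viannia) spp. coinfection": 5,
    "L. (Leishmania) spp.": 44,
}

# The per-species counts should add up to the 100 positive specimens reported.
total_positive = sum(positives.values())

# Total infection rate as a percentage of all collected sand flies.
infection_rate = 100 * total_positive / collected

print(f"{total_positive} positive of {collected}: {infection_rate:.1f}%")
for species, n in positives.items():
    print(f"{species}: {100 * n / total_positive:.0f}% of positives")
```

The per-species counts sum to the 100 positive specimens, and 100 of 616 corresponds to the reported rate of 16.2%.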
Subjects
Insect Vectors/parasitology , Leishmania/classification , Leishmania/growth & development , Psychodidae/parasitology , Animals , Brazil/epidemiology , Humans , Leishmaniasis/epidemiology , Leishmaniasis/genetics , Leishmaniasis/transmission , Polymerase Chain Reaction , Urban Renewal
ABSTRACT
Although aneuploidy usually results in severe abnormalities in multicellular eukaryotes, recent data suggest that it could be beneficial for unicellular eukaryotes, such as yeast and trypanosomatid parasites, providing increased survival under stressful conditions. Among characterized trypanosomatids, Trypanosoma cruzi, Trypanosoma brucei and species from the genus Leishmania stand out due to their importance in public health, infecting around 20 million people worldwide. The presence of aneuploidies in T. cruzi and Leishmania was recently confirmed by analyses based on next generation sequencing (NGS) and fluorescence in situ hybridization, where they have been associated with adaptation during transmission between their insect vectors and mammalian hosts and with the promotion of drug resistance. Although chromosomal copy number variations (CCNVs) are present in the aforementioned species, PFGE and fluorescence cytophotometry analyses suggest that aneuploidies are absent from T. brucei. A re-evaluation of CCNV in T. b. gambiense based on NGS reads confirmed the absence of aneuploidies in this subspecies. However, the presence of aneuploidies in the other two T. brucei subspecies, T. b. brucei and T. b. rhodesiense, has not been evaluated using NGS approaches. In the present work, we tested for aneuploidies in 26 T. brucei isolates, including samples from the three T. brucei subspecies, by both allele frequency and read depth coverage analyses. These analyses showed that none of the T. brucei subspecies presents aneuploidies, which could be related to differences in the mechanisms of DNA replication and recombination in these parasites when compared with Leishmania.
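The two kinds of evidence named in this abstract, read depth coverage and allele frequency, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the normalisation by the genome-wide median, the diploid baseline, the function names and the depth values are all assumptions chosen for illustration, and the toy numbers are invented rather than taken from any T. brucei isolate.

```python
from statistics import median

def chromosome_copy_number(depth_by_chrom, baseline_ploidy=2):
    """Estimate copies per chromosome from read depth.

    Each chromosome's median depth is divided by the median of all chromosome
    medians (assumed to represent the baseline ploidy) and scaled by that ploidy.
    """
    chrom_medians = {c: median(d) for c, d in depth_by_chrom.items()}
    genome_median = median(chrom_medians.values())
    return {c: baseline_ploidy * m / genome_median for c, m in chrom_medians.items()}

def expected_allele_fractions(copies):
    """Alternate-allele fractions expected at heterozygous sites for a given copy number."""
    return [i / copies for i in range(1, copies)]

# Invented per-window read depths; "chr2" is given 1.5x depth to mimic a trisomy.
toy_depth = {
    "chr1": [30, 31, 29, 30],
    "chr2": [45, 44, 46, 45],
    "chr3": [30, 30, 29, 31],
    "chr4": [30, 29, 31, 30],
    "chr5": [31, 30, 30, 29],
}
copies = chromosome_copy_number(toy_depth)
for chrom, cn in copies.items():
    print(chrom, round(cn, 2), expected_allele_fractions(round(cn)))
```

Under these assumptions a disomic chromosome shows heterozygous sites near 0.5, whereas a trisomic one shifts toward 1/3 and 2/3, which is why agreement between the two signals makes an aneuploidy call more convincing.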