Results 1 - 20 of 42
1.
PLoS Genet ; 20(4): e1011184, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38683871

ABSTRACT

By decomposing genome sequences into k-mers, it is possible to estimate genome differences without alignment. Techniques such as k-mer minimisers, for example MinHash, have been developed and are often accurate approximations of distances based on full k-mer sets. These and other alignment-free methods avoid the large temporal and computational expense of alignment. However, these k-mer set comparisons are not entirely accurate within-species and can be completely inaccurate within-lineage. This is due, in part, to their inability to distinguish core polymorphism from accessory differences. Here we present a new approach, KmerAperture, which uses information on the k-mer relative genomic positions to determine the type of polymorphism causing differences in k-mer presence and absence between pairs of genomes. Single SNPs are expected to result in k unique contiguous k-mers per genome. On the other hand, a contiguous series of S > k unique k-mers may be caused by an accessory difference of length S-k+1, when the start and end of the sequence are contiguous with homologous sequence. Alternatively, such a series may be caused by multiple SNPs within k bp of each other, and KmerAperture can determine whether that is the case. To demonstrate use cases, KmerAperture was benchmarked using datasets including a very low diversity simulated population with accessory content independent from the number of SNPs, a simulated population where SNPs are spatially dense, a moderately diverse real cluster of genomes (Escherichia coli ST1193) with a large accessory genome, and a low diversity real genome cluster (Salmonella Typhimurium ST34). We show that KmerAperture can accurately distinguish both core and accessory sequence diversity without alignment, outperforming other k-mer based tools.


Subjects
Bacterial Genome, Single Nucleotide Polymorphism, Single Nucleotide Polymorphism/genetics, Synteny, Genomics/methods, Algorithms, Escherichia coli/genetics, Software, Sequence Alignment/methods, Phylogeny
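
The run-length reasoning in the KmerAperture abstract above (a single SNP gives k unique contiguous k-mers; a longer run of length S may reflect an accessory difference of length S-k+1 or several nearby SNPs) can be illustrated with a toy sketch. This is not KmerAperture's implementation. The function names `kmers` and `unique_kmer_runs`, the reference string, and k = 7 are all assumptions made for illustration. The toy reference deliberately contains no T, so any k-mer overlapping an introduced T is guaranteed to be absent from the reference.

```python
def kmers(seq, k):
    """Return the set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmer_runs(seq, other_kmers, k):
    """Lengths of contiguous runs of k-mers in seq that are absent from other_kmers."""
    runs, run = [], 0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in other_kmers:
            if run:
                runs.append(run)  # a shared k-mer closes the current run
            run = 0
        else:
            run += 1
    if run:
        runs.append(run)
    return runs


K = 7  # illustrative k, not a recommended setting
REF = "ACGGCAAGCCGAACGCAGGACCAGCGGAACCGACAGGCAA"  # hypothetical reference without 'T'
REF_KMERS = kmers(REF, K)

# Three hypothetical variants of the reference.
one_snp = REF[:20] + "T" + REF[21:]                      # one substitution
two_snps = REF[:20] + "T" + REF[21:23] + "T" + REF[24:]  # two substitutions 3 bp apart
insertion = REF[:20] + "TTTTT" + REF[20:]                # 5 bp inserted sequence

if __name__ == "__main__":
    for name, genome in [("one SNP", one_snp),
                         ("two SNPs", two_snps),
                         ("insertion", insertion)]:
        for s in unique_kmer_runs(genome, REF_KMERS, K):
            print(f"{name}: run S = {s}, S-k+1 = {s - K + 1}")
```

In this toy case the single SNP gives a run of 7 (= k), the 5 bp insertion gives a run of 11 (so S-k+1 = 5), and the two nearby SNPs give a run of 10. Judged by length alone, the last case would look like a 4 bp accessory difference, which is the ambiguity the abstract says KmerAperture resolves.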
2.
Mol Biol Evol ; 41(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38168711

ABSTRACT

In recent times, pathogen genome sequencing has become increasingly used to investigate infectious disease outbreaks. When genomic data is sampled densely enough amongst infected individuals, it can help resolve who infected whom. However, transmission analysis cannot rely solely on a phylogeny of the genomes but must account for the within-host evolution of the pathogen, which blurs the relationship between phylogenetic and transmission trees. When only a single genome is sampled for each host, the uncertainty about who infected whom can be quite high. Consequently, transmission analysis based on multiple genomes of the same pathogen per host has a clear potential for delivering more precise results, even though it is more laborious to achieve. Here, we present a new methodology that can use any number of genomes sampled from a set of individuals to reconstruct their transmission network. Furthermore, we remove the need for the assumption of a complete transmission bottleneck. We use simulated data to show that our method becomes more accurate as more genomes per host are provided, and that it can infer key infectious disease parameters such as the size of the transmission bottleneck, within-host growth rate, basic reproduction number, and sampling fraction. We demonstrate the usefulness of our method in applications to real datasets from an outbreak of Pseudomonas aeruginosa amongst cystic fibrosis patients and a nosocomial outbreak of Klebsiella pneumoniae.


Subjects
Communicable Diseases, Humans, Phylogeny, Communicable Diseases/genetics, Communicable Diseases/epidemiology, Disease Outbreaks, Genomics, Chromosome Mapping, Infectious Disease Transmission
3.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36440957

ABSTRACT

MOTIVATION: The ability to distinguish imported cases from locally acquired cases has important consequences for the selection of public health control strategies. Genomic data can be useful for this, for example, using a phylogeographic analysis in which genomic data from multiple locations are compared to determine likely migration events between locations. However, these methods typically require good samples of genomes from all locations, which are rarely available. RESULTS: Here, we propose an alternative approach that only uses genomic data from a location of interest. By comparing each new case with previous cases from the same location, we are able to detect imported cases, as they have a different genealogical distribution than that of locally acquired cases. We show that, when variations in the size of the local population are accounted for, our method has good sensitivity and excellent specificity for the detection of imports. We applied our method to data simulated under the structured coalescent model and demonstrate relatively good performance even when the local population has the same size as the external population. Finally, we applied our method to several recent genomic datasets from both bacterial and viral pathogens, and show that it can, in a matter of seconds or minutes, deliver important insights on the number of imports to a geographically limited sample of a pathogen population. AVAILABILITY AND IMPLEMENTATION: The R package DetectImports is freely available from https://github.com/xavierdidelot/DetectImports. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Communicable Diseases, Software, Humans, Genomics/methods, Genome, Phylogeography, Communicable Diseases/diagnosis, Communicable Diseases/genetics
4.
J Virol ; 95(19): e0101221, 2021 09 09.
Article in English | MEDLINE | ID: mdl-34260287

ABSTRACT

Vaccinia virus produces two types of virions known as single-membraned intracellular mature virus (MV) and double-membraned extracellular enveloped virus (EV). EV production peaks earlier when initial MVs are further wrapped and secreted to spread infection within the host. However, late during infection, MVs accumulate intracellularly and become important for host-to-host transmission. The process that regulates this switch remains elusive and is thought to be influenced by host factors. Here, we examined the hypothesis that EV and MV production are regulated by the virus through expression of F13 and the MV-specific protein A26. By switching the promoters and altering the expression kinetics of F13 and A26, we demonstrate that A26 expression downregulates EV production and plaque size, thus limiting viral spread. This process correlates with A26 association with the MV surface protein A27 and exclusion of F13, thus reducing EV titers. Thus, MV maturation is controlled by the abundance of the viral A26 protein, independently of other factors, and is rate limiting for EV production. The A26 gene is conserved within vertebrate poxviruses but is strikingly lost in poxviruses known to be transmitted exclusively by biting arthropods. A26-mediated virus maturation thus appears to be an ancient evolutionary adaptation to enhance transmission of poxviruses that has subsequently been lost from vector-adapted species, for which it may serve as a genetic signature. The existence of virus-regulated mechanisms to produce virions adapted to fulfill different functions represents a novel level of complexity in mammalian viruses with major impacts on evolution, adaptation, and transmission.
IMPORTANCE Chordopoxviruses are mammalian viruses that uniquely produce a first type of virion adapted to spread within the host and a second type that enhances transmission between hosts, which can take place by multiple ways, including direct contact, respiratory droplets, oral/fecal routes, or via vectors. Both virion types are important to balance intrahost dissemination and interhost transmission, so virus maturation pathways must be tightly controlled. Here, we provide evidence that the abundance and kinetics of expression of the viral protein A26 regulates this process by preventing formation of the first form and shifting maturation toward the second form. A26 is expressed late after the initial wave of progeny virions is produced, so sufficient viral dissemination is ensured, and A26 provides virions with enhanced environmental stability. Conservation of A26 in all vertebrate poxviruses, but not in those transmitted exclusively via biting arthropods, reveals the importance of A26-controlled virus maturation for transmission routes involving environmental exposure.


Subjects
Genetic Promoter Regions, Vaccinia virus/physiology, Viral Proteins/metabolism, Animals, Cell Line, Chordopoxvirinae/genetics, Chordopoxvirinae/metabolism, Genetic Engineering, Humans, Orthopoxvirus/genetics, Orthopoxvirus/metabolism, Messenger RNA/genetics, Messenger RNA/metabolism, Viral RNA/genetics, Viral RNA/metabolism, Vaccinia virus/genetics, Viral Plaque Assay, Viral Proteins/genetics
6.
PLoS Pathog ; 16(1): e1008235, 2020 01.
Article in English | MEDLINE | ID: mdl-31905219

ABSTRACT

Although recombination is known to occur in foot-and-mouth disease virus (FMDV), it is considered only a minor determinant of virus sequence diversity. Analysis at phylogenetic scales shows inter-serotypic recombination events are rare, whereby recombination occurs almost exclusively in non-structural proteins. In this study we have estimated recombination rates within a natural host in an experimental setting. African buffaloes were inoculated with a SAT-1 FMDV strain containing two major viral sub-populations differing in their capsid sequence. This population structure enabled the detection of extensive within-host recombination in the genomic region coding for structural proteins and allowed recombination rates between the two sub-populations to be estimated. Quite surprisingly, the effective recombination rate in VP1 during the acute infection phase turns out to be about 0.1 per base per year, i.e. comparable to the mutation/substitution rate. Using a high-resolution map of effective within-host recombination in the capsid-coding region, we identified a linkage disequilibrium pattern in VP1 that is consistent with a mosaic structure with two main genetic blocks. Positive epistatic interactions between co-evolved variants appear to be present both within and between blocks. These interactions are due to intra-host selection both at the RNA and protein level. Overall our findings show that during FMDV co-infections by closely related strains, capsid-coding genes recombine within the host at a much higher rate than expected, despite the presence of strong constraints dictated by the capsid structure. Although these intra-host results are not immediately translatable to a phylogenetic setting, recombination and epistasis must play a major and so far underappreciated role in the molecular evolution of the virus at all scales.


Subjects
Capsid Proteins/genetics, Cattle Diseases/virology, Genetic Epistasis, Foot-and-Mouth Disease Virus/genetics, Foot-and-Mouth Disease/virology, Animals, Buffaloes, Capsid/metabolism, Capsid Proteins/metabolism, Cattle, Molecular Evolution, Foot-and-Mouth Disease Virus/metabolism, Viral Genome, Phylogeny, Viral RNA/genetics, Genetic Recombination
7.
Plant J ; 101(2): 455-472, 2020 01.
Article in English | MEDLINE | ID: mdl-31529539

ABSTRACT

We sequenced the genome of the highly heterozygous almond Prunus dulcis cv. Texas combining short- and long-read sequencing. We obtained a genome assembly totaling 227.6 Mb of the estimated almond genome size of 238 Mb, of which 91% is anchored to eight pseudomolecules corresponding to its haploid chromosome complement, and annotated 27 969 protein-coding genes and 6747 non-coding transcripts. By phylogenomic comparison with the genomes of 16 additional close and distant species we estimated that almond and peach (Prunus persica) diverged around 5.88 million years ago. These two genomes are highly syntenic and show a high degree of sequence conservation (20 nucleotide substitutions per kb). However, they also exhibit a high number of presence/absence variants, many attributable to the movement of transposable elements (TEs). Transposable elements have generated an important number of presence/absence variants between almond and peach, and we show that the recent history of TE movement seems markedly different between them. Transposable elements may also be at the origin of important phenotypic differences between both species, and in particular for the sweet kernel phenotype, a key agronomic and domestication character for almond. Here we show that in sweet almond cultivars, highly methylated TE insertions surround a gene involved in the biosynthesis of amygdalin, whose reduced expression has been correlated with the sweet almond phenotype. Altogether, our results suggest a key role of TEs in the recent history and diversification of almond and its close relative peach.


Subjects
Base Sequence, DNA Transposable Elements/genetics, Plant Genome, Prunus dulcis/genetics, Prunus persica/genetics, Chromosome Mapping, DNA Methylation, Domestication, Molecular Evolution, Plant Genes/genetics, Phylogeny, Seeds, Species Specificity
8.
BMC Genomics ; 22(1): 159, 2021 Mar 06.
Article in English | MEDLINE | ID: mdl-33676404

ABSTRACT

BACKGROUND: Chlamydia abortus and Chlamydia psittaci are important pathogens of livestock and avian species, respectively. While C. abortus is recognized as descended from C. psittaci species, there is emerging evidence of strains that are intermediary between the two species, suggesting they are recent evolutionary ancestors of C. abortus. Such strains include C. psittaci strain 84/2334 that was isolated from a parrot. Our aim was to classify this strain by sequencing its genome and explore its evolutionary relationship to both C. abortus and C. psittaci. RESULTS: In this study, methods based on multi-locus sequence typing (MLST) of seven housekeeping genes and on typing of five species discriminant proteins showed that strain 84/2334 clustered with C. abortus species. Furthermore, whole genome de novo sequencing of the strain revealed greater similarity to C. abortus in terms of GC content, while 16S rRNA and whole genome phylogenetic analysis, as well as network and recombination analysis showed that the strain clusters more closely with C. abortus strains. The analysis also suggested a closer evolutionary relationship between this strain and the major C. abortus clade, than to two other intermediary avian C. abortus strains or C. psittaci strains. Molecular analyses of genes (polymorphic membrane protein and transmembrane head protein genes) and loci (plasticity zone), found in key virulence-associated regions that exhibit greatest diversity within and between chlamydial species, reveal greater diversity than present in sequenced C. abortus genomes as well as similar features to both C. abortus and C. psittaci species. The strain also possesses an extrachromosomal plasmid, as found in most C. psittaci species but absent from all sequenced classical C. abortus strains. CONCLUSION: Overall, the results show that C. psittaci strain 84/2334 clusters very closely with C. abortus strains, and are consistent with the strain being a recent C. abortus ancestral species. 
This suggests that the strain should be reclassified as C. abortus. Furthermore, the identification of a C. abortus strain bearing an extra-chromosomal plasmid has implications for plasmid-based transformation studies to investigate gene function as well as providing a potential route for the development of a next generation vaccine to protect livestock from C. abortus infection.


Subjects
Chlamydia Infections, Chlamydia, Chlamydophila psittaci, Animals, Chlamydia/genetics, Chlamydophila psittaci/genetics, Genomics, Multilocus Sequence Typing, Phylogeny, 16S Ribosomal RNA/genetics
9.
J Virol ; 93(15)2019 08 01.
Article in English | MEDLINE | ID: mdl-31092573

ABSTRACT

African buffaloes (Syncerus caffer) are the principal "carrier" hosts of foot-and-mouth disease virus (FMDV). Currently, the epithelia and lymphoid germinal centers of the oropharynx have been identified as sites for FMDV persistence. We carried out studies in FMDV SAT1 persistently infected buffaloes to characterize the diversity of viruses in oropharyngeal epithelia, germinal centers, probang samples (oropharyngeal scrapings), and tonsil swabs to determine if sufficient virus variation is generated during persistence for immune escape. Most sequencing reads of the VP1 coding region of the SAT1 virus inoculum clustered around 2 subpopulations differing by 22 single-nucleotide variants of intermediate frequency. Similarly, most sequences from oropharynx tissue clustered into two subpopulations, albeit with different proportions, depending on the day postinfection (dpi). There was a significant difference between the populations of viruses in the inoculum and in lymphoid tissue taken at 35 dpi. Thereafter, until 400 dpi, no significant variation was detected in the viral populations in samples from individual animals, germinal centers, and epithelial tissues. Deep sequencing of virus from probang or tonsil swab samples harvested prior to postmortem showed less within-sample variability of VP1 than that of tissue sample sequences analyzed at the same time. Importantly, there was no significant difference in the ability of sera collected between 14 and 400 dpi to neutralize the inoculum or viruses isolated at later time points in the study from the same animal. Therefore, based on this study, there is no evidence of escape from antibody neutralization contributing to FMDV persistent infection in African buffalo.
IMPORTANCE Foot-and-mouth disease virus (FMDV) is a highly contagious virus of cloven-hoofed animals and is recognized as the most important constraint to international trade in animals and animal products. African buffaloes (Syncerus caffer) are efficient carriers of FMDV, and it has been proposed that new virus variants are produced in buffalo during the prolonged carriage after acute infection, which may spread to cause disease in livestock populations. Here, we show that despite an accumulation of low-frequency sequence variants over time, there is no evidence of significant antigenic variation leading to immune escape. Therefore, carrier buffalo are unlikely to be a major source of new virus variants.


Subjects
Buffaloes, Carrier State/veterinary, Molecular Evolution, Foot-and-Mouth Disease Virus/growth & development, Foot-and-Mouth Disease/immunology, Foot-and-Mouth Disease/virology, Immune Evasion, Animals, Capsid Proteins/genetics, Carrier State/immunology, Carrier State/virology, Epithelium/virology, Foot-and-Mouth Disease Virus/genetics, Foot-and-Mouth Disease Virus/immunology, Genomic Instability, Germinal Center/virology, Mutation, Oropharynx/virology, DNA Sequence Analysis
10.
Emerg Infect Dis ; 25(6): 1169-1176, 2019 06.
Article in English | MEDLINE | ID: mdl-31107235

ABSTRACT

In 2015, a mass die-off of ≈200,000 saiga antelopes in central Kazakhstan was caused by hemorrhagic septicemia attributable to the bacterium Pasteurella multocida serotype B. Previous analyses have indicated that environmental triggers associated with weather conditions, specifically air moisture and temperature in the region of the saiga antelope calving during the 10-day period running up to the event, were critical to the proliferation of latent bacteria and were comparable to conditions accompanying historically similar die-offs in the same areas. We investigated whether additional viral or bacterial pathogens could be detected in samples from affected animals using 3 different high-throughput sequencing approaches. We did not identify pathogens associated with commensal bacterial opportunisms in blood, kidney, or lung samples and thus concluded that P. multocida serotype B was the primary cause of the disease.


Subjects
Animal Diseases/mortality, Antelopes, Animal Diseases/epidemiology, Animal Diseases/history, Animal Diseases/microbiology, Animals, Antelopes/microbiology, Bacterial Infections/veterinary, Female, Gammaproteobacteria/classification, Gammaproteobacteria/genetics, Medical Geography, 21st Century History, Kazakhstan/epidemiology, Male, Metagenomics, 16S Ribosomal RNA/genetics
11.
Nature ; 501(7468): 506-11, 2013 Sep 26.
Article in English | MEDLINE | ID: mdl-24037378

ABSTRACT

Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.


Subjects
Genetic Variation/genetics, Human Genome/genetics, High-Throughput Nucleotide Sequencing, RNA Sequence Analysis, Transcriptome/genetics, Alleles, Transformed Cell Line, Exons/genetics, Gene Expression Profiling, Humans, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, Messenger RNA/analysis, Messenger RNA/genetics
12.
Mol Ecol ; 27(22): 4501-4515, 2018 11.
Article in English | MEDLINE | ID: mdl-30252177

ABSTRACT

Colour plays a prominent role in species recognition; therefore, understanding the proximate basis of pigmentation can provide insight into reproductive isolation and speciation. Colour differences between taxa may be the result of regulatory differences or be caused by mutations in coding regions of the expressed genes. To investigate these two alternatives, we studied the pigment composition and the genetic basis of coloration in two divergent dark-eyed junco (Junco hyemalis) subspecies, the slate-coloured and Oregon juncos, which have evolved marked differences in plumage coloration since the Last Glacial Maximum. We used HPLC and light microscopy to investigate pigment composition and deposition in feathers from four body areas. We then used RNA-seq to compare the relative roles of differential gene expression in developing feathers and sequence divergence in transcribed loci under common-garden conditions. Junco feathers differed in eumelanin and pheomelanin content and distribution. Within subspecies, in lighter feathers melanin synthesis genes were downregulated (including PMEL, TYR, TYRP1, OCA2 and MLANA), and ASIP was upregulated. Feathers from different body regions also showed differential expression of HOX and WNT genes. Feathers from the same body regions that differed in colour between the two subspecies showed differential expression of ASIP and three other genes (MFSD12, KCNJ13 and HAND2) associated with pigmentation in other taxa. Sequence variation in the expressed genes was not related to colour differences. Our findings support the hypothesis that differential regulation of a few genes can account for marked differences in coloration, a mechanism that may facilitate the rapid phenotypic diversification of juncos.


Subjects
Feathers, Melanins/analysis, Pigmentation/genetics, Songbirds/genetics, Animals, Melanins/biosynthesis, Oregon
13.
Nucleic Acids Res ; 44(12): e114, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27131376

ABSTRACT

The recent super-exponential growth in the amount of sequencing data generated worldwide has put techniques for compressed storage into focus. Most available solutions, however, are strictly tied to specific bioinformatics formats, sometimes inheriting from them suboptimal design choices; this hinders flexible and effective data sharing. Here, we present CARGO (Compressed ARchiving for GenOmics), a high-level framework to automatically generate software systems optimized for the compressed storage of arbitrary types of large genomic data collections. Straightforward applications of our approach to FASTQ and SAM archives require a few lines of code, produce solutions that match and sometimes outperform specialized format-tailored compressors, and scale well to multi-TB datasets. All CARGO software components can be freely downloaded for academic and non-commercial use from http://bio-cargo.sourceforge.net.


Subjects
Computational Biology/methods, Genome, Information Storage and Retrieval/methods, Algorithms, Data Compression/methods, Genomics, Software
14.
BMC Genomics ; 18(1): 7, 2017 01 03.
Article in English | MEDLINE | ID: mdl-28049418

ABSTRACT

BACKGROUND: Chimeric transcripts are commonly defined as transcripts linking two or more different genes in the genome, and can be explained by various biological mechanisms such as genomic rearrangement, read-through or trans-splicing, but also by technical or biological artefacts. Several studies have shown their importance in cancer, cell pluripotency and motility. Many programs have recently been developed to identify chimeras from Illumina RNA-seq data (mostly fusion genes in cancer). However outputs of different programs on the same dataset can be widely inconsistent, and tend to include many false positives. Other issues relate to simulated datasets restricted to fusion genes, real datasets with limited numbers of validated cases, result inconsistencies between simulated and real datasets, and gene rather than junction level assessment. RESULTS: Here we present ChimPipe, a modular and easy-to-use method to reliably identify fusion genes and transcription-induced chimeras from paired-end Illumina RNA-seq data. We have also produced realistic simulated datasets for three different read lengths, and enhanced two gold-standard cancer datasets by associating exact junction points to validated gene fusions. Benchmarking ChimPipe together with four other state-of-the-art tools on this data showed ChimPipe to be the top program at identifying exact junction coordinates for both kinds of datasets, and the one showing the best trade-off between sensitivity and precision. Applied to 106 ENCODE human RNA-seq datasets, ChimPipe identified 137 high confidence chimeras connecting the protein coding sequence of their parent genes. In subsequent experiments, three out of four predicted chimeras, two of which recurrently expressed in a large majority of the samples, could be validated. Cloning and sequencing of the three cases revealed several new chimeric transcript structures, 3 of which with the potential to encode a chimeric protein for which we hypothesized a new role. 
Applying ChimPipe to human and mouse ENCODE RNA-seq data led to the identification of 131 recurrent chimeras common to both species, and therefore potentially conserved. CONCLUSIONS: ChimPipe combines discordant paired-end reads and split-reads to detect any kind of chimera, including those originating from polymerase read-through, and shows an excellent trade-off between sensitivity and precision. The chimeras found by ChimPipe can be validated in vitro with high accuracy.


Subjects
Oncogene Fusion Proteins, Genetic Recombination, Software, Genetic Transcription, Animals, Computational Biology/methods, Computer Simulation, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans, Mice, Reproducibility of Results, RNA Sequence Analysis
15.
Nat Methods ; 9(12): 1185-8, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23103880

ABSTRACT

Because of ever-increasing throughput requirements of sequencing data, most existing short-read aligners have been designed to focus on speed at the expense of accuracy. The Genome Multitool (GEM) mapper can leverage string matching by filtration to search the alignment space more efficiently, simultaneously delivering precision (performing fully tunable exhaustive searches that return all existing matches, including gapped ones) and speed (being several times faster than comparable state-of-the-art tools).


Subjects
Sequence Alignment/methods, DNA Sequence Analysis/methods, Algorithms, Base Sequence, Computational Biology/methods, Genome, Genomics/methods, Humans, Software
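
The GEM abstract above refers to string matching by filtration. As a rough illustration of the general idea only, the sketch below uses the textbook pigeonhole filter: a pattern allowed at most e mismatches is cut into e+1 pieces, at least one of which must occur exactly, so exact hits of the pieces give candidate positions that are then verified. This is an assumption-laden toy, not GEM's algorithm or API. The function name `filtration_search` is hypothetical, and the sketch handles substitutions only, whereas the abstract says GEM also returns gapped matches.

```python
def filtration_search(text, pattern, max_mismatches):
    """Start positions where pattern matches text with at most max_mismatches substitutions.

    Textbook pigeonhole filtration (illustrative only): split the pattern into
    max_mismatches + 1 pieces, locate each piece exactly, then verify candidates.
    """
    m = len(pattern)
    n_pieces = max_mismatches + 1
    if m < n_pieces:
        raise ValueError("pattern is too short for this many mismatches")

    # Piece boundaries; every piece is non-empty because m >= n_pieces.
    bounds = [m * i // n_pieces for i in range(n_pieces + 1)]
    hits = set()
    for a, b in zip(bounds, bounds[1:]):
        piece = pattern[a:b]
        pos = text.find(piece)
        while pos != -1:
            start = pos - a  # where the whole pattern would begin
            if 0 <= start <= len(text) - m:
                window = text[start:start + m]
                mismatches = sum(x != y for x, y in zip(window, pattern))
                if mismatches <= max_mismatches:
                    hits.add(start)
            pos = text.find(piece, pos + 1)  # step by 1 to allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    print(filtration_search("AAAA", "AA", 0))
```

The filter is exhaustive for substitutions because every true match contains at least one error-free piece; the verification step is what removes false candidates.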
16.
Nucleic Acids Res ; 40(20): 10073-83, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22962361

ABSTRACT

High-throughput sequencing of cDNA libraries constructed from cellular RNA complements (RNA-Seq) naturally provides a digital quantitative measurement for every expressed RNA molecule. Nature, impact and mutual interference of biases in different experimental setups are, however, still poorly understood, mostly due to the lack of data from intermediate protocol steps. We analysed multiple RNA-Seq experiments, involving different sample preparation protocols and sequencing platforms: we broke them down into their common (and currently indispensable) technical components (reverse transcription, fragmentation, adapter ligation, PCR amplification, gel segregation and sequencing), investigating how such different steps influence abundance and distribution of the sequenced reads. For each of those steps, we developed universally applicable models, which can be parameterised by empirical attributes of any experimental protocol. Our models are implemented in a computer simulation pipeline called the Flux Simulator, and we show that read distributions generated by different combinations of these models reproduce well the evidence obtained from the corresponding experimental setups. We further demonstrate that our in silico RNA-Seq provides insights about hidden precursors that determine the final configuration of reads along gene bodies; enhancing or compensatory effects that explain apparently controversial observations can be observed. Moreover, our simulations identify hitherto unreported sources of systematic bias from RNA hydrolysis, a fragmentation technique currently employed by most RNA-Seq protocols.


Subjects
Computer Simulation, Gene Expression Profiling, High-Throughput Nucleotide Sequencing, RNA Sequence Analysis, Hydrolysis, RNA/metabolism
17.
Microb Genom ; 10(5)2024 May.
Article in English | MEDLINE | ID: mdl-38717818

ABSTRACT

Evidence is accumulating in the literature that the horizontal spread of antimicrobial resistance (AMR) genes mediated by bacteriophages and bacteriophage-like plasmid (phage-plasmid) elements is much more common than previously envisioned. For instance, we recently identified and characterized a circular P1-like phage-plasmid harbouring a blaCTX-M-15 gene conferring extended-spectrum beta-lactamase (ESBL) resistance in Salmonella enterica serovar Typhi. As the prevalence and epidemiological relevance of such mechanisms has never been systematically assessed in Enterobacterales, in this study we carried out a follow-up retrospective analysis of UK Salmonella isolates previously sequenced as part of routine surveillance protocols between 2016 and 2021. Using a high-throughput bioinformatics pipeline we screened 47 784 isolates for the presence of the P1 lytic replication gene repL, identifying 226 positive isolates from 25 serovars and demonstrating that phage-plasmid elements are more frequent than previously thought. The affinity for phage-plasmids appears highly serovar-dependent, with several serovars being more likely hosts than others; most of the positive isolates (170/226) belonged to S. Typhimurium ST34 and ST19. The phage-plasmids ranged between 85.8 and 98.2 kb in size, with an average length of 92.1 kb; detailed analysis indicated a high amount of diversity in gene content and genomic architecture. In total, 132 phage-plasmids had the p0111 plasmid replication type, and 94 the IncY type; phylogenetic analysis indicated that both horizontal and vertical gene transmission mechanisms are likely to be involved in phage-plasmid propagation. Finally, phage-plasmids were present in isolates that were resistant and non-resistant to antimicrobials. In addition to providing a first comprehensive view of the presence of phage-plasmids in Salmonella, our work highlights the need for a better surveillance and understanding of phage-plasmids as AMR carriers, especially through their characterization with long-read sequencing.


Subjects
Plasmids , Salmonella enterica , Serogroup , Plasmids/genetics , Salmonella enterica/virology , Salmonella enterica/genetics , Salmonella Infections/microbiology , Bacteriophages/genetics , Bacteriophages/classification , Salmonella Phages/genetics , Salmonella Phages/classification , Humans , Phylogeny , Gene Transfer, Horizontal , Retrospective Studies
18.
Viruses ; 16(4)2024 04 03.
Article in English | MEDLINE | ID: mdl-38675899

ABSTRACT

Lumpy skin disease virus (LSDV) is a member of the capripoxvirus (CPPV) genus of the Poxviridae family. LSDV is a rapidly emerging, high-consequence pathogen of cattle, recently spreading from Africa and the Middle East into Europe and Asia. We have sequenced the whole genome of historical LSDV isolates from the Pirbright Institute virus archive, and field isolates from recent disease outbreaks in Sri Lanka, Mongolia, Nigeria and Ethiopia. These genome sequences were compared to published genomes and classified into different subgroups. Two subgroups contained vaccine or vaccine-like samples ("Neethling-like" clade 1.1 and "Kenya-like" subgroup, clade 1.2.2). One subgroup was associated with outbreaks of LSD in the Middle East/Europe (clade 1.2.1) and a previously unreported subgroup originated from cases of LSD in west and central Africa (clade 1.2.3). Isolates were also identified that contained a mix of genes from both wildtype and vaccine samples (vaccine-like recombinants, grouped in clade 2). Whole genome sequencing and analysis of LSDV strains isolated from different regions of Africa, Europe and Asia have provided new knowledge of the drivers of LSDV emergence, and will inform future disease control strategies.


Subjects
Genome, Viral , Lumpy Skin Disease , Lumpy skin disease virus , Phylogeny , Whole Genome Sequencing , Lumpy skin disease virus/genetics , Lumpy skin disease virus/classification , Lumpy skin disease virus/isolation & purification , Animals , Lumpy Skin Disease/virology , Lumpy Skin Disease/epidemiology , Cattle , Africa, Central/epidemiology , Africa, Western/epidemiology , Disease Outbreaks
19.
Brief Bioinform ; 12(6): 614-25, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21504986

ABSTRACT

Next-generation sequencing technologies have opened up an unprecedented opportunity for microbiology by enabling the culture-independent genetic study of complex microbial communities, which have so far remained largely unknown. The analysis of metagenomic data is challenging: potentially, one is faced with a sample containing a mixture of many different bacterial species, whose genomes have not necessarily been sequenced beforehand. In the simpler case of the analysis of 16S ribosomal RNA metagenomic data, for which databases of reference sequences are known, we survey the computational challenges to be solved in order to characterize and quantify a sample. In particular, we examine two aspects: how the necessary adoption of new tools geared towards high-throughput analysis impacts the quality of the results, and how well various established methods perform at assigning sequence reads to microbial species, with and without taking taxonomic information into account.


Subjects
Metagenomics/methods , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , DNA, Bacterial/chemistry , Metagenome , RNA, Ribosomal, 16S/chemistry
20.
Genes (Basel) ; 14(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37761891

ABSTRACT

In the diffusion approximation of the neutral Wright-Fisher model, the expected time until fixation or loss of a neutral allele is proportional to the initial entropy of the distribution of the allele in the population. No explanation is known for this coincidence. In this paper, we show that the rate of entropy dissipation is proportional to the number of segregating alleles. Since the final fixed state has zero entropy, the expected lifetime of segregating alleles is proportional to the initial entropy in the system. We show that classical formulae on the average time to loss of segregating alleles and the expected time to fixation of the last segregating allele stem from these properties of the diffusion process. We also extend our results to the case of population size changing in time. The dissipation of heterozygosity and entropy shows that superlinear population growth leads to infinite expected fixation times, i.e., neutral alleles in fast-growing populations could segregate forever without ever becoming fixed or disappearing by genetic drift.
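
The proportionality between expected absorption time and initial entropy can be illustrated with a minimal sketch. It assumes the classical diffusion result for a diploid population of constant size N, in which the mean time to fixation or loss of a neutral allele at initial frequency p is 4N times the binary entropy of p (in nats). The function names and the diploid 4N scaling are illustrative assumptions, not taken from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in nats) of a biallelic locus with allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fixed or lost allele carries no entropy
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def expected_absorption_time(p: float, n_diploid: int) -> float:
    """Mean generations until fixation or loss under the neutral diffusion
    approximation, assuming a constant diploid population of size n_diploid:
    t(p) = 4 * N * H(p)."""
    return 4.0 * n_diploid * binary_entropy(p)


if __name__ == "__main__":
    # Hypothetical population size, chosen only for illustration.
    for freq in (0.01, 0.1, 0.5):
        print(freq, expected_absorption_time(freq, n_diploid=1000))
```

Under these assumptions the time is largest at p = 0.5, where the entropy peaks at ln 2, and it falls to zero as p approaches 0 or 1.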


Subjects
Models, Genetic , Polymorphism, Genetic , Alleles , Entropy , Genetic Drift