Results 1 - 20 of 24
1.
Plant Cell Environ ; 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39189930

ABSTRACT

The availability of high-throughput sequencing technologies has increased our understanding of different genomes. However, the genomes of all living organisms still contain many unidentified coding sequences. The large number of missing small open reading frames (sORFs) is due to the length threshold used in most gene identification tools, which applies to genic and, more importantly and surprisingly, intergenic regions. Scanning the intergenic regions of the cucumber genome revealed 420 723 sORFs. We excluded 3850 sORFs with similarities to annotated cucumber proteins. To propose the functionality of the remaining 416 873 sORFs, we calculated their codon adaptation index (CAI). We found 398 937 novel sORFs (nsORFs) with CAI ≥ 0.7 that were used for downstream analysis. Searching against the Rfam database revealed 109 nsORFs similar to multiple RNA families. Using SignalP-5.0 and NLS, we identified 11 592 signal peptides. Five predicted proteins interacting with Meloidogyne incognita and powdery mildew proteins were selected using published transcriptome data of host-pathogen interactions. Gene Ontology enrichment analysis of these proteins indicated that nsORF expression could contribute to the cucumber's response to biotic and abiotic stresses. This research highlights the importance of previously overlooked nsORFs in the cucumber genome and provides novel insights into their potential functions.
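The CAI filter (CAI ≥ 0.7) can be illustrated with a minimal Python sketch of the classic Sharp and Li codon adaptation index: each codon's relative adaptiveness w is its count in a reference gene set divided by the count of the most frequent synonymous codon, and the CAI of a sequence is the geometric mean of w over its codons. The reference set, the 0.5 pseudocount for unobserved codons, and the function names are assumptions for illustration, not the authors' pipeline.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Met, Trp and stops have no synonymous alternatives, so they are left out of the mean.
SKIP = {"M", "W", "*"}


def codons(seq):
    """Split a sequence into complete in-frame codons (trailing bases are ignored)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w(codon) = count(codon) / max count among its synonymous codons.

    Amino acids never observed in the reference get no weights at all.
    """
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODE)
    best = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0), counts[codon])
    return {codon: max(counts[codon], pseudocount) / best[aa]
            for codon, aa in CODE.items() if best[aa] > 0}


def cai(seq, weights):
    """Geometric mean of relative adaptiveness over the informative codons of seq."""
    logs = [math.log(weights[c]) for c in codons(seq)
            if c in weights and CODE[c] not in SKIP]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0
```

With `weights = relative_adaptiveness(reference_genes)`, a candidate sORF would pass a filter like the one described above when `cai(sorf, weights) >= 0.7`.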

2.
Microb Genom ; 9(10)2023 Oct.
Article in English | MEDLINE | ID: mdl-37843883

ABSTRACT

Salmonella enterica is a taxonomically diverse pathogen with over 2600 serovars associated with a wide variety of animal hosts, including humans, other mammals, birds and reptiles. Some serovars are host-specific or host-restricted and cause disease in distinct host species, while others, such as serovar S. Typhimurium (STm), are generalists and have the potential to colonize a wide variety of species. However, even within generalist serovars such as STm, it is becoming clear that pathovariants exist that differ in tropism and virulence. Identifying the genetic factors underlying host specificity is complex, but the availability of thousands of genome sequences and advances in machine learning have made it possible to build specific host prediction models to aid outbreak control and predict the human pathogenic potential of isolates from animals and other reservoirs. We have advanced this area by building host-association prediction models trained on a wide range of genomic features and comparing them with predictions based on nearest-neighbour phylogeny. SNPs, protein variants (PVs), antimicrobial resistance (AMR) profiles and intergenic regions (IGRs) were extracted from 3883 high-quality STm assemblies collected from humans, swine, bovine and poultry in the USA, and used to construct Random Forest (RF) machine learning models. An additional 244 recent STm assemblies from farm animals were used as a test set for further validation. The models based on PVs and IGRs had the best performance in terms of predicting the host of origin of isolates and outperformed nearest-neighbour phylogenetic host prediction as well as models based on SNPs or AMR data. However, the models did not yield reliable predictions when tested with isolates that were phylogenetically distinct from the training set. The IGR and PV models were often able to differentiate human isolates in clusters where the majority of isolates were from a single animal source.
Notably, IGRs were the best-performing feature across multiple models, possibly because IGRs both represent their flanking genes, equivalently to PVs, and capture genomic regulatory variation, such as altered promoter regions. The IGR and PV models predict that ~45% of human STm infections in the USA originate from bovine sources, ~40% from poultry and ~14.5% from swine, although sequences of isolates from other sources were not used for training. In summary, the research demonstrates a significant gain in accuracy for models with IGRs and PVs as features compared with SNP-based and core-genome phylogeny predictions when applied within the existing population structure. This article contains data hosted by Microreact.
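The abstract benchmarks its Random Forest models against nearest-neighbour phylogenetic host prediction. A minimal Python sketch of such a baseline follows, under simplifying assumptions: each isolate is a binary presence/absence vector of features (for example, protein variants or IGR variants), and Hamming distance between vectors stands in for phylogenetic distance. The function names and toy data are hypothetical, and the Random Forest models themselves are not shown.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def predict_host(query, training, k=1):
    """Majority host among the k training isolates closest to `query`.

    `training` is a list of (feature_vector, host) pairs; ties in distance keep
    the input order because sorted() is stable.
    """
    ranked = sorted(training, key=lambda item: hamming(query, item[0]))
    votes = Counter(host for _, host in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k = 1 this mirrors assigning an isolate the host of its closest relative; the abstract reports that this kind of baseline was outperformed by the PV- and IGR-based models.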


Subjects
Salmonella Infections, Animal , Salmonella typhimurium , Animals , Cattle , Humans , Swine , Salmonella Infections, Animal/epidemiology , Phylogeny , DNA, Intergenic , Genome, Bacterial , Genomics , Machine Learning , Mammals/genetics
3.
J Comput Biol ; 30(8): 861-876, 2023 08.
Article in English | MEDLINE | ID: mdl-37222724

ABSTRACT

The most common way to calculate the rearrangement distance between two genomes is to use the size of a minimum-length sequence of rearrangements that transforms one genome into the other, where the genomes are represented as permutations using only their gene order, based on the assumption that they have the same gene content. With advances in genome rearrangement research, new works extended the classical models by either considering genomes with different gene content (unbalanced genomes) or incorporating more genomic characteristics into the mathematical representation of the genomes, such as the distribution of intergenic region sizes. In this study, we investigate the Reversal, Transposition, and Indel (Insertion and Deletion) Distance using intergenic information, which allows comparing unbalanced genomes because indels are included in the rearrangement model (i.e., the set of possible rearrangements allowed when we compute the distance). For the particular case of transpositions and indels on unbalanced genomes, we present a 4-approximation algorithm, improving on a previous 4.5-approximation. We extend this algorithm to deal with gene orientation while maintaining the 4-approximation factor for the Reversal, Transposition, and Indel Distance on unbalanced genomes. Furthermore, we evaluate the proposed algorithms using experiments on simulated data.
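The genome representation behind intergenic rearrangement models, and the indels that make unbalanced genomes comparable, can be sketched in Python. The sketch assumes a genome is a gene order plus one non-negative intergenic size per gap, extremities included, and models an indel either as a change in the size of one intergenic region or as the deletion of a run of genes whose flanking regions merge. The class and function names are hypothetical, and the paper's approximation algorithm is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    genes: tuple  # gene order, e.g. (1, 3, 2)
    sizes: tuple  # sizes[k] lies just before genes[k]; sizes[-1] follows the last gene

    def __post_init__(self):
        if len(self.sizes) != len(self.genes) + 1:
            raise ValueError("need len(genes) + 1 intergenic sizes")
        if any(s < 0 for s in self.sizes):
            raise ValueError("intergenic sizes must be non-negative")


def intergenic_indel(genome, region, delta):
    """Insert (delta > 0) or delete (delta < 0) nucleotides in one intergenic region."""
    sizes = list(genome.sizes)
    sizes[region] += delta
    return Genome(genome.genes, tuple(sizes))  # __post_init__ rejects negative sizes


def delete_genes(genome, i, j, kept):
    """Delete genes[i:j]; regions sizes[i..j] merge into one region of `kept` nucleotides."""
    if not 0 <= i < j <= len(genome.genes):
        raise ValueError("invalid gene range")
    if not 0 <= kept <= sum(genome.sizes[i:j + 1]):
        raise ValueError("a deletion cannot add intergenic nucleotides")
    genes = genome.genes[:i] + genome.genes[j:]
    sizes = genome.sizes[:i] + (kept,) + genome.sizes[j + 1:]
    return Genome(genes, sizes)
```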


Subjects
Gene Rearrangement , Models, Genetic , Genome/genetics , Genomics , INDEL Mutation , Algorithms
4.
Mol Biol Evol ; 40(3)2023 03 04.
Article in English | MEDLINE | ID: mdl-36917489

ABSTRACT

Intergenic genomic regions have essential regulatory and structural roles that impose constraints on their sequences. But regions that do not currently encode proteins also carry the potential to do so in the future. De novo gene emergence, the evolution of novel genes out of previously noncoding sequences, has now been established as a potent force for genomic novelty. Recently, it was shown that intergenic regions in the genome of Saccharomyces cerevisiae harbor pervasive cryptic potential to, if theoretically translated, form transmembrane domains (TM domains) more frequently than expected by chance given their nucleotide composition, a property that we refer to as TM-forming enrichment. The source and biological relevance of this property are unknown. Here, we expand the investigation into the TM-forming potential of intergenic regions to the entire Saccharomycotina budding yeast subphylum, in an effort to explain this property and understand its importance. We find pervasive but variable enrichment in TM-forming potential across the subphylum regardless of the composition and average size of intergenic regions. This cryptic property is evenly spread across the genome, cannot be explained by the hydrophobic content of the sequence, and does not appear to localize to regions containing regulatory motifs. This TM-forming enrichment specifically, and not the actual TM-forming potential, is associated, across genomes, with more TM domains in evolutionarily young genes. Our findings shed light on this newly discovered feature of yeast genomes and constitute a first step toward understanding its evolutionary importance.
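What "TM-forming potential" measures, namely whether a sequence would yield hydrophobic, transmembrane-like segments if translated, can be illustrated with a minimal Python sketch. It translates the three forward reading frames and flags stop-free windows whose mean Kyte-Doolittle hydropathy reaches a threshold. The 19-residue window and the 1.6 threshold follow the classic Kyte-Doolittle heuristic and are assumptions here; the study's actual TM predictor, the reverse strand, and its composition-matched enrichment test are not reproduced (the abstract notes that the enrichment is not explained by hydrophobic content alone).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy index.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(seq, frame):
    """Translate one forward frame; codons with non-ACGT characters become 'X'."""
    seq = seq.upper()
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (frame, start) for windows without stops or 'X' whose mean hydropathy >= threshold."""
    hits = []
    for frame in range(3):
        peptide = translate(seq, frame)
        for start in range(len(peptide) - window + 1):
            chunk = peptide[start:start + window]
            if "*" in chunk or "X" in chunk:
                continue
            if sum(KD[aa] for aa in chunk) / window >= threshold:
                hits.append((frame, start))
    return hits
```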


Subjects
Saccharomycetales , Yeasts , DNA, Intergenic/genetics , Yeasts/genetics , Saccharomyces cerevisiae/genetics , Genomics , Genome , Saccharomycetales/genetics
5.
Biofilm ; 4: 100093, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36408060

ABSTRACT

Staphylococcus aureus is a leading cause of prosthetic joint infections (PJI). Surface adhesins play an important role in the primary attachment to plasma proteins that coat the surface of prosthetic devices after implantation. Previous efforts to identify a genetic component of the bacterium that confers an enhanced capacity to cause PJI have focused on gene content, k-mers, or single-nucleotide polymorphisms (SNPs) in coding sequences. Here, using a collection of S. aureus strains isolated from PJI and wounds, we investigated whether genetic variations in the regulatory region of genes encoding surface adhesins lead to differences in their expression levels and modulate the capacity of S. aureus to colonize implanted prosthetic devices. The data revealed that S. aureus isolates from the same clonal complex (CC) contain a specific pattern of SNPs in the regulatory region of genes encoding surface adhesins. As a consequence, each clonal lineage shows a specific profile of surface protein expression. Co-infection experiments with representative isolates of the most prevalent CCs demonstrated that some lineages have a higher capacity to colonize implanted catheters in a murine infection model, which correlated with a greater ability to form a biofilm on surfaces coated with plasma proteins. Together, the results indicate that differences in the expression level of surface adhesins may modulate the propensity of S. aureus strains to cause PJI. Given the high conservation of surface proteins among staphylococci, our work lays the framework for investigating how diversification at intergenic regulatory regions affects the capacity of S. aureus to colonize the surface of medical implants.

6.
Genomics ; 114(2): 110297, 2022 03.
Article in English | MEDLINE | ID: mdl-35134501

ABSTRACT

We determined the mitogenome of Cyclopterus lumpus using a hybrid sequencing approach, as well as those of four closely related species in the Liparidae based on available next-generation sequence data. We found that the mitogenome of C. lumpus was 17,266 bp in length, and its length and organisation were comparable to those reported for cottoids. However, we found a GC-homopolymer region in the intergenic space between tRNALeu2 and ND1 in liparids and cyclopterids. Phylogenetic reconstruction confirmed the monophyly of infraorders and firmly supported a sister-group relationship between Cyclopteridae and Liparidae. Purifying selection was the predominant force in the evolution of cottoid mitogenomes. There was significant evidence of relaxed selective pressures along the lineage of deep-sea fish, while selection was intensified in the freshwater lineage. Overall, our analysis provides a necessary expansion in the availability of mitogenomic sequences and sheds light on mitogenomic adaptation in Cottoidei fish inhabiting different aquatic environments.
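Locating a GC-homopolymer tract such as the one reported between tRNALeu2 and ND1 can be sketched in Python, under the assumption that the term means a run of a single repeated G or C base. The minimum run length of 8 is an arbitrary illustrative choice, since the abstract does not state one, and the function name is hypothetical.

```python
from itertools import groupby


def gc_homopolymers(seq, min_len=8):
    """Return (start, end, base) for runs of a single G or C at least min_len long.

    Coordinates are 0-based and half-open, so seq[start:end] is the run itself.
    """
    runs, pos = [], 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if base in "GC" and length >= min_len:
            runs.append((pos, pos + length, base))
        pos += length
    return runs
```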


Subjects
Genome, Mitochondrial , Perciformes , Animals , Fishes/genetics , Perciformes/genetics , Phylogeny , RNA, Transfer
7.
Algorithms Mol Biol ; 16(1): 24, 2021 Dec 29.
Article in English | MEDLINE | ID: mdl-34965857

ABSTRACT

BACKGROUND: In the comparative genomics field, one of the goals is to estimate a sequence of genetic changes capable of transforming one genome into another. Genome rearrangement events are mutations that can alter the genetic content or the arrangement of elements in the genome. Reversal and transposition are two of the most studied genome rearrangement events. A reversal inverts a segment of a genome, while a transposition swaps two consecutive segments. Initial studies in the area considered only the order of the genes. Recent works have incorporated other genetic information into the model, in particular the sizes of intergenic regions, which lie between each pair of consecutive genes and at the extremities of a linear genome. RESULTS AND CONCLUSIONS: In this work, we investigate the SORTING BY INTERGENIC REVERSALS AND TRANSPOSITIONS problem on genomes sharing the same set of genes, considering both the case where the orientation of genes is known and the case where it is unknown. In addition, we explore a variant of the problem that generalizes the transposition event. As a result, we present an approximation algorithm that guarantees an approximation factor of 4 for both cases considering the reversal and transposition (classic definition) events, an improvement over the 4.5-approximation previously known for the scenario where the orientation of the genes is unknown. We also present a 3-approximation algorithm by incorporating the generalized transposition event, and we propose a greedy strategy to improve the performance of the algorithms. Practical tests on simulated data indicated that, in both cases, the algorithms tend to perform better than the best-known algorithms for the problem. Lastly, we conducted experiments using real genomes to demonstrate the applicability of the algorithms.
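The two conservative operations described above can be sketched in Python on a genome given as a signed gene order plus intergenic sizes. The sketch assumes a common formulation of the intergenic model: the operation cuts the intergenic regions at its breakpoints at arbitrary positions (x, y, z nucleotides from the left), and the cut pieces travel with the moved segment, so the total number of intergenic nucleotides is preserved. Indices are 0-based and the function names are hypothetical; this is not the paper's approximation algorithm.

```python
def _check_cut(size, cut):
    if not 0 <= cut <= size:
        raise ValueError("cut position outside intergenic region")


def intergenic_reversal(genes, sizes, i, j, x, y):
    """Reverse genes[i..j] (inclusive); sizes[k] lies just before genes[k].

    sizes[i] is cut into x | sizes[i] - x and sizes[j + 1] into y | sizes[j + 1] - y;
    the inner pieces are reversed together with the genes, whose signs flip.
    """
    if not 0 <= i <= j < len(genes):
        raise ValueError("invalid segment")
    _check_cut(sizes[i], x)
    _check_cut(sizes[j + 1], y)
    new_genes = genes[:i] + [-g for g in reversed(genes[i:j + 1])] + genes[j + 1:]
    new_sizes = (sizes[:i] + [x + y] + sizes[i + 1:j + 1][::-1]
                 + [(sizes[i] - x) + (sizes[j + 1] - y)] + sizes[j + 2:])
    return new_genes, new_sizes


def intergenic_transposition(genes, sizes, i, j, k, x, y, z):
    """Swap the adjacent segments genes[i:j] and genes[j:k] (0 <= i < j < k <= n).

    sizes[i], sizes[j] and sizes[k] are cut at x, y and z; the right piece of
    sizes[i] and the left piece of sizes[j] travel with the first segment, and
    the right piece of sizes[j] and the left piece of sizes[k] with the second.
    """
    if not 0 <= i < j < k <= len(genes):
        raise ValueError("invalid segments")
    for size, cut in ((sizes[i], x), (sizes[j], y), (sizes[k], z)):
        _check_cut(size, cut)
    new_genes = genes[:i] + genes[j:k] + genes[i:j] + genes[k:]
    new_sizes = (sizes[:i] + [x + (sizes[j] - y)] + sizes[j + 1:k]
                 + [z + (sizes[i] - x)] + sizes[i + 1:j]
                 + [y + (sizes[k] - z)] + sizes[k + 1:])
    return new_genes, new_sizes
```

For example, reversing the first two genes of `[1, 2, 3]` with sizes `[2, 4, 6, 8]` and cuts `x=1, y=3` yields genes `[-2, -1, 3]` and sizes `[4, 4, 4, 8]`; the sizes still sum to 20.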

8.
J Bioinform Comput Biol ; 19(6): 2140011, 2021 12.
Article in English | MEDLINE | ID: mdl-34775923

ABSTRACT

Problems in the genome rearrangement field are often formulated in terms of pairwise genome comparison: given two genomes [Formula: see text] and [Formula: see text], find the minimum number of genome rearrangements that may have occurred during the evolutionary process. This broad definition lacks at least two important considerations: the first being which features are extracted from genomes to create a useful mathematical model, and the second being which types of genome rearrangement events should be represented. Regarding the first consideration, seminal works in the genome rearrangement field solely used gene order to represent genomes as permutations of integer numbers, neglecting many important aspects like gene duplication, intergenic regions, and complex interactions between genes. Regarding the second consideration, some rearrangement events, such as reversals and transpositions, are widely studied. In this paper, we shed light on the first consideration and create a model that takes into account gene order and the number of nucleotides in intergenic regions. In addition, we consider events of reversals, transpositions, and indels (insertions and deletions) of genomic material. We present a 4-approximation algorithm for reversals and indels, a [Formula: see text]-approximation algorithm for transpositions and indels, and a 6-approximation for reversals, transpositions, and indels.


Subjects
Genome , Models, Genetic , Algorithms , DNA, Intergenic/genetics , Gene Rearrangement , Genomics
9.
Algorithms Mol Biol ; 16(1): 21, 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-34645469

ABSTRACT

The rearrangement distance is a method to compare genomes of different species. Such distance is the number of rearrangement events necessary to transform one genome into another. Two commonly studied events are the transposition, which exchanges two consecutive blocks of the genome, and the reversal, which reverses a block of the genome. When dealing with such problems, seminal works represented genomes as sequences of genes without repetition. More realistic models started to consider gene repetition or the presence of intergenic regions, sequences of nucleotides between genes and at the extremities of the genome. This work explores the transposition and reversal events applied in a genome representation considering both gene repetition and intergenic regions. We define two problems called Minimum Common Intergenic String Partition and Reverse Minimum Common Intergenic String Partition. Using a relation with these two problems, we show a [Formula: see text]-approximation for the Intergenic Transposition Distance, the Intergenic Reversal Distance, and the Intergenic Reversal and Transposition Distance problems, where k is the maximum number of copies of a gene in the genomes. Our practical experiments on simulated genomes show that the use of partitions improves the estimates for the distances.

10.
PeerJ ; 8: e9740, 2020.
Article in English | MEDLINE | ID: mdl-32879803

ABSTRACT

As a small order of Pterygota (Insecta), Ephemeroptera has almost 3,500 species around the world. Ephemerellidae is a common, widely distributed group of Ephemeroptera. However, the relationship among Ephemerellidae, Vietnamellidae and Teloganellidae is still in dispute. In this study, we sequenced six complete mitogenomes of three genera from Ephemerellidae (Insecta: Ephemeroptera): Ephemerella sp. Yunnan-2018, Serratella zapekinae, Serratella sp. Yunnan-2018, Serratella sp. Liaoning-2019, Torleya grandipennis and T. tumiforceps. These mitogenomes were employed to investigate controversial phylogenetic relationships among the Ephemeroptera, with emphasis on the phylogenetic relationships among Ephemerellidae. The lengths of the six mayfly mitogenomes ranged from 15,134 bp to 15,703 bp. Four mitogenomes, of Ephemerella sp. Yunnan-2018, Serratella zapekinae, Serratella sp. Yunnan-2018 and Serratella sp. Liaoning-2019, had 22 tRNAs including an inversion and translocation of trnI. By contrast, the mitogenomes of T. tumiforceps and T. grandipennis had 24 tRNAs due to an extra two copies of inversion and translocation of trnI. Within the family Ephemerellidae, disparate gene rearrangement occurred in the mitogenomes of different genera: one copy of inversion and translocation of trnI in the genera Ephemerella and Serratella, and three repeat copies of inversion and translocation of trnI in the genus Torleya. A large non-coding region (≥200 bp) between trnS1 (AGN) and trnE was detected in T. grandipennis and T. tumiforceps. Regarding phylogenetic relationships within Ephemeroptera, the monophyly of almost all families except Siphlonuridae was supported by BI and ML analyses. The phylogenetic results indicated that Ephemerellidae was the sister clade to Vietnamellidae, whereas Teloganellidae was not a sister clade of Ephemerellidae and Vietnamellidae.
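Non-coding regions such as the ≥200 bp spacer between trnS1 (AGN) and trnE are typically found by comparing the end of one annotated feature with the start of the next. A minimal Python sketch follows, assuming a linear list of (name, start, end) annotations with 1-based inclusive coordinates (wrap-around of the circular mitogenome is ignored) and using the ≥200 bp cut-off from the abstract as the default; the function name is hypothetical.

```python
def intergenic_spacers(features, min_len=200):
    """Return (left, right, length) for gaps of at least min_len bp between neighbours.

    `features` holds (name, start, end) tuples with 1-based inclusive coordinates;
    overlapping neighbours give a negative gap and are never reported.
    """
    ordered = sorted(features, key=lambda f: (f[1], f[2]))
    spacers = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        length = right_start - left_end - 1
        if length >= min_len:
            spacers.append((left, right, length))
    return spacers
```

The same comparison, with a smaller `min_len`, would flag shorter spacers such as the COX2-trnK regions described for Toxoderidae further down this list.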

11.
Pharmacogenomics ; 21(8): 509-520, 2020 06.
Article in English | MEDLINE | ID: mdl-32427048

ABSTRACT

Aim: GDF15 levels are a biomarker for metformin use. We performed the functional annotation of noncoding genome-wide association study (GWAS) SNPs for GDF15 levels and the Genotype-Tissue Expression (GTEx)-expression quantitative trait loci (eQTLs) for GDF15 expression within metformin-activated enhancers around GDF15. Materials & methods: These enhancers were identified using chromatin immunoprecipitation followed by sequencing data for active (H3K27ac) and silenced (H3K27me3) histone marks on human hepatocytes treated with metformin, Encyclopedia of DNA Elements data and cis-regulatory elements assignment tools. Results: The GWAS lead SNP rs888663, the SNP rs62122429 associated with GDF15 levels in the Outcome Reduction with Initial Glargine Intervention trial, and the GTEx-expression quantitative trait locus rs4808791 for GDF15 expression in whole blood are located in a metformin-activated enhancer upstream of GDF15 and tightly linked in Europeans and East Asians. Conclusion: Noncoding variation within a metformin-activated enhancer may increase GDF15 expression and help to predict GDF15 levels.
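Checking whether GWAS or eQTL SNPs fall inside candidate enhancers reduces to point-in-interval queries. A minimal Python sketch using binary search over sorted, non-overlapping enhancer intervals follows. The SNP identifiers and coordinates in any usage are hypothetical placeholders, not the positions of rs888663, rs62122429 or rs4808791, and the study's actual cis-regulatory annotation tools are not reproduced.

```python
from bisect import bisect_right


def snps_in_enhancers(snps, enhancers):
    """Map each SNP that falls inside an enhancer to that enhancer's interval.

    snps: {snp_id: position}; enhancers: non-overlapping (start, end) intervals,
    1-based and inclusive, all on the same chromosome.
    """
    ordered = sorted(enhancers)
    starts = [start for start, _ in ordered]
    hits = {}
    for snp_id, pos in snps.items():
        idx = bisect_right(starts, pos) - 1  # last enhancer starting at or before pos
        if idx >= 0 and pos <= ordered[idx][1]:
            hits[snp_id] = ordered[idx]
    return hits
```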


Subjects
Genome-Wide Association Study/methods , Growth Differentiation Factor 15/biosynthesis , Growth Differentiation Factor 15/genetics , Metformin/pharmacology , Polymorphism, Single Nucleotide/genetics , Cell Line , Hepatocytes/drug effects , Hepatocytes/metabolism , Humans , Hypoglycemic Agents/pharmacology , Polymorphism, Single Nucleotide/drug effects
12.
J Comput Biol ; 27(2): 156-174, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31891533

ABSTRACT

During the evolutionary process, genomes are affected by various genome rearrangements, that is, events that modify large stretches of the genetic material. In the literature, a large number of models have been proposed to estimate the number of events that occurred during evolution; most of them represent a genome as an ordered sequence of genes, and, in particular, disregard the genetic material between consecutive genes. However, recent studies showed that taking into account the genetic material between consecutive genes can enhance evolutionary distance estimations. Reversal and transposition are genome rearrangements that have been widely studied in the literature. A reversal inverts a (contiguous) segment of the genome, while a transposition swaps the positions of two consecutive segments. Genomes also undergo nonconservative events (events that alter the amount of genetic material) such as insertions and deletions, in which genetic material from intergenic regions of the genome is inserted or deleted, respectively. In this article, we study a genome rearrangement model that considers both gene order and sizes of intergenic regions. We investigate the reversal distance, and also the reversal and transposition distance between two genomes in two scenarios: with and without nonconservative events. We show that these problems are NP-hard and we present constant ratio approximation algorithms for all of them. More precisely, we provide a 4-approximation algorithm for the reversal distance, both in the conservative and nonconservative versions. For the reversal and transposition distance, we provide a 4.5-approximation algorithm, both in the conservative and nonconservative versions. We also perform experimental tests to verify the behavior of our algorithms, as well as to compare the practical and theoretical results. 
We finally extend our study to scenarios in which events have different costs, and we present constant ratio approximation algorithms for each scenario.

13.
Appl Microbiol Biotechnol ; 104(2): 833-852, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31848654

ABSTRACT

Bacillus pumilus, an endospore-forming soil bacterium, produces a wide array of extracellular proteins, such as proteases, which are already applied in the chemical, detergent and leather industries. Small noncoding regulatory RNAs (sRNAs) in bacteria are important RNA regulators that act in response to various environmental signals. Here, an RNA-seq-based transcriptome analysis was applied to B. pumilus SCU11, a strain that produces extracellular alkaline protease, across various growth phases of the protease fermentation process. Through bioinformatics screening of the sequencing data and visual inspection, 84 putative regulatory sRNAs were identified in B. pumilus, including 21 antisense sRNAs and 63 sRNAs in intergenic regions. We experimentally validated the expression of 48 intergenic sRNAs by quantitative RT-PCR (qRT-PCR). Meanwhile, the expression of 6 novel sRNAs was confirmed by northern blotting, and the expression profiles of 5 sRNAs showed close correlation with the growth phase. We revealed that the sRNA Bpsr137 was involved in flagellum and biofilm formation in B. pumilus. The identification of a global set of sRNAs increases the inventory of regulatory sRNAs in Bacillus and implies the important regulatory roles of sRNA in B. pumilus. These findings will contribute another dimension to the optimization of crucial metabolic activities of B. pumilus during a productive fermentation process.
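The split of the 84 candidates into antisense and intergenic sRNAs rests on a simple coordinate rule. A minimal Python sketch follows, assuming candidates and annotated genes are (start, end, strand) tuples with 1-based inclusive coordinates: a candidate overlapping a gene on the opposite strand is antisense, one overlapping no gene is intergenic, and same-strand overlaps are reported separately. The labels and the rule are illustrative assumptions, not the authors' exact screening criteria.

```python
def classify_srna(srna, genes):
    """Classify an sRNA candidate against annotated genes.

    srna and each gene are (start, end, strand) tuples, 1-based inclusive.
    Returns 'antisense', 'sense-overlap' or 'intergenic'.
    """
    start, end, strand = srna
    overlapping = [g for g in genes if g[0] <= end and start <= g[1]]
    if any(g[2] != strand for g in overlapping):
        return "antisense"
    if overlapping:
        return "sense-overlap"
    return "intergenic"
```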


Subjects
Bacillus pumilus/growth & development , Bacillus pumilus/genetics , Peptide Hydrolases/metabolism , RNA, Small Untranslated/biosynthesis , Bacillus pumilus/metabolism , Blotting, Northern , Computational Biology , Fermentation , Gene Expression Profiling , Gene Expression Regulation, Bacterial , RNA, Small Untranslated/genetics , Real-Time Polymerase Chain Reaction , Sequence Analysis, RNA
14.
Algorithms Mol Biol ; 14: 21, 2019.
Article in English | MEDLINE | ID: mdl-31709002

ABSTRACT

BACKGROUND: The evolutionary distance between two genomes can be estimated by computing a minimum-length sequence of operations, called genome rearrangements, that transform one genome into another. Usually, a genome is modeled as an ordered sequence of genes, and most of the studies in the genome rearrangement literature consist of shaping biological scenarios into mathematical models: for instance, allowing different genome rearrangement operations at the same time, adding constraints to these rearrangements (e.g., each rearrangement can affect at most a given number of genes), or considering that a rearrangement implies a cost depending on its length rather than a unit cost. Most works, however, have overlooked some important features inside genomes, such as the presence of sequences of nucleotides between genes, called intergenic regions. RESULTS AND CONCLUSIONS: In this work, we investigate the problem of computing the distance between two genomes, taking into account both gene order and intergenic sizes. The genome rearrangement operations we consider here are constrained types of reversals and transpositions, called super short reversals (SSRs) and super short transpositions (SSTs), which affect up to two (consecutive) genes. We denote by super short operations (SSOs) any SSR or SST. We present 3-approximation algorithms for SSRs, SSTs, or SSOs when the orientation of the genes is not considered, and 5-approximation algorithms for either SSRs or SSOs when the orientation is considered. We also show that these algorithms improve their approximation factors when the input permutation has a higher number of inversions, where the approximation factor decreases from 3 to either 2 or 1.5, and from 5 to either 3 or 2.
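Because SSRs and SSTs act on at most two consecutive genes, the number of inversions of the input permutation is a natural quantity here: ignoring intergenic sizes and gene orientation, each such operation changes the inversion count of an unsigned permutation by at most one, so inv(π) is a lower bound on the number of operations needed to sort the gene order. A minimal Python sketch that counts inversions in O(n log n) with merge sort follows; it illustrates the quantity the abstract refers to and is not the paper's algorithm.

```python
def count_inversions(perm):
    """Number of pairs (a, b) where a appears before b in perm and a > b."""

    def sort_count(seq):
        if len(seq) <= 1:
            return list(seq), 0
        mid = len(seq) // 2
        left, inv_left = sort_count(seq[:mid])
        right, inv_right = sort_count(seq[mid:])
        merged, inversions = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining element of left
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_count(list(perm))[1]
```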

15.
FEMS Yeast Res ; 19(8)2019 12 01.
Article in English | MEDLINE | ID: mdl-31665278

ABSTRACT

Cryptococcus spp. are fungal species belonging to Tremellomycetes, Agaricomycotina, Basidiomycota, and several members are responsible for cryptococcosis, one of the most ubiquitous human mycoses. Affecting mainly immunosuppressed patients, but also immunocompetent ones, the members of this genus present a high level of genetic diversity. In this study, two mitochondrial intergenic regions, i.e. nad1-cob and cob-rps3, were tested for the intra- or interspecies discrimination and identification of strains and species of the genus Cryptococcus. Phylogenetic trees were constructed based on individual and concatenated sequences from representative pathogenic strains of the Cryptococcus neoformans/Cryptococcus gattii complex, representing serotypes and AFLP genotypes of all newly introduced species of this complex. Using both intergenic regions, as well as the concatenated dataset, the strains clustered in accordance with the new taxonomy. These results suggest that identification of Cryptococcus strains is possible by employing these mitochondrial intergenic regions using PCR amplification as a quick and effective method to elucidate genotypic and taxonomic differences. Thus, these regions may be applicable to a broad range of clinical studies, leading to a rapid recognition of the clinical profiles of patients.


Subjects
Cryptococcus/genetics , Cryptococcus/pathogenicity , DNA, Fungal/genetics , DNA, Intergenic , Genes, Mitochondrial , Cryptococcosis/microbiology , DNA, Ribosomal/genetics , Humans , Membrane Glycoproteins/genetics , Mitochondria/genetics , Mycological Typing Techniques , NADH Dehydrogenase/genetics , Phylogeny , Ribosomal Proteins/genetics
16.
PeerJ ; 6: e4595, 2018.
Article in English | MEDLINE | ID: mdl-29686943

ABSTRACT

The family Toxoderidae (Mantodea) contains an ecologically diverse group of praying mantis species that have in common greatly elongated bodies. In this study, we sequenced the complete mitochondrial genomes of two Toxoderidae species, Paratoxodera polyacantha and Toxodera hauseri, and compared their mitochondrial genome characteristics with those of another member of the Toxoderidae, Stenotoxodera porioni (KY689118). The lengths of the mitogenomes of T. hauseri and P. polyacantha were 15,616 bp and 15,999 bp, respectively, which is similar to that of S. porioni (15,846 bp). The size of each gene, the A+T-rich region and the A+T content of the whole genome were also very similar among the three species, as were the protein-coding genes, the A+T content and codon usage. The mitogenome of T. hauseri had the typical 22 tRNAs, whereas that of P. polyacantha had 26 tRNAs, including an extra two copies of trnA-trnR. Intergenic regions of 67 bp and 76 bp were found in T. hauseri and P. polyacantha, respectively, between COX2 and trnK; these can be explained as residues of a tandem duplication/random loss of trnK and trnD. This non-coding region may be synapomorphic for Toxoderidae. In BI and ML analyses, the monophyly of Toxoderidae was supported and P. polyacantha was the sister clade to T. hauseri and S. porioni.

17.
G3 (Bethesda) ; 7(8): 2791-2797, 2017 08 07.
Article in English | MEDLINE | ID: mdl-28667017

ABSTRACT

Gene expression patterns assayed across development can offer key clues about a gene's function and regulatory role. Drosophila melanogaster is ideal for such investigations as multiple individual and high-throughput efforts have captured the spatiotemporal patterns of thousands of embryonic expressed genes in the form of in situ images. FlyExpress (www.flyexpress.net), a knowledgebase based on a massive and unique digital library of standardized images and a simple search engine to find coexpressed genes, was created to facilitate the analytical and visual mining of these patterns. Here, we introduce the next generation of FlyExpress resources to facilitate the integrative analysis of sequence data and spatiotemporal patterns of expression from images. FlyExpress 7 now includes over 100,000 standardized in situ images and implements a more efficient, user-defined search algorithm to identify coexpressed genes via Genomewide Expression Maps (GEMs). Shared motifs found in the upstream 5' regions of any pair of coexpressed genes can be visualized in an interactive dotplot. Additional webtools and link-outs to assist in the downstream validation of candidate motifs are also provided. Together, FlyExpress 7 represents our largest effort yet to accelerate discovery via the development and dispersal of new webtools that allow researchers to perform data-driven analyses of coexpression (image) and genomic (sequence) data.
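A dotplot of shared motifs between the upstream regions of two coexpressed genes can be built from exact k-mer matches. A minimal Python sketch that returns the (i, j) coordinates of the dots follows; the k-mer length of 6 and the function name are assumptions, and FlyExpress's own motif search and interactive plotting are not reproduced.

```python
def kmer_dotplot(seq_a, seq_b, k=6):
    """Return (i, j) pairs where the k-mer starting at seq_a[i] equals the one at seq_b[j]."""
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    return [(i, j)
            for i in range(len(seq_a) - k + 1)
            for j in index.get(seq_a[i:i + k], [])]
```

Indexing one sequence in a dictionary keeps the comparison roughly linear in the sequence lengths plus the number of dots, instead of comparing every pair of positions.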


Subjects
Drosophila melanogaster/genetics , Gene Expression Regulation , Imaging, Three-Dimensional , In Situ Hybridization , Software , Animals , Binding Sites/genetics , Conserved Sequence/genetics , Genome, Insect , Transcription Factors/metabolism
18.
Algorithms Mol Biol ; 12: 16, 2017.
Article in English | MEDLINE | ID: mdl-28592988

ABSTRACT

BACKGROUND: Combinatorial works on genome rearrangements have so far ignored the influence of intergene sizes, i.e. the number of nucleotides between consecutive genes, although it was recently shown to be decisive for the accuracy of inference methods (Biller et al. in Genome Biol Evol 8:1427-39, 2016; Biller et al. in Beckmann A, Bienvenu L, Jonoska N, editors. Proceedings of Pursuit of the Universal-12th conference on computability in Europe, CiE 2016, Lecture notes in computer science, vol 9709, Paris, France, June 27-July 1, 2016. Berlin: Springer, p. 35-44, 2016). In this line, we define a new genome rearrangement model called wDCJ, a generalization of the well-known double cut and join (or DCJ) operation that modifies both the gene order and the intergene size distribution of a genome. RESULTS: We first provide a generic formula for the wDCJ distance between two genomes, and show that computing this distance is strongly NP-complete. We then propose an approximation algorithm of ratio 4/3, and two exact ones: a fixed-parameter tractable (FPT) algorithm and an integer linear programming (ILP) formulation. CONCLUSIONS: We provide theoretical and empirical bounds on the expected growth of the parameter at the center of our FPT and ILP algorithms, assuming a probabilistic model of evolution under wDCJ, which shows that both these algorithms should run reasonably fast in practice.

19.
Genetics ; 206(1): 363-376, 2017 05.
Article in English | MEDLINE | ID: mdl-28280056

ABSTRACT

Nontranslated intergenic regions (IGRs) compose 10-15% of bacterial genomes, and contain many regulatory elements with key functions. Despite this, there are few systematic studies on the strength and direction of selection operating on IGRs in bacteria using whole-genome sequence data sets. Here we exploit representative whole-genome data sets from six diverse bacterial species: Staphylococcus aureus, Streptococcus pneumoniae, Mycobacterium tuberculosis, Salmonella enterica, Klebsiella pneumoniae, and Escherichia coli. We compare patterns of selection operating on IGRs using two independent methods: the proportion of singleton mutations and the dI/dS ratio, where dI is the number of intergenic SNPs per intergenic site. We find that the strength of purifying selection operating over all intergenic sites is consistently intermediate between that operating on synonymous and nonsynonymous sites. Ribosome binding sites and noncoding RNAs tend to be under stronger selective constraint than promoters and Rho-independent terminators. Strikingly, a clear signal of purifying selection remains even when all these major categories of regulatory elements are excluded, and this constraint is highest immediately upstream of genes. While a paucity of variation means that the data for M. tuberculosis are more equivocal than for the other species, we find strong evidence for positive selection within promoters of this species. This points to a key adaptive role for regulatory changes in this important pathogen. Our study underlines the feasibility and utility of gauging the selective forces operating on bacterial IGRs from whole-genome sequence data, and suggests that our current understanding of the functionality of these sequences is far from complete.
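The two statistics can be written down directly. The abstract defines dI as intergenic SNPs per intergenic site; the sketch assumes dS is defined analogously as synonymous SNPs per synonymous site, and that the singleton proportion is the fraction of SNPs whose variant allele is seen in exactly one genome. How sites are counted is not specified here, so the inputs are assumed to be precomputed, and the function names are hypothetical.

```python
def di_ds(intergenic_snps, intergenic_sites, synonymous_snps, synonymous_sites):
    """dI/dS, with dI = intergenic SNPs per intergenic site and dS = synonymous SNPs per synonymous site."""
    if min(intergenic_sites, synonymous_sites, synonymous_snps) <= 0:
        raise ValueError("site counts and synonymous SNPs must be positive")
    d_i = intergenic_snps / intergenic_sites
    d_s = synonymous_snps / synonymous_sites
    return d_i / d_s


def singleton_proportion(carrier_counts):
    """Fraction of SNPs whose variant allele occurs in exactly one genome."""
    if not carrier_counts:
        raise ValueError("no SNPs supplied")
    return sum(1 for c in carrier_counts if c == 1) / len(carrier_counts)
```

A dI/dS value below 1 means intergenic sites carry fewer SNPs per site than synonymous sites, which is consistent with purifying selection acting on the intergenic regions.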


Subjects
DNA, Intergenic/genetics , Genome, Bacterial , RNA, Untranslated/genetics , Regulatory Sequences, Nucleic Acid , Conserved Sequence/genetics , Escherichia coli/genetics , Evolution, Molecular , Klebsiella pneumoniae/genetics , Mycobacterium tuberculosis/genetics , Ribosomes/genetics , Salmonella enterica/genetics , Staphylococcus aureus/genetics , Streptococcus pneumoniae/genetics
20.
BMC Bioinformatics ; 17(Suppl 14): 426, 2016 Nov 11.
Article in English | MEDLINE | ID: mdl-28185582

ABSTRACT

BACKGROUND: Given two genomes that have diverged by a series of rearrangements, we infer minimum Double Cut-and-Join (DCJ) scenarios to explain their organization differences, coupled with indel scenarios to explain their intergene size distribution, where DCJs themselves also alter the sizes of broken intergenes. RESULTS: We give a polynomial-time algorithm that, given two genomes with arbitrary intergene size distributions, outputs a DCJ scenario which optimizes on the number of DCJs and, given this optimal number of DCJs, optimizes on the total sum of the sizes of the indels. CONCLUSIONS: We show that there is valuable information in the intergene sizes concerning the rearrangement scenario itself. On simulated data, we show that statistical properties of the inferred scenarios are closer to the true ones than those of DCJ-only scenarios, i.e. scenarios which do not handle intergene sizes.


Subjects
Gene Rearrangement/genetics , Genome , Models, Genetic , Algorithms , INDEL Mutation