ABSTRACT
Most Ralstonia solanacearum species complex strains cause bacterial wilts in tropical or subtropical zones, but the group known as Race 3 biovar 2 (R3bv2) is cool virulent and causes potato brown rot at lower temperatures. R3bv2 has invaded potato-growing regions around the world but is not established in the United States. Phylogenetically, R3bv2 corresponds to a subset of the R. solanacearum phylotype IIB clade, but little is known about the distribution of the cool virulence phenotype within phylotype IIB. Therefore, genomes of 76 potentially cool virulent phylotype IIB strains and 30 public genomes were phylogenetically analyzed. A single clonal lineage within the sequevar 1 subclade of phylotype IIB that originated in South America has caused nearly all brown rot outbreaks worldwide. To correlate genotypes with relevant phenotypes, we quantified virulence of ten Ralstonia strains on tomato and potato at both 22°C and 28°C. Cool virulence on tomato did not predict cool virulence on potato. We found that cool virulence is a quantitative trait. Strains in the sequevar 1 pandemic clonal lineage caused the most disease, while other R3bv2 strains were only moderately cool virulent. However, some non-R3bv2 strains were highly cool virulent and aggressively colonized potato tubers. Thus, cool virulence is not consistently correlated with strains historically classified as R3bv2 group. To aid detection of sequevar 1 strains, this group was genomically delimited in the LINbase web server and a sequevar 1 diagnostic primer pair was developed and validated. We discuss implications of these results for the R3bv2 definition.
ABSTRACT
High throughput DNA sequencing in combination with efficient algorithms could provide the basis for a highly resolved, genome phylogeny-based and digital prokaryotic taxonomy. However, current taxonomic practice continues to rely on cumbersome journal publications for the description of new species, which still constitute the smallest taxonomic units. In response, we introduce LINbase, a web server that allows users to genomically circumscribe any group of prokaryotes with measurable DNA similarity and that uses the individual isolate as smallest unit. Since LINbase leverages the concept of Life Identification Numbers (LINs), which are codes assigned to individual genomes based on reciprocal average nucleotide identity, we refer to groups circumscribed in LINbase as LINgroups. Users can associate with each LINgroup a name, a short description, and a URL to a peer-reviewed publication. As soon as a LINgroup is circumscribed, any user can immediately identify query genomes as members and submit comments about the LINgroup. Most genomes currently in LINbase were imported from GenBank, but users can upload their own genome sequences as well. In conclusion, LINbase combines the resolution of LINs with the power of crowdsourcing in support of a highly resolved, genome phylogeny-based digital taxonomy. LINbase is available at http://www.LINbase.org.
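The idea of codes based on average nucleotide identity (ANI), with LINgroups as sets of genomes sharing a code prefix, can be illustrated with a minimal Python sketch. This is not LINbase's implementation: the threshold values, the function names, and the simplified assignment rule (copy the prefix of the most similar genome, then take the next unused value at the first differing position) are assumptions made purely for illustration.

```python
from bisect import bisect_right

# Hypothetical ANI thresholds (percent), one per LIN position, increasing.
# These values are illustrative assumptions, not those used by LINbase.
THRESHOLDS = [70.0, 80.0, 90.0, 95.0, 99.0, 99.9]


def shared_positions(ani_percent):
    """Count the leading LIN positions shared with the most similar genome."""
    return bisect_right(THRESHOLDS, ani_percent)


def assign_lin(closest_lin, ani_percent, existing_lins):
    """Assign a LIN to a new genome from its most similar genome and their ANI."""
    k = shared_positions(ani_percent)
    prefix = tuple(closest_lin[:k])
    if k == len(THRESHOLDS):
        # Indistinguishable at the resolution of this sketch.
        return prefix
    # Values already used at position k by genomes sharing the same prefix.
    used = {lin[k] for lin in existing_lins if tuple(lin[:k]) == prefix}
    new_value = max(used) + 1 if used else 0
    # Remaining positions start at zero.
    return prefix + (new_value,) + (0,) * (len(THRESHOLDS) - k - 1)


def in_lingroup(lin, group_prefix):
    """A genome belongs to a LINgroup if its LIN starts with the group's prefix."""
    return tuple(lin[:len(group_prefix)]) == tuple(group_prefix)


existing = [(0, 0, 0, 0, 0, 0)]
new_lin = assign_lin(existing[0], 96.0, existing)
print(new_lin, in_lingroup(new_lin, (0, 0, 0, 0)))
```

In this sketch, circumscribing a group amounts to choosing a prefix, so identifying a query genome as a member reduces to a prefix comparison.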
Subjects
Bacteria/classification, Bacterial Genome, Software, Algorithms, Bacteria/genetics, Bacteria/isolation & purification, Archaeal Genome, Genomics/methods, Internet, Phylogeny
ABSTRACT
BACKGROUND: Metagenomics is gaining attention as a powerful tool for identifying how agricultural management practices influence human and animal health, especially in terms of potential to contribute to the spread of antibiotic resistance. However, the ability to compare the distribution and prevalence of antibiotic resistance genes (ARGs) across multiple studies and environments is currently impossible without a complete re-analysis of published datasets. This challenge must be addressed for metagenomics to realize its potential for helping guide effective policy and practice measures relevant to agricultural ecosystems, for example, identifying critical control points for mitigating the spread of antibiotic resistance. RESULTS: Here we introduce AgroSeek, a centralized web-based system that provides computational tools for analysis and comparison of metagenomic data sets tailored specifically to researchers and other users in the agricultural sector interested in tracking and mitigating the spread of ARGs. AgroSeek draws from rich, user-provided metagenomic data and metadata to facilitate analysis, comparison, and prediction in a user-friendly fashion. Further, AgroSeek draws from publicly-contributed data sets to provide a point of comparison and context for data analysis. To incorporate metadata into our analysis and comparison procedures, we provide flexible metadata templates, including user-customized metadata attributes to facilitate data sharing, while maintaining the metadata in a comparable fashion for the broader user community and to support large-scale comparative and predictive analysis. CONCLUSION: AgroSeek provides an easy-to-use tool for environmental metagenomic analysis and comparison, based on both gene annotations and associated metadata, with this initial demonstration focusing on control of antibiotic resistance in agricultural ecosystems. 
AgroSeek creates a space for metagenomic data sharing and collaboration to assist policy makers, stakeholders, and the public in decision-making. AgroSeek is publicly available at https://agroseek.cs.vt.edu/.
Subjects
Microbial Drug Resistance/genetics, Environmental Microbiology, Bacterial Genes, Metadata, Metagenomics, Ecosystem, Internet, Metagenome, Software
ABSTRACT
Routine strain-level identification of plant pathogens directly from symptomatic tissue could significantly improve plant disease control and prevention. Here we tested the Oxford Nanopore Technologies (ONT) MinION sequencer for metagenomic sequencing of tomato plants either artificially inoculated with a known strain of the bacterial speck pathogen Pseudomonas syringae pv. tomato or collected in the field and showing bacterial spot symptoms caused by one of four Xanthomonas species. After species-level identification via ONT's WIMP software and the third-party tools Sourmash and MetaMaps, we used Sourmash and MetaMaps with a custom database of representative genomes of bacterial tomato pathogens to attempt strain-level identification. In parallel, each metagenome was assembled and the longest contigs were used as query with the genome-based microbial identification Web service LINbase. Both the read-based and assembly-based approaches correctly identified P. syringae pv. tomato strain T1 in the artificially inoculated samples. The pathogen strain in most field samples was identified as a member of Xanthomonas perforans group 2. This result was confirmed by whole genome sequencing of colonies isolated from one of the samples. Although in our case metagenome-based pathogen identification at the strain level was achieved, caution still must be exercised in interpreting strain-level results because of the challenges inherent to assigning reads to specific strains and the error rate of nanopore sequencing.
Subjects
Solanum lycopersicum, Xanthomonas, Bacteria, Metagenome, Plant Diseases
ABSTRACT
Developing Arabidopsis seeds accumulate oils and seed storage proteins synthesized by the pathways of primary metabolism. Seed development and metabolism are positively regulated by transcription factors belonging to the LAFL (LEC1, ABI3, FUSCA3 and LEC2) regulatory network. The VAL gene family encodes repressors of the seed maturation program in germinating seeds, although they are also expressed during seed maturation. The possible regulatory role of VAL1 in seed development has not been studied to date. Reverse genetics revealed that val1 mutant seeds accumulated elevated levels of proteins compared with the wild type, suggesting that VAL1 functions as a repressor of seed metabolism; however, in the absence of VAL1, the levels of metabolites, ABA, auxin and jasmonate derivatives did not change significantly in developing embryos. Two VAL1 splice variants were identified through RNA sequencing analysis: a full-length form and a truncated form lacking the plant homeodomain-like domain associated with epigenetic repression. None of the transcripts encoding the core LAFL network transcription factors were affected in val1 embryos. Instead, activation of VAL1 by FUSCA3 appears to result in the repression of a subset of seed maturation genes downstream of core LAFL regulators, as 39% of transcripts in the FUSCA3 regulon were derepressed in the val1 mutant. The LEC1 and LEC2 regulons also responded, but to a lesser extent. An additional 832 transcripts that were not LAFL targets were derepressed in val1 mutant embryos. These transcripts are candidate targets of VAL1, acting through epigenetic and/or transcriptional repression.
Subjects
Arabidopsis Proteins/metabolism, Arabidopsis/embryology, Arabidopsis/metabolism, Plant Gene Expression Regulation, Repressor Proteins/metabolism, Transcription Factors/metabolism, Arabidopsis/genetics, Arabidopsis Proteins/genetics, CCAAT-Enhancer-Binding Proteins/genetics, CCAAT-Enhancer-Binding Proteins/metabolism, Developmental Gene Expression Regulation/genetics, Plant Gene Expression Regulation/genetics, Repressor Proteins/genetics, Transcription Factors/genetics
ABSTRACT
MOTIVATION: Can we predict protein-protein interactions (PPIs) of a novel virus with its host? Three major problems arise: the lack of known PPIs for that virus to learn from, the cost of learning about its proteins and the sequence dissimilarity among viral families that makes most methods inapplicable or inefficient. We develop DeNovo, a sequence-based negative sampling and machine learning framework that learns from PPIs of different viruses to predict for a novel one, exploiting the shared host proteins. We tested DeNovo on PPIs from different domains to assess generalization. RESULTS: By solving the challenge of generating less noisy negative interactions, DeNovo achieved accuracy up to 81 and 86% when predicting PPIs of viral proteins that have no and distant sequence similarity to the ones used for training, respectively. This result is comparable to the best achieved in single virus-host and intra-species PPI prediction cases. Thus, we can now predict PPIs for virtually any virus infecting humans. DeNovo generalizes well; it achieved near optimal accuracy when tested on bacteria-human interactions. AVAILABILITY AND IMPLEMENTATION: Code, data and additional supplementary materials needed to reproduce this study are available at: https://bioinformatics.cs.vt.edu/~alzahraa/denovo CONTACT: alzahraa@vt.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Protein Interaction Mapping, Viral Proteins, Viruses, Forecasting, Humans, DNA Sequence Analysis
ABSTRACT
Estimates of the number of bacterial species range from 10^7 to 10^12. At the pace at which descriptions of new species are currently being published, the description of all bacterial species on earth will only be completed in thousands of years. However, even if one day all species were named and described, these names and descriptions would still be of little practical value unless they could be easily searched and accessed, so that novel strains could be easily identified as members of any of these species. To complicate the situation further, many of the currently known species contain significant genotypic and phenotypic diversity that would still be missed if description of microbial diversity were limited to species. The solution to this problem could be a database in which every bacterial species and every intra-specific group is anchored to a genome-similarity framework. This ideal database should be searchable using complete or partial genome sequences as well as phenotypes. Moreover, the database should include functions to easily add newly sequenced novel strains, automatically place them into the genome-similarity framework, identify them as members of an already named species, or tag them as members of yet to be described species or new intra-specific groups. Here, we propose the means to develop such a database by taking advantage of the concept of genome sequence similarity-based codes, called Life Identification Numbers or LINs.
Subjects
Bacteria/classification, Biodiversity, Datasets as Topic, Microbiology/trends, Bacteria/genetics, Bacterial Genome/genetics, Microbiology/standards, Phylogeny, Terminology as Topic
ABSTRACT
Taxonomy of plant pathogenic bacteria is challenging because pathogens of different crops often belong to the same named species but current taxonomy does not provide names for bacteria below the subspecies level. The introduction of the host range-based pathovar system in the 1980s provided a temporary solution to this problem but has many limitations. The affordability of genome sequencing now provides the opportunity for developing a new genome-based taxonomic framework. We already proposed to name individual bacterial isolates based on pairwise genome similarity. Here, we expand on this idea and propose to use genome similarity-based codes, which we now call life identification numbers (LINs), to describe and name bacterial taxa. Using 93 genomes of Pseudomonas syringae sensu lato, LINs were compared with a P. syringae genome tree whereby the assigned LINs were found to be informative of a majority of phylogenetic relationships. LINs also reflected host range and outbreak association for strains of P. syringae pathovar actinidiae, a pathovar for which many genome sequences are available. We conclude that LINs could provide the basis for a new taxonomic framework to address the shortcomings of the current pathovar system and to complement the current taxonomic system of bacteria in general.
Subjects
Bacterial Genome/genetics, Host Specificity, Plant Diseases/microbiology, Plants/microbiology, Pseudomonas syringae/classification, Phylogeny, Pseudomonas syringae/genetics, Pseudomonas syringae/physiology, DNA Sequence Analysis
ABSTRACT
BACKGROUND: Alternative splicing (AS) is a post-transcriptional regulatory mechanism for gene expression regulation. Splicing decisions are affected by the combinatorial behavior of different splicing factors that bind to multiple binding sites in exons and introns. These binding sites are called splicing regulatory elements (SREs). Here we develop CoSREM (Combinatorial SRE Miner), a graph mining algorithm to discover combinatorial SREs in human exons. Our model does not assume a fixed length of SREs and incorporates experimental evidence as well to increase accuracy. CoSREM is able to identify sets of SREs and is not limited to SRE pairs as are current approaches. RESULTS: We identified 37 SRE sets that include both enhancer and silencer elements. We show that our results intersect with previous results, including some that are experimental. We also show that the SRE set GGGAGG and GAGGAC identified by CoSREM may play a role in exon skipping events in several tumor samples. We applied CoSREM to RNA-Seq data for multiple tissues to identify combinatorial SREs which may be responsible for exon inclusion or exclusion across tissues. CONCLUSION: The new algorithm can identify different combinations of splicing enhancers and silencers without assuming a predefined size or limiting the algorithm to find only pairs of SREs. Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.
Subjects
Algorithms, Computer Graphics, Neoplastic Gene Expression Regulation, Neoplasm Proteins/genetics, Neoplasms/genetics, RNA Splicing/genetics, Nucleic Acid Regulatory Sequences/genetics, Exons/genetics, Humans, Introns/genetics
ABSTRACT
BACKGROUND: Transcriptomics reveals the existence of transcripts of different coding potential and strand orientation. Alternative splicing (AS) can yield proteins with altered number and types of functional domains, suggesting the global occurrence of transcriptional and post-transcriptional events. Many biological processes, including seed maturation and desiccation, are regulated post-transcriptionally (e.g., by AS), leading to the production of more than one coding or noncoding sense transcript from a single locus. RESULTS: We present an integrated computational framework to predict isoform-specific functions of plant transcripts. This framework includes a novel plant-specific weighted support vector machine classifier called CodeWise, which predicts the coding potential of transcripts with over 96 % accuracy, and several other tools enabling global sequence similarity, functional domain, and co-expression network analyses. First, this framework was applied to all detected transcripts (103,106), out of which 13 % was predicted by CodeWise to be noncoding RNAs in developing soybean embryos. Second, to investigate the role of AS during soybean embryo development, a population of 2,938 alternatively spliced and differentially expressed splice variants was analyzed and mined with respect to timing of expression. Conserved domain analyses revealed that AS resulted in global changes in the number, types, and extent of truncation of functional domains in protein variants. Isoform-specific co-expression network analysis using ArrayMining and clustering analyses revealed specific sub-networks and potential interactions among the components of selected signaling pathways related to seed maturation and the acquisition of desiccation tolerance. These signaling pathways involved abscisic acid- and FUSCA3-related transcripts, several of which were classified as noncoding and/or antisense transcripts and were co-expressed with corresponding coding transcripts. 
Noncoding and antisense transcripts likely play important regulatory roles in seed maturation- and desiccation-related signaling in soybean. CONCLUSIONS: This work demonstrates how our integrated framework can be implemented to make experimentally testable predictions regarding the coding potential, co-expression, co-regulation, and function of transcripts and proteins related to a biological process of interest.
Subjects
Alternative Splicing, Plant Gene Expression Regulation, Glycine max/genetics, Transcriptome, Plant Genes, Plant RNA, Seeds/genetics, Glycine max/embryology
ABSTRACT
Horizontal gene transfer (HGT) is a powerful evolutionary force that considerably shapes the structure of prokaryotic genomes and is associated with genomic islands (GIs). A GI is a DNA segment composed of transferred genes that can be found within a prokaryotic genome, obtained through HGT. Much research has focused on detecting GIs in genomes, but here we pursue a new course, which is identifying possible preferred locations of GIs in the prokaryotic genome. Here, we identify the locations of the GIs within prokaryotic genomes to examine patterns in those locations. Prokaryotic GIs were analyzed according to the genome structure that they are located in, whether it be a circular or a linear genome. The analytical investigations employed are: (1) studying the GI locations in relation to the origin of replication (oriC); (2) exploring the distances between GIs; and (3) determining the distribution of GIs across the genomes. For each of the investigations, the analysis was performed on all of the GIs in the data set. Moreover, to avoid bias caused by the distribution of the genomes represented, the GIs in one genome from each species and the GIs of the most frequent species are also analyzed. Overall, the results showed that there are preferred sites for the GIs in the genome. In the linear genomes, these sites are usually located in the oriC region and terminus region, while in the circular genomes, they are located solely in the terminus region. These results also showed that the distance distribution between the GIs is almost exponential, which proves that GIs have preferred sites within genomes. The oriC and terminus are preferred sites for the GIs and a possible natural explanation for this could be connected to the content of the oriC region. Moreover, the content of the GIs in terms of its protein families was studied and the results demonstrated that the majority of frequent protein families are close to identical in each section.
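As an illustration of investigation (1), the position of a GI relative to oriC on a circular genome can be expressed as a wrap-around distance. The following Python sketch is not the authors' code; the function names are hypothetical, and the normalization assumes that the terminus lies roughly opposite oriC, which holds only approximately in real genomes.

```python
def circular_distance(pos, ori, genome_length):
    """Shortest distance in bp between two positions on a circular genome."""
    d = abs(pos - ori) % genome_length
    return min(d, genome_length - d)


def relative_distance_to_ori(pos, ori, genome_length):
    """Scale the distance to [0, 1]: 0 at oriC, 1 at the point opposite oriC.

    Assumption: the opposite point approximates the terminus region.
    """
    return circular_distance(pos, ori, genome_length) / (genome_length / 2)


# Toy example: a 1,000 bp circular genome with oriC at position 50.
print(circular_distance(950, 50, 1000))         # wraps around the origin
print(relative_distance_to_ori(550, 50, 1000))  # point opposite oriC
```

For a linear genome the wrap-around step does not apply, and the plain absolute difference between coordinates would be used instead.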
Subjects
Horizontal Gene Transfer, Genomic Islands, Bacterial Genome, Archaeal Genome, Replication Origin/genetics, Prokaryotic Cells/metabolism
ABSTRACT
With the aim of helping to set safe exposure limits for the general population, various techniques have been implemented to conduct risk assessments for chemicals and other environmental stressors; however, none of these tools facilitate the identification of completely new chemicals that are likely hazardous and elicit an adverse biological effect. Here, we detail a novel in silico, deep-learning framework that is designed to systematically generate structures for new chemical compounds that are predicted to be chemical hazards. To assess the utility of the framework, we applied the tool to four endpoints related to environmental toxicants and their impacts on human and animal health: (i) toxicity to honeybees, (ii) immunotoxicity, (iii) endocrine disruption via ER-α antagonism, and (iv) mutagenicity. In addition, we characterized the predicted potency of these compounds and examined their structural relationship to existing chemicals of concern. As part of the array of emerging new approach methodologies (NAMs), we anticipate that such a framework will be a significant asset to risk assessors and other environmental scientists when planning and forecasting. Though not in the scope of the present study, we expect that the methodology detailed here could also be useful in the de novo design of more environmentally-friendly industrial chemicals.
Subjects
Deep Learning, Humans, Animals, Prospective Studies, Hazardous Substances/toxicity, Estrogen Receptors, Mutagens, Risk Assessment/methods
ABSTRACT
As a central organizing principle of biology, bacteria and archaea are classified into a hierarchical structure across taxonomic ranks from kingdom to subspecies. Traditionally, this organization was based on observable characteristics of form and chemistry but recently, bacterial taxonomy has been robustly quantified using comparisons of sequenced genomes, as exemplified in the Genome Taxonomy Database (GTDB). Such genome-based taxonomies resolve genomes down to genera and species and are useful in many contexts yet lack the flexibility and resolution of a fine-grained approach. The Life Identification Number (LIN) approach is a common, quantitative framework to tie existing (and future) bacterial taxonomies together, increase the resolution of genome-based discrimination of taxa, and extend taxonomic identification below the species level in a principled way. Utilizing LINgroup as an organizational concept helps resolve some of the confusion and unforeseen negative effects resulting from nomenclature changes of microorganisms that are closely related by overall genomic similarity (often due to genome-based reclassification). Our experimental results demonstrate the value of LINs and LINgroups in mapping between taxonomies, translating between different nomenclatures, and integrating them into a single taxonomic framework. They also reveal the robustness of LIN assignment to hyper-parameter changes when considering within-species taxonomic groups.
ABSTRACT
BACKGROUND: Cold acclimation in woody perennials is a metabolically intensive process, but coincides with environmental conditions that are not conducive to the generation of energy through photosynthesis. While the negative effects of low temperatures on the photosynthetic apparatus during winter have been well studied, less is known about how this is reflected at the level of gene and metabolite expression, nor how the plant generates primary metabolites needed for adaptive processes during autumn. RESULTS: The MapMan tool revealed enrichment of the expression of genes related to mitochondrial function, antioxidant and associated regulatory activity, while changes in metabolite levels over the time course were consistent with the gene expression patterns observed. Genes related to thylakoid function were down-regulated as expected, with the exception of plastid targeted specific antioxidant gene products such as thylakoid-bound ascorbate peroxidase, components of the reactive oxygen species scavenging cycle, and the plastid terminal oxidase. In contrast, the conventional and alternative mitochondrial electron transport chains, the tricarboxylic acid cycle, and redox-associated proteins providing reactive oxygen species scavenging generated by electron transport chains functioning at low temperatures were all active. CONCLUSIONS: A regulatory mechanism linking thylakoid-bound ascorbate peroxidase action with "chloroplast dormancy" is proposed. Most importantly, the energy and substrates required for the substantial metabolic remodeling that is a hallmark of freezing acclimation could be provided by heterotrophic metabolism.
Subjects
Antioxidants/metabolism, Picea/physiology, Plant Proteins/metabolism, Acclimatization, Ascorbate Peroxidases/genetics, Ascorbate Peroxidases/metabolism, Chloroplasts/genetics, Chloroplasts/metabolism, Cold Temperature, Ecosystem, Plant Gene Expression Regulation, Mitochondria/genetics, Mitochondria/metabolism, Picea/enzymology, Picea/genetics, Plant Proteins/genetics, Seasons
ABSTRACT
Prokaryotic genomes evolve via horizontal gene transfer (HGT), mutations, and rearrangements. A noteworthy part of the HGT process is facilitated by genomic islands (GIs). While previous computational biology research has focused on developing tools to detect GIs in prokaryotic genomes, there has been little research investigating GI patterns and biological connections across species. We have pursued the novel idea of connecting GIs across prokaryotic and phage genomes via patterns of protein families. Such patterns are sequences of protein families frequently present in the genomes of multiple species. We combined the large data set from the IslandViewer4 database with protein families from Pfam while implementing a comprehensive strategy to identify patterns making use of HMMER, BLAST, and MUSCLE. We also implemented Python programs that link the analysis into a single pipeline. Research results demonstrated that related GIs often exist in species that are evolutionarily unrelated and in multiple bacterial phyla. Analysis of the discovered patterns led to the identification of biological connections among prokaryotes and phages. These connections suggest broad HGT connections across the bacterial kingdom and its associated phages. The discovered patterns and connections could provide the basis for additional analysis on HGT breadth and the patterns in pathogenic GIs.
Subjects
Bacteriophages, Genomic Islands, Genomic Islands/genetics, Bacteriophages/genetics, Prokaryotic Cells, Proteins/genetics, Bacteria/genetics, Computational Biology/methods, Horizontal Gene Transfer, Bacterial Genome
ABSTRACT
We present a method for detecting horizontal gene transfer (HGT) using partial orders (posets). The method requires a poset for each species/gene pair, where we have a set of species S, and a set of genes G. Given the posets, the method constructs a phylogenetic tree that is compatible with the set of posets; this is done for each gene. Also, the set of posets can be derived from the tree. The trees constructed for each gene are then compared and tested for contradicting information, where a contradiction suggests HGT.
Subjects
Molecular Evolution, Horizontal Gene Transfer, Phylogeny
ABSTRACT
Mass testing is essential for identifying infected individuals during an epidemic and allowing healthy individuals to return to normal social activities. However, testing capacity is often insufficient to meet global health needs, especially during newly emerging epidemics. Dorfman's method, a classic group testing technique, helps reduce the number of tests required by pooling the samples of multiple individuals into a single sample for analysis. Dorfman's method does not consider the time dynamics or limits on testing capacity involved in infection detection, and it assumes that individuals are infected independently, ignoring community correlations. To address these limitations, we present an adaptive group testing (AGT) strategy based on graph partitioning, which divides a physical contact network into subgraphs (groups of individuals) and assigns testing priorities based on the social contact characteristics of each subgraph. Our AGT aims to maximize the number of infected individuals detected and minimize the number of tests required. After each testing round (perhaps on a daily basis), the testing priority is increased for each neighboring group of known infected individuals. We also present an enhanced infectious disease transmission model that simulates the dynamic spread of a pathogen and evaluate our AGT strategy using the simulation results. When applied to 13 social contact networks, AGT demonstrates significant performance improvements compared to Dorfman's method and its variations. Our AGT strategy requires fewer tests overall, reduces disease spread, and retains robustness under changes in group size, testing capacity, and other parameters. Testing plays a crucial role in containing and mitigating pandemics by identifying infected individuals and helping to prevent further transmission in families and communities. 
By identifying infected individuals and helping to prevent further transmission in families and communities, our AGT strategy can have significant implications for public health, providing guidance for policymakers trying to balance economic activity with the need to manage the spread of infection.
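For context on the baseline, the saving offered by classic Dorfman pooling can be computed directly. Under the independence assumption noted above, with prevalence p and pool size k >= 2, each pool needs one test plus k individual retests if the pool is positive, so the expected number of tests per person is 1/k + 1 - (1 - p)^k. The Python sketch below evaluates this standard formula; it does not implement the AGT strategy, and the function names are illustrative.

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per person for Dorfman pooling with pool size k.

    Assumes independent infections with probability p and a perfect test.
    """
    if k == 1:
        return 1.0  # no pooling: one test per person
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_pool_size(p, max_k=100):
    """Pool size in 1..max_k that minimizes expected tests per person."""
    return min(range(1, max_k + 1), key=lambda k: dorfman_tests_per_person(p, k))


for prevalence in (0.01, 0.05, 0.5):
    k = best_pool_size(prevalence)
    print(prevalence, k, round(dorfman_tests_per_person(prevalence, k), 4))
```

At high prevalence the best pool size collapses to 1, meaning that pooling no longer saves tests. The formula also ignores contact structure, which is the limitation the graph-partitioning approach targets.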
Subjects
Communicable Diseases, Social Interaction, Humans, Computer Simulation
ABSTRACT
De novo genes are genes that emerge as new genes in some species, such as primate de novo genes that emerge in certain primate species. Over the past decade, a great deal of research has been conducted regarding their emergence, origins, functions, and various attributes in different species, some of which has involved estimating the ages of de novo genes. However, limited by the number of species available for whole-genome sequencing, relatively few studies have focused specifically on the emergence time of primate de novo genes. Among those, even fewer investigate the association between primate gene emergence and environmental factors, such as paleoclimate (ancient climate) conditions. This study investigates the relationship between paleoclimate and human gene emergence at primate species divergence. Based on 32 available primate genome sequences, this study has revealed possible associations between temperature changes and the emergence of de novo primate genes. Overall, the findings indicate that de novo genes tended to emerge during the most recent 13 million years (MY), a period of continued cooling, which is consistent with past findings. Furthermore, in the context of an overall trend of cooling temperature, new primate genes were more likely to emerge during local warming periods, where the warm temperature more closely resembled the environmental condition that preceded the cooling trend. Results also indicate that both primate de novo genes and human cancer-associated genes have later origins in comparison to random human genes. Future studies can examine human de novo gene emergence in more depth from an environmental perspective, as well as species divergence from a gene emergence perspective.
Subjects
Molecular Evolution, Primates, Animals, Humans, Primates/genetics, Genome
ABSTRACT
The rapid growth of online social media usage in our daily lives has increased the importance of analyzing the dynamics of online social networks. However, the dynamic data of existing online social media platforms are not readily accessible. Hence, there is a necessity to synthesize networks emulating those of online social media for further study. In this work, we propose an epidemiology-inspired and community-based, time-evolving online social network generation algorithm (EpiCNet), to generate a time-evolving sequence of random networks that closely mirror the characteristics of real-world online social networks. Variants of the algorithm can produce both undirected and directed networks to accommodate different user interaction paradigms. EpiCNet utilizes compartmental models inspired by mathematical epidemiology to simulate the flow of individuals into and out of the online social network. It also employs an overlapping community structure to enable more realistic connections between individuals in the network. Furthermore, EpiCNet evolves the community structure and connections in the simulated online social network as a function of time and with an emphasis on the behavior of individuals. EpiCNet is capable of simulating a variety of online social networks by adjusting a set of tunable parameters that specify the individual behavior and the evolution of communities over time. The experimental results show that the network properties of the synthetic time-evolving online social network generated by EpiCNet, such as clustering coefficient, node degree, and diameter, match those of typical real-world online social networks such as Facebook and Twitter.
ABSTRACT
Antibiotic resistance is of crucial interest to both human and animal medicine. It has been recognized that increased environmental monitoring of antibiotic resistance is needed. Metagenomic DNA sequencing is becoming an attractive method to profile antibiotic resistance genes (ARGs), including a special focus on pathogens. A number of computational pipelines are available and under development to support environmental ARG monitoring; the pipeline we present here is promising for general adoption for the purpose of harmonized global monitoring. Specifically, ARGem is a user-friendly pipeline that provides full-service analysis, from the initial DNA short reads to the final visualization of results. The capture of extensive metadata is also facilitated to support comparability across projects and broader monitoring goals. The ARGem pipeline offers efficient analysis of a modest number of samples along with affordable computational components, though the throughput could be increased through cloud resources, based on the user's configuration. The pipeline components were carefully assessed and selected to satisfy tradeoffs, balancing efficiency and flexibility. It was essential to provide a step to perform short read assembly in a reasonable time frame to ensure accurate annotation of identified ARGs. Comprehensive ARG and mobile genetic element databases are included in ARGem for annotation support. ARGem further includes an expandable set of analysis tools that include statistical and network analysis and supports various useful visualization techniques, including Cytoscape visualization of co-occurrence and correlation networks. The performance and flexibility of the ARGem pipeline are demonstrated with analysis of aquatic metagenomes. The pipeline is freely available at https://github.com/xlxlxlx/ARGem.