1.
Nature ; 611(7935): 312-319, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261521

ABSTRACT

Infectious diseases are among the strongest selective pressures driving human evolution [1,2]. This includes the single greatest mortality event in recorded history, the first outbreak of the second pandemic of plague, commonly called the Black Death, which was caused by the bacterium Yersinia pestis [3]. This pandemic devastated Afro-Eurasia, killing up to 30-50% of the population [4]. To identify loci that may have been under selection during the Black Death, we characterized genetic variation around immune-related genes from 206 ancient DNA extracts, stemming from two different European populations before, during and after the Black Death. Immune loci are strongly enriched for highly differentiated sites relative to a set of non-immune loci, suggesting positive selection. We identify 245 variants that are highly differentiated within the London dataset, four of which were replicated in an independent cohort from Denmark, and represent the strongest candidates for positive selection. The selected allele for one of these variants, rs2549794, is associated with the production of a full-length (versus truncated) ERAP2 transcript, variation in cytokine response to Y. pestis and increased ability to control intracellular Y. pestis in macrophages. Finally, we show that protective variants overlap with alleles that are today associated with increased susceptibility to autoimmune diseases, providing empirical evidence for the role played by past pandemics in shaping present-day susceptibility to disease.


Subjects
Ancient DNA , Genetic Predisposition to Disease , Immunity , Plague , Genetic Selection , Yersinia pestis , Humans , Aminopeptidases/genetics , Aminopeptidases/immunology , Plague/genetics , Plague/immunology , Plague/microbiology , Plague/mortality , Yersinia pestis/immunology , Yersinia pestis/pathogenicity , Genetic Selection/immunology , Europe/epidemiology , Europe/ethnology , Immunity/genetics , Datasets as Topic , London/epidemiology , Denmark/epidemiology
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38695119

ABSTRACT

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness, being contextually dependent. It relies on recent advances in deep learning architectures that employ self-supervised learning in order to leverage the power of enormous amounts of unlabelled data to generate contextual embeddings, which are vector representations for words. These ideas have been applied to protein sequences, producing embedding vectors for protein residues. We propose the E-score between two residues as the cosine similarity between their embedding vector representations. Thorough testing on a wide variety of reference multiple sequence alignments indicates that the alignments produced using the new E-score method, especially ProtT5-score, are significantly better than those obtained using BLOSUM matrices. The new method proposes to change the way alignments are computed, with far-reaching implications in all areas of textual data that use sequence similarity. The program to compute alignments based on various E-scores is available as a web server at e-score.csd.uwo.ca. The source code is freely available for download from github.com/lucian-ilie/E-score.


Subjects
Algorithms , Computational Biology , Sequence Alignment , Sequence Alignment/methods , Computational Biology/methods , Software , Protein Sequence Analysis/methods , Amino Acid Sequence , Proteins/chemistry , Proteins/genetics , Deep Learning , Protein Databases
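
Note on entry 2: the abstract defines the E-score of two residues as the cosine similarity of their embedding vectors. The following is a minimal sketch of that definition only, not the authors' implementation. The function name `e_score` and the toy vectors are hypothetical; real embeddings would come from a protein language model such as ProtT5, which is not loaded here.

```python
import math


def e_score(u, v):
    """Cosine similarity between two residue embedding vectors.

    Illustrative only: `u` and `v` are plain sequences of floats that
    stand in for per-residue embeddings (an assumption of this sketch).
    """
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings have far more dimensions.
    residue_a = [0.2, 0.5, -0.1]
    residue_b = [0.4, 1.0, -0.2]  # same direction as residue_a
    print(e_score(residue_a, residue_b))
```

Because the score depends on the embeddings rather than on the residue identities alone, the same pair of amino acids can score differently in different sequence contexts, which is the contrast with a fixed matrix such as BLOSUM that the abstract draws.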
3.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38212995

ABSTRACT

MOTIVATION: Proteins accomplish cellular functions by interacting with each other, which makes the prediction of interaction sites a fundamental problem. As experimental methods are expensive and time consuming, computational prediction of the interaction sites has been studied extensively. Structure-based programs are the most accurate, while the sequence-based ones are much more widely applicable, as the sequences available outnumber the structures by two orders of magnitude. Ideally, we would like a tool that has the quality of the former and the applicability of the latter. RESULTS: We provide here the first solution that achieves these two goals. Our new sequence-based program, Seq-InSite, greatly surpasses the performance of sequence-based models, matching the quality of state-of-the-art structure-based predictors, thus effectively superseding the need for models requiring structure. The predictive power of Seq-InSite is illustrated using an analysis of evolutionary conservation for four protein sequences. AVAILABILITY AND IMPLEMENTATION: Seq-InSite is freely available as a web server at http://seq-insite.csd.uwo.ca/ and as free source code, including trained models and all datasets used for training and testing, at https://github.com/lucian-ilie/Seq-InSite.


Subjects
Proteins , Software , Proteins/chemistry , Amino Acid Sequence
4.
PLoS Pathog ; 19(7): e1011538, 2023 07.
Article in English | MEDLINE | ID: mdl-37523413

ABSTRACT

Brucellosis is a disease caused by the bacterium Brucella and typically transmitted through contact with infected ruminants. It is one of the most common chronic zoonotic diseases and of particular interest to public health agencies. Despite its well-known transmission history and characteristic symptoms, we lack a more complete understanding of the evolutionary history of its best-known species, Brucella melitensis. To address this knowledge gap we fortuitously found, sequenced and assembled a high-quality ancient B. melitensis draft genome from the kidney stone of a 14th-century Italian friar. The ancient strain contained fewer core genes than modern B. melitensis isolates, carried a complete complement of virulence genes, and did not contain any indication of significant antimicrobial resistances. The ancient B. melitensis genome fell as a basal sister lineage to a subgroup of B. melitensis strains within the Western Mediterranean phylogenetic group, with a short branch length indicative of its earlier sampling time, along with a similar gene content. By calibrating the molecular clock we suggest that the speciation event between B. melitensis and B. abortus is contemporaneous with the estimated time frame for the domestication of both sheep and goats. These results confirm the existence of the Western Mediterranean clade as a separate group in the 14th century CE and suggest that its divergence was due to human and ruminant co-migration.


Subjects
Brucella melitensis , Brucellosis , Humans , Animals , Sheep , Brucella melitensis/genetics , Brucella abortus/genetics , Phylogeny , Brucellosis/microbiology , Zoonoses , Goats
5.
Mol Biol Evol ; 40(4)2023 04 04.
Article in English | MEDLINE | ID: mdl-37036379

ABSTRACT

Low complexity sequences (LCRs) are well known within coding as well as non-coding sequences. A low complexity region within a protein must be encoded by the underlying DNA sequence. Here, we examine the relationship between the entropy of the protein sequence and that of the DNA sequence which encodes it. We show that they are poorly correlated whether starting with a low complexity region within the protein and comparing it to the corresponding sequence in the DNA or by finding a low complexity region within coding DNA and comparing it to the corresponding sequence in the protein. We show this is the case within the proteomes of five model organisms: Homo sapiens, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana. We also report a significant bias against mononucleic codons in LCR encoding sequences. By comparison with simulated proteomes, we show that highly repetitive LCRs may be explained by neutral, slippage-based evolution, but compositionally biased LCRs with cryptic repeats are not. We demonstrate that other biological biases and forces must be acting to create and maintain these LCRs. Uncovering these forces will improve our understanding of protein LCR evolution.


Subjects
Drosophila melanogaster , Proteome , Animals , Drosophila melanogaster/genetics , DNA , Amino Acid Sequence , Saccharomyces cerevisiae/genetics
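
Note on entry 5: the abstract compares the entropy of a protein region with the entropy of the DNA that encodes it. The sketch below uses plain Shannon entropy over symbol frequencies as a stand-in; the abstract does not state the exact entropy measure or window scheme used, so this is an assumption, and the function name and example sequences are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the symbol frequencies in seq.

    Works for any string: amino acid letters or nucleotides.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    counts = Counter(seq)
    # sum of p * log2(1/p) over the observed symbols
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    # A glutamine homopolymer: minimal protein-level complexity.
    protein_region = "QQQQ"
    # One possible encoding using both glutamine codons (CAG and CAA).
    coding_dna = "CAGCAACAGCAA"
    print(shannon_entropy(protein_region))  # no variation at protein level
    print(shannon_entropy(coding_dna))      # DNA still mixes C, A and G
```

The toy example shows how a region with zero entropy at the protein level can still be encoded by DNA with non-zero entropy, which is one intuitive reason the two measures need not track each other.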
6.
J Mol Evol ; 92(2): 153-168, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38485789

ABSTRACT

Protein low complexity regions (LCRs) are compositionally biased amino acid sequences, many of which have significant evolutionary impacts on the proteins which contain them. They are mutationally unstable, experiencing higher rates of indels and substitutions than higher complexity regions. LCRs also impact the expression of their proteins, likely through multiple effects along the path from gene transcription, through translation, and eventual protein degradation. It has been observed that proteins which contain LCRs are associated with elevated transcript abundance (TAb), despite having lower protein abundance. We have gathered and integrated human data to investigate the co-evolution of TAb and LCRs through ancestral reconstructions and model inference using an approximate Bayesian calculation based method. We observe that on short evolutionary timescales TAb evolution is significantly impacted by changes in LCR length, with insertions driving TAb down. But in contrast, the observed data is best explained by indel rates in LCRs which are unaffected by shifts in TAb. Our work demonstrates a coupling between LCR and TAb evolution, and the utility of incorporating multiple responses into evolutionary analyses.


Subjects
Molecular Evolution , Proteins , Humans , Bayes Theorem , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Protein Domains
7.
Mol Biol Evol ; 39(5)2022 05 03.
Article in English | MEDLINE | ID: mdl-35482425

ABSTRACT

Low Complexity Regions (LCRs) are present in a surprisingly large number of eukaryotic proteins. These highly repetitive and compositionally biased sequences are often structurally disordered, bind promiscuously, and evolve rapidly. Frequently studied in terms of evolutionary dynamics, little is known about how LCRs affect the expression of the proteins which contain them. It would be expected that rapidly evolving LCRs are unlikely to be tolerated in strongly conserved, highly abundant proteins, leading to lower overall abundance in proteins which contain LCRs. To test this hypothesis and examine the associations of protein abundance and transcript abundance with the presence of LCRs, we have integrated high-throughput data from across mammals. We have found that LCRs are indeed associated with reduced protein abundance, but are also associated with elevated transcript abundance. These associations are qualitatively consistent across 12 human tissues and nine mammalian species. The differential impacts of LCRs on abundance at the protein and transcript level are not explained by differences in either protein degradation rates or the inefficiency of translation for LCR containing proteins. We suggest that rapidly evolving LCRs are a source of selective pressure on the regulatory mechanisms which maintain steady-state protein abundance levels.


Subjects
Molecular Evolution , Proteins , Animals , Humans , Mammals/genetics , Protein Domains , Proteins/genetics
8.
BMC Bioinformatics ; 23(1): 110, 2022 Mar 31.
Article in English | MEDLINE | ID: mdl-35361114

ABSTRACT

BACKGROUND: Identification of biomarkers, which are measurable characteristics of biological datasets, can be challenging. Although amplicon sequence variants (ASVs) can be considered potential biomarkers, identifying important ASVs in high-throughput sequencing datasets is challenging. Noise, algorithmic failures to account for specific distributional properties, and feature interactions can complicate the discovery of ASV biomarkers. In addition, these issues can impact the replicability of various models and elevate false-discovery rates. Contemporary machine learning approaches can be leveraged to address these issues. Ensembles of decision trees are particularly effective at classifying the types of data commonly generated in high-throughput sequencing (HTS) studies due to their robustness when the number of features in the training data is orders of magnitude larger than the number of samples. In addition, when combined with appropriate model introspection algorithms, machine learning algorithms can also be used to discover and select potential biomarkers. However, the construction of these models could introduce various biases which potentially obfuscate feature discovery. RESULTS: We developed a decision tree ensemble, LANDMark, which uses oblique and non-linear cuts at each node. In synthetic and toy tests LANDMark consistently ranked as the best classifier and often outperformed the Random Forest classifier. When trained on the full metabarcoding dataset obtained from Canada's Wood Buffalo National Park, LANDMark was able to create highly predictive models and achieved an overall balanced accuracy score of 0.96 ± 0.06. The use of recursive feature elimination did not impact LANDMark's generalization performance and, when trained on data from the BE amplicon, it was able to outperform the Linear Support Vector Machine, Logistic Regression models, and Stochastic Gradient Descent models (p ≤ 0.05). 
Finally, LANDMark distinguishes itself due to its ability to learn smoother non-linear decision boundaries. CONCLUSIONS: Our work introduces LANDMark, a meta-classifier which blends the characteristics of several machine learning models into a decision tree and ensemble learning framework. To our knowledge, this is the first study to apply this type of ensemble approach to amplicon sequencing data and we have shown that analyzing these datasets using LANDMark can produce highly predictive and consistent models.


Subjects
Algorithms , High-Throughput Nucleotide Sequencing , Biomarkers , Machine Learning , Support Vector Machine
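
Note on entry 8: the abstract reports a balanced accuracy score. As a reminder of what that metric measures, here is a minimal sketch using the standard definition (the mean of per-class recall). It is not taken from the LANDMark code; the function name and the toy labels are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, computed over the classes in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Imbalanced toy labels: always predicting the majority class gives
    # 80% plain accuracy but only 0.5 balanced accuracy.
    y_true = ["site_a", "site_a", "site_a", "site_a", "site_b"]
    y_pred = ["site_a", "site_a", "site_a", "site_a", "site_a"]
    print(balanced_accuracy(y_true, y_pred))
```

Weighting each class equally is what makes the metric informative when sample groups are of unequal size, a common situation in sequencing studies.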
9.
Bioinformatics ; 37(7): 896-904, 2021 05 17.
Article in English | MEDLINE | ID: mdl-32840562

ABSTRACT

MOTIVATION: Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein-protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods. RESULTS: We propose DEep Learning Prediction of Highly probable protein Interaction sites (DELPHI), a new sequence-based deep learning suite for PPI-binding sites prediction. DELPHI has an ensemble structure which combines a CNN and a RNN component with fine tuning technique. Three novel features, HSP, position information and ProtVec are used in addition to nine existing ones. We comprehensively compare DELPHI to nine state-of-the-art programmes on five datasets, and DELPHI outperforms the competing methods in all metrics even though its training dataset shares the least similarities with the testing datasets. In the most important metrics, AUPRC and MCC, it surpasses the second best programmes by as much as 18.5% and 27.7%, respectively. We also demonstrated that the improvement is essentially due to using the ensemble model and, especially, the three new features. Using DELPHI it is shown that there is a strong correlation with protein-binding residues (PBRs) and sites with strong evolutionary conservation. In addition, DELPHI's predicted PBR sites closely match known data from Pfam. DELPHI is available as open-sourced standalone software and web server. AVAILABILITY AND IMPLEMENTATION: The DELPHI web server can be found at delphi.csd.uwo.ca/, with all datasets and results in this study. The trained models, the DELPHI standalone source code, and the feature computation pipeline are freely available at github.com/lucian-ilie/DELPHI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Software , Binding Sites , Computational Biology , Protein Binding , Proteins/metabolism , Research Design
10.
Genome ; 65(5): 287-299, 2022 May.
Article in English | MEDLINE | ID: mdl-35073184

ABSTRACT

Genomic reorganization, such as rearrangements and inversions, influences how genetic information is organized within the bacterial genomes. Inversions, in particular, facilitate genome evolution through gene gain and loss, and can alter gene expression. Previous studies have investigated the impact inversions have on gene expression by inducing inversions that target specific genes, or have examined inversions between distantly related species. This fails to encompass a genome-wide perspective of naturally occurring inversions and their post-adaptation impact on gene expression. Here, we used bioinformatic techniques and multiple RNA-seq datasets to investigate the short- and long-range impact inversions have on genomic gene expression within Escherichia coli. We observed differences in gene expression between homologous inverted and non-inverted genes even after long-term exposure to adaptive selection. In 4% of inversions representing 33 genes, differential gene expression between inverted and non-inverted homologs was detected, with greater than two-thirds (71%) of differentially expressed inverted genes having 9.4-85.6-fold higher gene expression. The identified inversions had more overlap than expected with nucleoid-associated protein binding sites, which assist in the regulation of genomic gene expression. Some inversions can drastically impact gene expression, even between different strains of E. coli, and could provide a mechanism for the diversification of genetic content through controlled expression changes.


Subjects
Chromosome Inversion , Escherichia coli , Escherichia coli/genetics , Gene Expression , Bacterial Genome , Genomics , Humans , Protein Binding
11.
BMC Genomics ; 21(1): 396, 2020 Jun 08.
Article in English | MEDLINE | ID: mdl-32513102

ABSTRACT

BACKGROUND: The severity and frequency of drought has increased around the globe, creating challenges in ensuring food security for a growing world population. As a consequence, improving water use efficiency by crops has become an important objective for crop improvement. Some wild crop relatives have adapted to extreme osmotic stresses and can provide valuable insights into traits and genetic signatures that can guide efforts to improve crop tolerance to water deficits. Eutrema salsugineum, a close relative of many cruciferous crops, is a halophytic plant and extremophyte model for abiotic stress research. RESULTS: Using comparative transcriptomics, we show that two E. salsugineum ecotypes display significantly different transcriptional responses towards a two-stage drought treatment. Even before visibly wilting, water deficit led to the differential expression of almost 1,100 genes for an ecotype from the semi-arid, sub-arctic Yukon, Canada, but only 63 genes for an ecotype from the semi-tropical, monsoonal, Shandong, China. After recovery and a second drought treatment, about 5,000 differentially expressed genes were detected in Shandong plants versus 1,900 genes in Yukon plants. Only 13 genes displayed similar drought-responsive patterns for both ecotypes. We detected 1,007 long non-protein coding RNAs (lncRNAs), 8% were only expressed in stress-treated plants, a surprising outcome given the documented association between lncRNA expression and stress. Co-expression network analysis of the transcriptomes identified eight gene clusters where at least half of the genes in each cluster were differentially expressed. While many gene clusters were correlated to drought treatments, only a single cluster significantly correlated to drought exposure in both ecotypes. CONCLUSION: Extensive, ecotype-specific transcriptional reprogramming with drought was unexpected given that both ecotypes are adapted to saline habitats providing persistent exposure to osmotic stress. 
This ecotype-specific response would have escaped notice had we used a single exposure to water deficit. Finally, the apparent capacity to improve tolerance and growth after a drought episode represents an important adaptive trait for a plant that thrives under semi-arid Yukon conditions, and may be similarly advantageous for crop species experiencing stresses attributed to climate change.


Subjects
Brassicaceae/growth & development , Gene Expression Profiling/methods , Long Noncoding RNA/genetics , Messenger RNA/genetics , Brassicaceae/genetics , Canada , Dehydration , Ecotype , Plant Gene Expression Regulation , Gene Regulatory Networks , Plant Leaves/genetics , Plant Leaves/growth & development , Plant RNA/genetics , Salt-Tolerant Plants/genetics , Salt-Tolerant Plants/growth & development , RNA Sequence Analysis , Physiological Stress
12.
J Mol Evol ; 88(6): 510-520, 2020 08.
Article in English | MEDLINE | ID: mdl-32506154

ABSTRACT

Gene expression in bacteria is a remarkably controlled and intricate process impacted by many factors. One such factor is the genomic position of a gene within a bacterial genome. Genes located near the origin of replication generally have a higher expression level, increased dosage, and are often more conserved than genes located farther from the origin of replication. The majority of the studies involved with these findings have only noted this phenomenon in a single gene or cluster of genes that was re-located to pre-determined positions within a bacterial genome. In this work, we look at the overall expression levels from eleven bacterial data sets from Escherichia coli, Bacillus subtilis, Streptomyces, and Sinorhizobium meliloti. We have confirmed that gene expression tends to decrease when moving away from the origin of replication in the majority of the replicons analysed in this study. This study sheds light on the impact of genomic location on molecular trends such as gene expression and highlights the importance of accounting for spatial trends in bacterial molecular analysis.


Subjects
Gene Expression , Bacterial Genome , Bacillus subtilis/genetics , Escherichia coli/genetics , Replication Origin , Sinorhizobium meliloti/genetics , Streptomyces/genetics
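
Note on entry 12: relating expression to genomic position requires a distance from the origin of replication for each gene. The sketch below shows one way to compute that distance on a circular replicon, where the shorter of the two arcs around the circle is taken. It is an assumption of this sketch, not the authors' pipeline; the function name and coordinates are hypothetical, and a linear replicon (Streptomyces chromosomes are linear) would use the plain absolute difference instead.

```python
def distance_from_origin(position, origin, replicon_length):
    """Shortest distance in bp from `position` to `origin` on a circle.

    All coordinates are 0-based positions on a circular replicon of
    `replicon_length` base pairs (hypothetical inputs).
    """
    if replicon_length <= 0:
        raise ValueError("replicon_length must be positive")
    # Distance going one way around the circle ...
    one_way = abs(position - origin) % replicon_length
    # ... versus going the other way; keep the shorter arc.
    return min(one_way, replicon_length - one_way)


if __name__ == "__main__":
    # Toy 1,000 bp replicon with the origin near the end of the coordinates.
    print(distance_from_origin(10, 990, 1000))   # wraps around the circle
    print(distance_from_origin(490, 990, 1000))  # opposite side, the terminus
```

With such a distance in hand, the trend described in the abstract could be examined by comparing expression values against distance for each replicon separately.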
13.
Mol Ecol ; 29(15): 2793-2809, 2020 08.
Article in English | MEDLINE | ID: mdl-32567754

ABSTRACT

Parallel evolution can occur through selection on novel mutations, standing genetic variation or adaptive introgression. Uncovering parallelism and introgressed populations can complicate management of threatened species as parallelism may have influenced conservation unit designations and admixed populations are not generally considered under legislations. We examined high coverage whole-genome sequences of 30 caribou (Rangifer tarandus) from across North America and Greenland, representing divergent intraspecific lineages, to investigate parallelism and levels of introgression contributing to the formation of ecotypes. Caribou are split into four subspecies and 11 extant conservation units, known as designatable units (DUs), in Canada. Using genomes from all four subspecies and six DUs, we undertake demographic reconstruction and confirm two previously inferred instances of parallel evolution in the woodland subspecies and uncover an additional instance of parallelism of the eastern migratory ecotype. Detailed investigations reveal introgression in the woodland subspecies, with introgressed regions found spread throughout the genomes encompassing both neutral and functional sites. Our investigations using whole genomes highlight the difficulties in unequivocally demonstrating parallelism through adaptive introgression in nonmodel species with complex demographic histories, with standing variation and introgression both potentially involved. Additionally, the impact of parallelism and introgression on conservation policy for management units needs to be considered in general, and the caribou designations will need amending in light of our results. Uncovering and decoupling parallelism and differential patterns of introgression will become prevalent with the availability of comprehensive genomic data from nonmodel species, and we highlight the need to incorporate this into conservation unit designations.


Subjects
Ecotype , Population Genetics , Canada , Greenland , North America
14.
Mol Biol Evol ; 35(2): 440-450, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29165618

ABSTRACT

Macrophage Receptor with COllagenous structure (MARCO) is a class A scavenger receptor that binds, phagocytoses, and modifies inflammatory responses to bacterial pathogens. Multiple candidate gene approach studies have shown that polymorphisms in MARCO are associated with susceptibility or resistance to Mycobacterium tuberculosis infection, but how these variants alter function is not known. To complement candidate gene approach studies, we previously used phylogenetic analyses to identify a residue, glutamine 452 (Q452), within the ligand-binding Scavenger Receptor Cysteine Rich domain as undergoing positive selection in humans. Herein, we show that Q452 is found in Denisovans, Neanderthals, and extant humans, but all other nonprimate, terrestrial, and aquatic mammals possess an aspartic acid (D452) residue. Further analysis of hominoid sequences of MARCO identified an additional human-specific mutation, phenylalanine 282 (F282), within the collagenous domain. We show that residue 282 is polymorphic in humans, but only 17% of individuals (rs6761637) possess the ancestral serine residue at position 282. We show that rs6761637 is in linkage disequilibrium with MARCO polymorphisms that have been previously linked to susceptibility to pulmonary tuberculosis. To assess the functional importance of sites Q452 and F282 in humans, we cloned the ancestral residues and loss-of-function mutations and investigated the role of these residues in binding and internalizing polystyrene microspheres and Escherichia coli. Herein, we show that the residues at sites 452 and 282 enhance receptor function.


Subjects
Phagocytosis/genetics , Immunologic Receptors/genetics , Genetic Selection , Animals , HEK293 Cells , Humans , Mutation , Immunologic Receptors/metabolism
15.
Genome ; 62(11): 761-768, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31437405

ABSTRACT

The cumulative reproductive cost of multi-locus selection has been considered to be a potentially limiting factor on the rate of adaptive evolution. In this paper, we show that Haldane's arguments for the accumulation of reproductive costs over multiple loci are valid only for a clonally reproducing population of asexual genotypes. We show that a sexually reproducing population avoids this accumulation of costs. Thus, sex removes a perceived reproductive constraint on the rate of adaptive evolution. The significance of our results is twofold. First, the results demonstrate that adaptation based on multiple genes, such as selection acting on the standing genetic variation, does not entail a huge reproductive cost as suggested by Haldane, provided of course that the population is reproducing sexually. Second, this reduction in the cost of natural selection provides a simple biological explanation for the advantage of sex. Specifically, Haldane's calculations illustrate the evolutionary disadvantage of asexuality; sexual reproduction frees the population from this disadvantage.


Subjects
Biological Evolution , Population Genetics , Genetic Models , Reproduction/genetics , Genetic Selection , Animals , Breeding , Female , Gene Frequency , Genetic Loci , Genetic Variation , Humans , Male , Population Density
16.
Am J Phys Anthropol ; 169(2): 240-252, 2019 06.
Article in English | MEDLINE | ID: mdl-30964548

ABSTRACT

OBJECTIVES: In the 14th century AD, medieval Europe was severely affected by the Great European Famine as well as repeated bouts of disease, including the Black Death, causing major demographic shifts. This high volatility led to increased mobility and migration due to new labor and economic opportunities, as evidenced by documentary and stable isotope data. This study uses ancient DNA (aDNA) isolated from skeletal remains to examine whether evidence for large-scale population movement can be gleaned from the complete mitochondrial genomes of 264 medieval individuals from England (London) and Denmark. MATERIALS AND METHODS: Using a novel library-conserving approach to targeted capture, we recovered 264 full mitochondrial genomes from the petrous portion of the temporal bones and teeth and compared genetic diversity across the medieval period within and between English (London) and Danish populations and with contemporary populations through population pairwise ΦST analysis. RESULTS: We find no evidence of significant differences in genetic diversity spatially or temporally in our dataset, yet there is a high degree of haplotype diversity in our medieval samples with little exact sequence sharing. DISCUSSION: The mitochondrial genomes of both medieval Londoners and medieval Danes suggest high mitochondrial diversity before, during and after the Black Death. While our mitochondrial genomic data lack geographically correlated signals, these data could be the result of high, continual female migration before and after the Black Death or may simply indicate a large female effective population size unaffected by the upheaval of the medieval period. Either scenario suggests a genetic resiliency in areas of northwestern medieval Europe.


Subjects
Genetic Variation/genetics , Mitochondrial Genome/genetics , Plague/history , Bone and Bones/chemistry , Ancient DNA/analysis , Mitochondrial DNA/analysis , Denmark , Female , Medieval History , Human Migration/history , Humans , London , Male , Tooth/chemistry
17.
BMC Genomics ; 19(1): 316, 2018 May 02.
Article in English | MEDLINE | ID: mdl-29720103

ABSTRACT

BACKGROUND: In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. RESULTS: Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. CONCLUSIONS: This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.


Subjects
Computational Biology/methods , Machine Learning , Long Noncoding RNA/genetics , Open Reading Frames/genetics , Plant RNA/genetics , Stochastic Processes
18.
J Evol Biol ; 31(12): 1945-1958, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30341989

ABSTRACT

Whole genome duplication (WGD), the doubling of the nuclear DNA of a species, contributes to biological innovation by creating genetic redundancy. One mode of WGD is allopolyploidization, wherein each genome from two ancestral species becomes a 'subgenome' of a polyploid descendant species. The evolutionary trajectory of a duplicated gene that arises from WGD is influenced both by natural selection that may favour redundant, new or partitioned functions, and by gene silencing (pseudogenization). Here, we explored how these two phenomena varied over time and within allopolyploid genomes in several allotetraploid clawed frog species (Xenopus). Our analysis demonstrates that, across these polyploid genomes, purifying selection was greatly relaxed compared to a diploid outgroup, was asymmetric between each subgenome, and that coding regions are shorter in the subgenome with more relaxed purifying selection. As well, we found that the rate of gene loss was higher in the subgenome under weaker purifying selection and that this rate has remained relatively consistent over time after WGD. Our findings provide perspective from recently evolved vertebrates on the evolutionary forces that likely shape allopolyploid genomes on other branches of the tree of life.


Subjects
Molecular Evolution , Polyploidy , Xenopus/genetics , Animals , Genome , Genetic Models , Phylogeny , Genetic Selection , Time Factors
19.
J Theor Biol ; 442: 123-128, 2018 Apr 07.
Article in English | MEDLINE | ID: mdl-29355539

ABSTRACT

Natural selection can act at many loci across the genome. But as the number of polymorphic loci increases linearly, the number of possible genotypic combinations increases exponentially. Consequently, a finite population - even a very large population - contains only a small sample of all possible multi-locus genotypes. In this paper, we revisit the classic Fisher-Muller models of recombination, taking into account the abundant standing variation that is commonly seen in natural populations. We show that the generation of new genotypic combinations through recombination is an important component of adaptive evolution based on multi-locus selection. Specifically, high-fitness genotypes are expected to be absent from the initial population when the frequencies of favorable alleles at the selected loci are low. But as the allele frequencies rise in response to selection the missing genotypes will be generated by recombination. Given recombination, if the average frequency of the favored alleles at the various selected loci is equal to p, then the expected number of favorable alleles per chromosome will be equal to pL, where L is the number of loci. As the value of p approaches unity at the selected loci, the number of favorable alleles per chromosome will approach a value of L, i.e., at the end of the selection process a favorable allele will be found at all loci. In the absence of recombination, however, selection will be limited to the highest-fitness genotypes that are already present in the initial population. We point out that the fitness of such initial genotypes is far less than the theoretical maximum fitness because they contain a favorable allele at only a fraction of the loci. Consequently, recombination acts to unblock the adaptive response to multi-locus selection in finite populations. Using simulations, we show that the sexual population can withstand invasion by newly-arising asexual clones. These results help explain the maintenance of sexual reproduction in natural populations.
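
The arithmetic in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes biallelic loci, the same favorable-allele frequency p at every locus, independence between loci (linkage equilibrium), and a diploid population. All function names and the example values are hypothetical.

```python
def haplotype_count(num_loci):
    """Number of distinct multi-locus haplotypes for biallelic loci (assumed): 2^L."""
    return 2 ** num_loci


def expected_favorable_alleles(p, num_loci):
    """Expected favorable alleles per chromosome, pL, as stated in the abstract."""
    return p * num_loci


def best_haplotype_probability(p, num_loci):
    """Chance that one chromosome carries the favorable allele at every locus.

    Assumes loci are independent (linkage equilibrium), giving p^L.
    """
    return p ** num_loci


def expected_best_copies(p, num_loci, population_size):
    """Expected copies of the all-favorable haplotype among 2N chromosomes (diploid, assumed)."""
    return 2 * population_size * best_haplotype_probability(p, num_loci)


if __name__ == "__main__":
    p, num_loci, population_size = 0.1, 20, 10**6  # illustrative values only
    print("possible haplotypes:", haplotype_count(num_loci))
    print("expected favorable alleles per chromosome:",
          expected_favorable_alleles(p, num_loci))
    print("expected copies of the best haplotype:",
          expected_best_copies(p, num_loci, population_size))
```

With low p, the expected number of copies of the all-favorable haplotype falls far below one even in a population of a million individuals, which is the sense in which high-fitness genotypes are expected to be absent from the initial population.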


Subjects
Genetic Loci/genetics , Genetic Models , Genetic Recombination , Genetic Selection , Animals , Molecular Evolution , Female , Population Genetics , Genotype , Humans , Linkage Disequilibrium , Male , Mutation , Reproduction/genetics
20.
N Engl J Med ; 370(4): 334-40, 2014 Jan 23.
Article in English | MEDLINE | ID: mdl-24401020

ABSTRACT

In the 19th century, there were several major cholera pandemics in the Indian subcontinent, Europe, and North America. The causes of these outbreaks and the genomic strain identities remain a mystery. We used targeted high-throughput sequencing to reconstruct the Vibrio cholerae genome from the preserved intestine of a victim of the 1849 cholera outbreak in Philadelphia, part of the second cholera pandemic. This O1 biotype strain has 95 to 97% similarity with the classical O395 genome, differing by 203 single-nucleotide polymorphisms (SNPs), lacking three genomic islands, and probably having one or more tandem cholera toxin prophage (CTX) arrays, which potentially affected its virulence. This result highlights archived medical remains as a potential resource for investigations into the genomic origins of past pandemics.


Subjects
Cholera/history , Pandemics/history , Vibrio cholerae/genetics , Bacterial Typing Techniques , Cholera/epidemiology , Cholera/microbiology , Bacterial DNA/isolation & purification , Mitochondrial DNA/analysis , Molecular Evolution , Bacterial Genome , Genomic Islands , 19th Century History , Humans , Intestines/microbiology , Intestines/pathology , Male , Philadelphia/epidemiology , Phylogeny , Single Nucleotide Polymorphism , DNA Sequence Analysis , Vibrio cholerae/classification , Vibrio cholerae/pathogenicity , Virulence , Virulence Factors/analysis