1.
BMC Genomics ; 24(1): 330, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37322447

ABSTRACT

BACKGROUND: Balanophoraceae plastomes are known for their highly condensed and re-arranged nature alongside the most extreme nucleotide compositional bias known to date, culminating in two independent reconfigurations of their genetic code. Currently, a large portion of the Balanophoraceae diversity remains unexplored, hindering, among others, evolutionary pattern recognition. Here, we explored newly sequenced plastomes of Sarcophyte sanguinea and Thonningia sanguinea. The reconstructed plastomes were analyzed using various methods of comparative genomics based on a representative taxon sampling. RESULTS: Sarcophyte, recovered sister to the other sampled Balanophoraceae s. str., has plastomes up to 50% larger than those currently published. Its gene set contains five genes lost in any other species, including matK. Five cis-spliced introns are maintained. In contrast, the Thonningia plastome is similarly reduced to published Balanophoraceae and retains only a single cis-spliced intron. Its protein-coding genes show a more biased codon usage compared to Sarcophyte, with an accumulation of in-frame TAG stop codons. Structural plastome comparison revealed multiple, previously unknown, structural rearrangements within Balanophoraceae. CONCLUSIONS: For the "minimal plastomes" of Thonningia, we propose a genetic code change identical to sister genus Balanophora. Sarcophyte however differs drastically from our current understanding on Balanophoraceae plastomes. With a less-extreme nucleotide composition, there is no evidence for an altered genetic code. Using comparative genomics, we identified a hotspot for plastome reconfiguration in Balanophoraceae. Based on previously published and newly identified structural reconfigurations, we propose an updated model of evolutionary plastome trajectories for Balanophoraceae, illustrating a much greater plastome diversity than previously known.


Subject(s)
Balanophoraceae , Balanophoraceae/genetics , Evolution, Molecular , Base Sequence , Biological Evolution , Nucleotides , Phylogeny
2.
Proc Biol Sci ; 286(1906): 20190831, 2019 07 10.
Article in English | MEDLINE | ID: mdl-31288696

ABSTRACT

Proper biological interpretation of a phylogeny can sometimes hinge on the placement of key taxa, or fail when such key taxa are not sampled. In this light, we here present the first attempt to investigate (though not conclusively resolve) animal relationships using genome-scale data from all phyla. Results from the site-heterogeneous CAT + GTR model recapitulate many established major clades, and strongly confirm some recent discoveries, such as a monophyletic Lophophorata, and a sister group relationship between Gnathifera and Chaetognatha, raising continued questions on the nature of the spiralian ancestor. We also explore matrix construction with an eye towards testing specific relationships; this approach uniquely recovers support for Panarthropoda, and shows that Lophotrochozoa (a subclade of Spiralia) can be constructed in strongly conflicting ways using different taxon- and/or orthologue sets. Dayhoff-6 recoding sacrifices information, but can also reveal surprising outcomes, e.g. full support for a clade of Lophophorata and Entoprocta + Cycliophora, a clade of Placozoa + Cnidaria, and raising support for Ctenophora as sister group to the remaining Metazoa, in a manner dependent on the gene and/or taxon sampling of the matrix in question. Future work should test the hypothesis that the few remaining uncertainties in animal phylogeny might reflect violations of the various stationarity assumptions used in contemporary inference methods.


Subject(s)
Genomics , Phylogeny , Animals , Classification
3.
Brief Bioinform ; 18(3): 451-457, 2017 05 01.
Article in English | MEDLINE | ID: mdl-27103098

ABSTRACT

Sequence similarity tools like Basic Local Alignment Search Tool (BLAST) are essential components of many functional genetic, genomic, phylogenetic and bioinformatic studies. Many modern analysis pipelines use significant sequence similarity scores (p- or E-values) and the ranked order of BLAST matches to test a wide range of hypotheses concerning homology, orthology, the timing of de novo gene birth/death and gene family expansion/contraction. Despite significant contrary findings, many of these tests still implicitly assume that stronger or higher-ranked E-value scores imply closer phylogenetic relationships between sequences. Here, we demonstrate that even though a general relationship does exist between the phylogenetic distance of two sequences and their E-value, significant and misleading errors occur in both the completeness and the order of results under realistic evolutionary scenarios. These results provide additional details to past evidence showing that studies should avoid drawing direct inferences of evolutionary relatedness from measures of sequence similarity alone, and should instead, where possible, use more rigorous phylogeny-based methods.


Subject(s)
Phylogeny , Computational Biology , Sequence Alignment , Software
4.
Mol Phylogenet Evol ; 135: 270-285, 2019 06.
Article in English | MEDLINE | ID: mdl-30822528

ABSTRACT

The beetle superfamily Dytiscoidea, placed within the suborder Adephaga, comprises six families. The phylogenetic relationships of these families, whose species are aquatic, remain highly contentious. In particular the monophyly of the geographically disjunct Aspidytidae (China and South Africa) remains unclear. Here we use a phylogenomic approach to demonstrate that Aspidytidae are indeed monophyletic, as we inferred this phylogenetic relationship from analyzing nucleotide sequence data filtered for compositional heterogeneity and from analyzing amino-acid sequence data. Our analyses suggest that Aspidytidae are the sister group of Amphizoidae, although the support for this relationship is not unequivocal. A sister group relationship of Hygrobiidae to a clade comprising Amphizoidae, Aspidytidae, and Dytiscidae is supported by analyses in which model assumptions are violated the least. In general, we find that both concatenation and the applied coalescent method are sensitive to the effect of among-species compositional heterogeneity. Four-cluster likelihood-mapping suggests that despite the substantial size of the dataset and the use of advanced analytical methods, statistical support is weak for the inferred phylogenetic placement of Hygrobiidae. These results indicate that other kinds of data (e.g. genomic meta-characters) are possibly required to resolve the above-specified persisting phylogenetic uncertainties. Our study illustrates various data-driven confounding effects in phylogenetic reconstructions and highlights the need for careful monitoring of model violations prior to phylogenomic analysis.


Subject(s)
Classification , Coleoptera/classification , Coleoptera/genetics , Genomics , Phylogeny , Amino Acids/genetics , Animals , Base Sequence , Codon/genetics , Genome , Likelihood Functions , Transcriptome/genetics
5.
Proteomics ; 18(21-22): e1800069, 2018 11.
Article in English | MEDLINE | ID: mdl-30260558

ABSTRACT

Compositionally biased regions (BRs) occur when a few amino-acid types are enriched in a protein segment. There are possibly BR types in the known protein universe that have not been characterized experimentally. The UniProt protein database has been surveyed for evidence of such compositional "dark matter". A "dark biased region" (DBR) is defined as a biased region with low probability of being an individual structural domain or intrinsically disordered region. The bias annotation program fLPS is used to generate a list of >13 million BRs, which is then thoroughly filtered for structure and intrinsic disorder. About a third of BRs (31%) have both substantial intrinsic disorder and structure. After filtering, there are ≈0.9 million DBRs (≈7% of the original BRs in ≈1.4% of proteins). These DBRs are hugely enriched in eukaryotes and hugely depleted in bacteria. They tend to be more hydrophobic than other protein regions, but are made of less extreme combinations of hydrophobic/hydrophilic residues. Given varying assumptions, it is estimated how many DBRs there might be for the high bias levels examined (with p-values < 1 × 10^-6), deriving a reasonable range of 0.7-7.2% of proteins having such DBRs. Hypotheses are examined about what such DBRs might be, that is, that they are from un- or undersampled domain/region categories or are unappreciated categories somewhat like existing ones.


Subject(s)
Proteins/chemistry , Algorithms , Databases, Protein , Prions/chemistry , Prions/metabolism , Sequence Analysis, Protein
6.
BMC Genomics ; 19(1): 885, 2018 Dec 07.
Article in English | MEDLINE | ID: mdl-30526500

ABSTRACT

BACKGROUND: Restriction-modification (R-M) systems protect bacteria and archaea from attacks by bacteriophages and archaeal viruses. An R-M system specifically recognizes short sites in foreign DNA and cleaves it, while such sites in the host DNA are protected by methylation. Prokaryotic viruses have developed a number of strategies to overcome this host defense. The simplest anti-restriction strategy is the elimination of recognition sites in the viral genome: no sites, no DNA cleavage. Even a decrease of the number of recognition sites can help a virus to overcome this type of host defense. Recognition site avoidance has been a known anti-restriction strategy of prokaryotic viruses for decades. However, recognition site avoidance has not been systematically studied with the currently available sequence data. We analyzed the complete genomes of almost 4000 prokaryotic viruses with known host species and more than 17,000 restriction endonucleases with known specificities in terms of recognition site avoidance. RESULTS: We observed considerable limitations of recognition site avoidance as an anti-restriction strategy. Namely, the avoidance of recognition sites is specific for dsDNA and ssDNA prokaryotic viruses. Avoidance is much more pronounced in the genomes of non-temperate bacteriophages than in the genomes of temperate ones. Avoidance is not observed for the sites of Type I and Type IIG systems and is very rarely observed for the sites of Type III systems. The vast majority of avoidance cases concern recognition sites of orthodox Type II restriction-modification systems. Even under these constraints, complete or almost complete elimination of sites is observed for approximately one-tenth of viral genomes and a significant under-representation for approximately one-fourth of them. CONCLUSIONS: Avoidance of recognition sites of restriction-modification systems is a widespread but not universal anti-restriction strategy of prokaryotic viruses.


Subject(s)
DNA Restriction-Modification Enzymes/genetics , Prokaryotic Cells/virology , Viruses/genetics , Bacteriophages/genetics , Base Composition/genetics , DNA Restriction Enzymes/metabolism , Databases, Genetic , Genome, Viral
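
Note on record 6: recognition site avoidance is commonly quantified by comparing the observed number of sites in a genome with the number expected from its base composition. The abstract does not state which statistic the authors used, so the sketch below is only one simple possibility. It assumes an independent-bases null model, a palindromic site such as GAATTC (so one strand suffices), and hypothetical function names.

```python
from collections import Counter


def count_overlapping(seq, site):
    """Count occurrences of `site` in `seq`, allowing overlaps."""
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def observed_expected_ratio(genome, site):
    """Return (observed, expected, ratio) for a recognition site.

    Expected counts assume each base is drawn independently with the
    genome's own base frequencies (a deliberately simple null model).
    """
    genome = genome.upper()
    site = site.upper()
    n = len(genome)
    freqs = Counter(genome)
    probability = 1.0
    for base in site:
        probability *= freqs[base] / n
    expected = (n - len(site) + 1) * probability
    observed = count_overlapping(genome, site)
    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio


if __name__ == "__main__":
    obs, exp, ratio = observed_expected_ratio("ACGT" * 10, "GAATTC")
    print(obs, round(exp, 4), ratio)
```

A ratio well below 1 is consistent with avoidance, although a significance test and a richer null model (for example one that preserves dinucleotide frequencies) would be needed before drawing conclusions.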
7.
BMC Genomics ; 19(1): 799, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30400812

ABSTRACT

BACKGROUND: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. RESULTS: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it. CONCLUSIONS: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.


Subject(s)
Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Microbiota , RNA, Ribosomal, 16S/genetics , Bayes Theorem
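
Note on record 7: the compositional bias described in the abstract can be seen in a toy example. If only one feature changes in absolute abundance, library-size scaling makes every other feature appear to change as well. The numbers and the helper name `to_proportions` below are invented for illustration; this is not the empirical Bayes method proposed in the paper.

```python
def to_proportions(counts):
    """Library-size scaling: divide each count by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical absolute abundances in two samples.
# Only the third feature truly differs (100 -> 400).
absolute_a = [100, 100, 100]
absolute_b = [100, 100, 400]

prop_a = to_proportions(absolute_a)
prop_b = to_proportions(absolute_b)

# Apparent fold change of each feature after scaling.
fold_changes = [b / a for a, b in zip(prop_a, prop_b)]

if __name__ == "__main__":
    print(fold_changes)
```

The first two features did not change in absolute terms, yet their scaled values halve, because proportions must sum to one.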
8.
BMC Genomics ; 19(1): 528, 2018 Jul 11.
Article in English | MEDLINE | ID: mdl-29996771

ABSTRACT

BACKGROUND: Bacterial genomes have characteristic compositional skews, which are differences in nucleotide frequency between the leading and lagging DNA strands across a segment of a genome. It is thought that these strand asymmetries arise as a result of mutational biases and selective constraints, particularly for energy efficiency. Analysis of compositional skews in a diverse set of bacteria provides a comparative context in which mutational and selective environmental constraints can be studied. These analyses typically require finished and well-annotated genomic sequences. RESULTS: We present three novel metrics for examining genome composition skews; all three metrics can be computed for unfinished or partially-annotated genomes. The first two metrics (dot-skew and cross-skew) depend on sequence and gene annotation of a single genome, while the third metric (residual skew) highlights unusual genomes by subtracting a GC content-based model of a library of genome sequences. We applied these metrics to 7738 available bacterial genomes, including partial drafts, and identified outlier species. A phylogenetically diverse set of these outliers (i.e., Borrelia, Ehrlichia, Kinetoplastibacterium, and Phytoplasma) display similar skew patterns but share lifestyle characteristics, such as intracellularity and biosynthetic dependence on their hosts. CONCLUSIONS: Our novel metrics appear to reflect the effects of biosynthetic constraints and adaptations to life within one or more hosts on genome composition. We provide results for each analyzed genome, software and interactive visualizations at http://db.systemsbiology.net/gestalt/skew_metrics .


Subject(s)
Bacteria/genetics , Computational Biology/methods , Genome, Bacterial , Internet Access , Models, Genetic , User-Computer Interface
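
Note on record 8: the abstract does not define its dot-skew, cross-skew, or residual-skew metrics, so they are not reproduced here. As background only, the sketch below computes the classical windowed GC skew, (G - C) / (G + C), which is the kind of strand asymmetry such metrics build on. The function name, window size, and step are assumptions of this example.

```python
def gc_skew(seq, window=1000, step=None):
    """Return the GC skew (G - C) / (G + C) for successive windows.

    Windows with no G or C are reported as 0.0. By default the windows
    do not overlap (step equals window).
    """
    step = step or window
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


if __name__ == "__main__":
    # Tiny synthetic sequence: G-rich first half, C-rich second half.
    print(gc_skew("GGGGCCCC", window=4))
```

In many bacterial chromosomes the sign of the skew switches near the replication origin and terminus, which is why windowed plots of this quantity are a common first look at strand asymmetry.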
9.
Mol Phylogenet Evol ; 127: 46-54, 2018 10.
Article in English | MEDLINE | ID: mdl-29684598

ABSTRACT

Phylogenetic analyses of conserved core genes have disentangled most of the ancient relationships in Archaea. However, some groups remain debated, like the DPANN, a deep-branching super-phylum composed of nanosized archaea with reduced genomes. Among these, the Nanohaloarchaea require high-salt concentrations for growth. Their discovery in 2012 was significant because they represent, together with Halobacteria (a Class belonging to Euryarchaeota), the only two described lineages of extreme halophilic archaea. The phylogenetic position of Nanohaloarchaea is highly debated, being alternatively proposed as the sister-lineage of Halobacteria or a member of the DPANN super-phylum. Pinpointing the phylogenetic position of extreme halophilic archaea is important to improve our knowledge of the deep evolutionary history of Archaea and the molecular adaptive processes and evolutionary paths that allowed their emergence. Using comparative genomic approaches, we identified 258 markers carrying a reliable phylogenetic signal. By combining strategies limiting the impact of biases on phylogenetic inference, we showed that Nanohaloarchaea and Halobacteria represent two independent lines that derived from two distinct but related methanogen Class II lineages. This implies that adaptation to high salinity emerged twice independently in Archaea and indicates that emergence of Nanohaloarchaea within DPANN in previous studies is likely the consequence of a tree reconstruction artifact, challenging the existence of this super-phylum.


Subject(s)
Euryarchaeota/classification , Phylogeny , Salinity , Bayes Theorem , Conserved Sequence , Genes, Archaeal , Genomics
10.
Biopolymers ; 106(3): 318-29, 2016 May.
Article in English | MEDLINE | ID: mdl-27037995

ABSTRACT

Immunosignaturing is an emerging experimental technique that uses random sequence peptide microarrays to detect antibodies produced by the immune system in response to a particular disease. Two important questions regarding immunosignaturing are "Do microarray peptides that exhibit a strong affinity to a given type of antibodies share common sequence properties?" and "If so, what are those properties?" In this work, three statistical tests designed to detect non-random patterns in the amino acid makeup of a group of microarray peptides are presented. One test detects patterns of significantly biased amino acid usage, whereas the other two detect patterns of significant bias in the biochemical properties. These tests do not require a large number of peptides per group. The tests were applied to analyze 19 groups of peptides identified in immunosignaturing experiments as being specific for antibodies produced in response to various types of cancer and other diseases. The positional distribution of the biochemical properties of the amino acids in these 19 peptide groups was also studied. Remarkably, despite the random nature of the sequence libraries used to design the microarrays, a unique group-specific non-random pattern was identified in the majority of the peptide groups studied. © 2016 Wiley Periodicals, Inc. Biopolymers (Pept Sci) 106: 318-329, 2016.


Subject(s)
Antibodies/analysis , Models, Statistical , Neoplasms/diagnosis , Neoplasms/immunology , Peptide Library , Amino Acid Motifs , Antibody Affinity , Humans , Immunoassay/instrumentation , Immunoassay/statistics & numerical data , Neoplasms/classification , Neoplasms/genetics , Protein Array Analysis , Protein Binding
11.
FEMS Yeast Res ; 15(5): fov047, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26071597

ABSTRACT

The genomes of many yeast species or strain isolates have now been sequenced with an accelerating momentum that quickly relegates initial data to history, albeit that they are less than two decades old. Today, novel yeast genomes are entirely sequenced for a variety of reasons, often only to identify a few expected genes of specific interest, thus providing a wealth of data, heterogeneous in quality and completion but informative about the origin and evolution of this heterogeneous collection of unicellular modern fungi. However, how many scientists fully appreciate the important conceptual and technological roles played by yeasts in the extraordinary development of today's genomics? Novel notions of general significance emerged from the very first eukaryote sequenced, Saccharomyces cerevisiae, and were successively refined and extended over time. Tools with general applications were originally developed with this yeast; and surprises emerged from the results. Here, I have tried to recollect the gradual building up of knowledge as yeast genomics developed, and then briefly summarize our present views about the basic nature of yeast genomes, based on the most recent data.


Subject(s)
Chromosomes, Fungal/genetics , DNA, Fungal/genetics , Genome, Fungal/genetics , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/genetics , Biological Evolution , Chromosome Mapping
12.
Mol Phylogenet Evol ; 75: 103-17, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24583288

ABSTRACT

The seminal work of Carl Woese and co-workers has contributed to promote the RNA component of the small subunit of the ribosome (SSU rRNA) as a "gold standard" of modern prokaryotic taxonomy and systematics, and an essential tool to explore microbial diversity. Yet, this marker has a limited resolving power, especially at deep phylogenetic depth and can lead to strongly biased trees. The ever-larger number of available complete genomes now calls for a novel standard dataset of robust protein markers that may complement SSU rRNA. In this respect, concatenation of ribosomal proteins (r-proteins) is increasingly used to reconstruct large-scale prokaryotic phylogenies, but their suitability for systematic and/or taxonomic purposes has not been specifically addressed. Using Proteobacteria as a case study, we show that amino acid and nucleic acid r-protein sequences contain a reliable phylogenetic signal at a wide range of taxonomic depths, which has not been totally blurred by mutational saturation or horizontal gene transfer. The use of accurate evolutionary models and reconstruction methods allows overcoming most tree reconstruction artefacts resulting from compositional biases and/or fast evolutionary rates. The inferred phylogenies allow clarifying the relationships among most proteobacterial orders and families, along with the position of several unclassified lineages, suggesting some possible revisions of the current classification. In addition, we investigate the root of the Proteobacteria by considering the time-variation of nucleic acid composition of r-protein sequences and the information carried by horizontal gene transfers, two approaches that do not require the use of an outgroup and limit tree reconstruction artefacts. Altogether, our analyses indicate that r-proteins may represent a promising standard for prokaryotic taxonomy and systematics.


Subject(s)
Phylogeny , Proteobacteria/classification , Ribosomal Proteins/genetics , Bayes Theorem , Biological Evolution , DNA, Bacterial/genetics , Epsilonproteobacteria/classification , Epsilonproteobacteria/genetics , Gene Transfer, Horizontal , Likelihood Functions , Models, Genetic , Proteobacteria/genetics , Ribosome Subunits, Small, Bacterial/genetics , Sequence Analysis, DNA
13.
F1000Res ; 12: 198, 2023.
Article in English | MEDLINE | ID: mdl-37082000

ABSTRACT

Background: The evolutionary rate of disordered proteins varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of intrinsically disordered regions (IDRs) across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The Uniprot Reference Proteome dataset, containing 11297 proteomes was used as target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards low complexity regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated, low complexity proteins across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of low complexity, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.


Subject(s)
Genomics , Proteome , Humans , Proteome/genetics , Phylogeny , Genome, Human , Bias
14.
Front Neurosci ; 16: 895607, 2022.
Article in English | MEDLINE | ID: mdl-35860292

ABSTRACT

Codon usage analysis is a crucial part of molecular characterization and is used to determine the factors affecting the evolution of a gene. The length of a gene is an important parameter that affects the characteristics of the gene, such as codon usage, compositional parameters, and sometimes, its functions. In the present study, we investigated the association of various parameters related to codon usage with the length of genes. Gene expression is affected by nucleotide disproportion. In sixty genes related to neurodegenerative disorders, the G nucleotide was the most abundant and the T nucleotide was the least. The nucleotide T exhibited a significant association with the length of the gene at both the overall compositional level and the first and second codon positions. Codon usage bias (CUB) of these genes was affected by pyrimidine and keto skews. Gene length was found to be significantly correlated with codon bias in neurodegeneration associated genes. In gene segments with lengths below 1,200 bp and above 2,400 bp, CUB was positively associated with length. Relative synonymous CUB, which is another measure of CUB, showed that codons TTA, GTT, GTC, TCA, GGT, and GGA exhibited a positive association with length, whereas codons GTA, AGC, CGT, CGA, and GGG showed a negative association. GC-ending codons were preferred over AT-ending codons. Overall analysis indicated that the association between CUB and length varies depending on the segment size; however, CUB of 1,200-2,000 bp gene segments appeared not affected by gene length. In synopsis, analysis suggests that length of the genes correlates with various imperative molecular signatures including A/T nucleotide disproportion and codon choices. In the present study we additionally evaluated various molecular features and their correlation with different indices of codon usage, like the Codon Adaptation Index (CAI) and Relative Synonymous Codon Usage (RSCU) of codons. We also considered the impact of gene fragment size on different molecular features in genes related to neurodegeneration. This analysis will aid our understanding of, and potentially help modulate, gene expression in cases of defective gene functioning in clinical settings.
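
Note on record 14: Relative Synonymous Codon Usage (RSCU) for a codon is its observed count divided by the mean count of all synonymous codons for the same amino acid, so a value of 1 means no bias. The sketch below uses the standard genetic code and hypothetical names (`CODON_TABLE`, `rscu`); it illustrates the index in general and is not the analysis pipeline of the paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds):
    """Return {codon: RSCU} for every sense codon whose amino acid occurs."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # observed count divided by the mean count within the family
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    print(rscu("GGTGGTGGAGGG"))
```

For the toy glycine-only sequence, GGT is used twice among four glycine codons, giving an RSCU of 2.0, while the unused GGC scores 0.0.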

15.
PeerJ ; 10: e14417, 2022.
Article in English | MEDLINE | ID: mdl-36415860

ABSTRACT

Prions are proteinaceous particles that can propagate an alternative conformation to further copies of the same protein. They have been described in mammals, fungi, bacteria and archaea. Furthermore, across diverse organisms from bacteria to eukaryotes, prion-like proteins that have similar sequence characters are evident. Such prion-like proteins have been linked to pathomechanisms of amyotrophic lateral sclerosis (ALS) in humans, in particular TDP43, FUS, TAF15, EWSR1 and hnRNPA2. Because of the desire to study human disease-linked proteins in model organisms, and to gain insights into the functionally important parts of these proteins and how they have changed across hundreds of millions of years of evolution, we analyzed how the sequence traits of these five proteins have evolved across eukaryotes, including plants and metazoa. We discover that the RNA-binding domain architecture of these proteins is deeply conserved since their emergence. Prion-like regions are also deeply and widely conserved since the origination of the protein families for FUS, TAF15 and EWSR1, and since the last common ancestor of metazoa for TDP43 and hnRNPA2. Prion-like composition is uncommon or weak in any plant orthologs observed, however in TDP43 many plant proteins have equivalent regions rich in other amino acids (namely glycine and tyrosine and/or serine) that may be linked to stress granule recruitment. Deeply conserved low-complexity domains are identified that likely have functional significance.


Subject(s)
Amyotrophic Lateral Sclerosis , Prions , Animals , Humans , Amyotrophic Lateral Sclerosis/genetics , Prions/genetics , RNA-Binding Proteins/chemistry , Mammals/metabolism
16.
Viruses ; 14(11), 2022 10 29.
Article in English | MEDLINE | ID: mdl-36366500

ABSTRACT

The principal presumption of phage display biopanning is that the naïve library contains an unbiased repertoire of peptides, and thus, the enriched variants derive from the affinity selection of an entirely random peptide pool. In the current study, we utilized deep sequencing to characterize the widely used Ph.D.™-12 phage display peptide library (New England Biolabs). The next-generation sequencing (NGS) data indicated the presence of stop codons and a high abundance of wild-type clones in the naïve library, which collectively result in a reduced effective size of the library. The analysis of the DNA sequence logo and global and position-specific frequency of amino acids demonstrated significant bias in the nucleotide and amino acid composition of the library inserts. Principal component analysis (PCA) uncovered the existence of four distinct clusters in the naïve library and the investigation of peptide frequency distribution revealed a broad range of unequal abundances for peptides. Taken together, our data provide strong evidence for the notion that the naïve library represents substantial departures from randomness at the nucleotide, amino acid, and peptide levels, though not undergoing any selective pressure for target binding. This non-uniform sequence representation arises from both the M13 phage biology and technical errors of the library construction. Our findings highlight the paramount importance of the qualitative assessment of the naïve phage display libraries prior to biopanning.


Subject(s)
High-Throughput Nucleotide Sequencing , Peptide Library , Peptides/chemistry , Amino Acids/genetics , Nucleotides
17.
BMC Ecol Evol ; 21(1): 43, 2021 03 16.
Article in English | MEDLINE | ID: mdl-33726665

ABSTRACT

BACKGROUND: Phylogenomic approaches have great power to reconstruct evolutionary histories, however they rely on multi-step processes in which each stage has the potential to affect the accuracy of the final result. Many studies have empirically tested and established methodology for resolving robust phylogenies, including selecting appropriate evolutionary models, identifying orthologs, or isolating partitions with strong phylogenetic signal. However, few have investigated errors that may be initiated at earlier stages of the analysis. Biases introduced during the generation of the phylogenomic dataset itself could produce downstream effects on analyses of evolutionary history. Transcriptomes are widely used in phylogenomics studies, though there is little understanding of how a poor-quality assembly of these datasets could impact the accuracy of phylogenomic hypotheses. Here we examined how transcriptome assembly quality affects phylogenomic inferences by creating independent datasets from the same input data representing high-quality and low-quality transcriptome assembly outcomes. RESULTS: By studying the performance of phylogenomic datasets derived from alternative high- and low-quality assembly inputs in a controlled experiment, we show that high-quality transcriptomes produce richer phylogenomic datasets with a greater number of unique partitions than low-quality assemblies. High-quality assemblies also give rise to partitions that have lower alignment ambiguity and less compositional bias. In addition, high-quality partitions hold stronger phylogenetic signal than their low-quality transcriptome assembly counterparts in both concatenation- and coalescent-based analyses. CONCLUSIONS: Our findings demonstrate the importance of transcriptome assembly quality in phylogenomic analyses and suggest that a portion of the uncertainty observed in such studies could be alleviated at the assembly stage.


Subject(s)
Genomics , Transcriptome , Bias , Biological Evolution , Phylogeny
18.
PeerJ ; 9: e12363, 2021.
Article in English | MEDLINE | ID: mdl-34760378

ABSTRACT

Compositionally-biased (CB) regions in biological sequences are enriched for a subset of sequence residue types. These can be shorter regions with a concentrated bias (i.e., those termed 'low-complexity'), or longer regions that have a compositional skew. These regions comprise a prominent class of the uncharacterized 'dark matter' of the protein universe. Here, I report the latest version of the fLPS package for the annotation of CB regions, which includes added consideration of DNA sequences, to label the eight possible biased regions of DNA. In this version, the user is now able to restrict analysis to a specified subset of residue types, and also to filter for previously annotated domains to enable detection of discontinuous CB regions. A 'thorough' option has been added which enables the labelling of subtler biases, typically made from a skew for several residue types. In the output, protein CB regions are now labelled with bias classes reflecting the physico-chemical character of the biasing residues. The fLPS 2.0 package is available from: https://github.com/pmharrison/flps2 or in a Supplemental File of this paper.
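
To make the idea of a region with concentrated bias concrete, the following Python sketch flags stretches of a sequence in which a chosen subset of residue types dominates a sliding window. It is not the fLPS algorithm and does not use the fLPS package; the function name `biased_regions`, the window length and the enrichment threshold are all assumptions made for this illustration.

```python
def biased_regions(seq, residues, window=10, min_fraction=0.7):
    """Return merged half-open (start, end) intervals enriched for `residues`.

    Illustrative sliding-window heuristic only; the window size and the
    threshold are arbitrary assumptions, not values taken from fLPS.
    """
    seq = seq.upper()
    targets = set(residues.upper())
    regions = []
    if window <= 0 or len(seq) < window:
        return regions

    # Number of target residues in the first window.
    count = sum(1 for ch in seq[:window] if ch in targets)

    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one: drop the residue that left, add the one that entered.
            count -= (seq[start - 1] in targets)
            count += (seq[start + window - 1] in targets)
        if count / window >= min_fraction:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous hit, so extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    example = "C" * 20 + "A" * 10 + "C" * 20
    print(biased_regions(example, "A", window=10, min_fraction=1.0))
```

Passing several residue types at once (for example `"AT"` for DNA) treats them as one biased subset, which loosely mirrors the abstract's point about restricting analysis to a specified subset of residue types.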

19.
Life (Basel) ; 11(7)2021 Jul 06.
Article in English | MEDLINE | ID: mdl-34357035

ABSTRACT

Notwithstanding the initial claims of general conservation, mitochondrial genomes are a largely heterogeneous set of organellar chromosomes which displays a bewildering diversity in terms of structure, architecture, gene content, and functionality. The mitochondrial genome is typically described as a single chromosome, yet many examples of multipartite genomes have been found (for example, among sponges and diplonemeans); the mitochondrial genome is typically depicted as circular, yet many linear genomes are known (for example, among jellyfish, alveolates, and apicomplexans); the chromosome is normally said to be "small", yet there is a huge variation between the smallest and the largest known genomes (found, for example, in ctenophores and vascular plants, respectively); even the gene content is highly unconserved, ranging from the 13 oxidative phosphorylation-related enzymatic subunits encoded by animal mitochondria to the wider set of mitochondrial genes found in jakobids. In the present paper, we compile and describe a large database of 27,873 mitochondrial genomes currently available in GenBank, encompassing the whole eukaryotic domain. We discuss the major features of mitochondrial molecular diversity, with special reference to nucleotide composition and compositional biases; moreover, the database is made publicly available for future analyses on the MoZoo Lab GitHub page.
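
Nucleotide composition and compositional bias of a genome are commonly summarized with GC content and with AT and GC skew. The sketch below computes these from a plain sequence string using the standard definitions, (A - T)/(A + T) and (G - C)/(G + C). The function name `composition_stats` is hypothetical, and the code does not reproduce the authors' database or pipeline.

```python
from collections import Counter


def composition_stats(seq):
    """Return GC content, AT skew and GC skew for a nucleotide sequence.

    Only unambiguous A, C, G and T are counted; other symbols (e.g. N) are
    ignored. A skew with a zero denominator is reported as 0.0, which is a
    convention chosen for this sketch.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")

    gc_content = (g + c) / total
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {"gc_content": gc_content, "at_skew": at_skew, "gc_skew": gc_skew}


if __name__ == "__main__":
    print(composition_stats("GGGCNNAT"))
```

Applied per genome, values like these allow strand or lineage biases to be compared across a large collection of mitochondrial sequences.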

20.
PeerJ ; 8: e9023, 2020.
Article in English | MEDLINE | ID: mdl-32337108

ABSTRACT

Pub1 protein is an important RNA-binding protein functional in stress granule assembly in budding yeast Saccharomyces cerevisiae and, as its co-ortholog Tia1, in humans. It is unique among proteins in evidencing prion-like aggregation in both its yeast and human forms. Previously, we noted that Pub1/Tia1 was the only protein linked to human disease that has prion-like character and has demonstrated such aggregation in both species. Thus, we were motivated to probe further into the evolution of the Pub1/Tia1 family (and its close relative Nam8 and its orthologs) to gain a picture of how such a protein has evolved over deep evolutionary time since the last common ancestor of eukaryotes. Here, we discover that the prion-like composition of this protein family is deeply conserved across eukaryotes, as is the prion-like composition of its close relative Nam8/Ngr1. A sizeable minority of protein orthologs have multiple prion-like domains within their sequences (6-20% depending on criteria). The number of RNA-binding RRM domains is conserved at three copies in >86% of the Pub1 family (>71% of the Nam8 family), but proteins with just one or two RRM domains occur frequently in some clades, indicating that these are not due to annotation errors. Overall, our results indicate that a basic scaffold comprising three RNA-binding domains and at least one prion-like region has been largely conserved since the last common ancestor of eukaryotes, providing further evidence that prion-like aggregation may be a very ancient and conserved phenomenon for certain specific proteins.
