1.
PLoS Genet ; 16(8): e1008981, 2020 08.
Article in English | MEDLINE | ID: mdl-32745133

ABSTRACT

Tribbles homolog 3 (TRIB3) is a pseudokinase involved in intracellular regulatory processes and has been implicated in several diseases. In this article, we report that the human TRIB3 promoter contains a 33-bp variable number tandem repeat (VNTR) and characterize the heterogeneity and function of this genetic element. Analysis of human populations around the world uncovered the existence of alleles ranging from 1 to 5 copies of the repeat, with 2-, 3- and 5-copy alleles being the most common but displaying considerable geographical differences in frequency. The repeated sequence overlaps a C/EBP-ATF transcriptional regulatory element and is highly conserved, but not repeated, in various mammalian species, including great apes. The repeat is, however, evident in Neanderthal and Denisovan genomes. Reporter plasmid experiments in human cell culture reveal that an increased copy number of the TRIB3 promoter 33-bp repeat results in increased transcriptional activity. In line with this, analysis of whole genome sequencing and RNA-Seq data from human cohorts demonstrates that the copy number of TRIB3 promoter 33-bp repeats is positively correlated with TRIB3 mRNA expression level in many tissues throughout the body. Moreover, the copy number of the TRIB3 33-bp repeat appears to be linked to known TRIB3 eQTL SNPs as well as TRIB3 SNPs reported in genetic association studies. Taken together, the results indicate that the promoter 33-bp VNTR constitutes a causal variant for TRIB3 expression variation between individuals and could underlie the results of SNP-based genetic studies.


Subject(s)
Cell Cycle Proteins/genetics , Genetic Heterogeneity , Genetics, Population , Minisatellite Repeats/genetics , Protein Serine-Threonine Kinases/antagonists & inhibitors , Repressor Proteins/genetics , Estonia/epidemiology , Female , Gene Expression Regulation/genetics , Genotype , Humans , Male , Promoter Regions, Genetic , Protein Serine-Threonine Kinases/genetics , RNA-Seq , Whole Genome Sequencing
2.
Hum Mutat ; 42(6): 777-786, 2021 06.
Article in English | MEDLINE | ID: mdl-33715282

ABSTRACT

KATK is a fast and accurate software tool for calling variants directly from raw next-generation sequencing reads. It uses predefined k-mers to retrieve only the reads of interest from the FASTQ file and calls genotypes by aligning retrieved reads locally. KATK does not use data about known polymorphisms and has NC (no call) as the default genotype. The reference or variant allele is called only if there is sufficient evidence for their presence in data. Thus it is not biased against rare variants or de-novo mutations. With simulated datasets, we achieved a false-negative rate of 0.23% (sensitivity 99.77%) and a false discovery rate of 0.19%. Calling all human exonic regions with KATK requires 1-2 h, depending on sequencing coverage.
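The two ideas in this abstract, retrieving reads through predefined k-mers and starting from NC as the default genotype, can be illustrated with a minimal Python sketch. It is not KATK's implementation: the function names, the `min_support` threshold and the genotype labels are hypothetical, and the local alignment step is replaced by plain support counts.

```python
from typing import Iterable, List, Set


def kmers_of(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def retrieve_reads(reads: Iterable[str], target_kmers: Set[str], k: int) -> List[str]:
    """Keep only reads that share at least one k-mer with the predefined set."""
    return [read for read in reads if kmers_of(read, k) & target_kmers]


def call_genotype(ref_support: int, alt_support: int, min_support: int = 3) -> str:
    """Start from NC (no call); report an allele only when enough reads support it.

    min_support is a hypothetical threshold chosen for illustration.
    """
    ref_ok = ref_support >= min_support
    alt_ok = alt_support >= min_support
    if ref_ok and alt_ok:
        return "ref/alt"
    if ref_ok:
        return "ref/ref"
    if alt_ok:
        return "alt/alt"
    return "NC"
```

Because the default is NC rather than the reference allele, a site with no supporting reads is reported as uncalled instead of silently counted as reference.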


Subject(s)
DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Alleles , Chromosome Mapping/methods , Datasets as Topic , Female , Genome, Human , Genotype , Humans , Male , Polymorphism, Single Nucleotide , Reproducibility of Results , Sequence Analysis, DNA/methods
3.
Bioinformatics ; 34(11): 1937-1938, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29360956

ABSTRACT

Summary: Designing PCR primers for amplifying regions of eukaryotic genomes is a complicated task because the genomes contain a large number of repeat sequences and other regions unsuitable for amplification by PCR. We have developed a novel k-mer based masking method that uses a statistical model to detect and mask failure-prone regions on the DNA template prior to primer design. We implemented the software as a standalone software primer3_masker and integrated it into the primer design program Primer3. Availability and implementation: The standalone version of primer3_masker is implemented in C. The source code is freely available at https://github.com/bioinfo-ut/primer3_masker/ (standalone version for Linux and macOS) and at https://github.com/primer3-org/primer3/ (integrated version). Primer3 web application that allows masking sequences of 196 animal and plant genomes is available at http://primer3.ut.ee/. Contact: maido.remm@ut.ee. Supplementary information: Supplementary data are available at Bioinformatics online.
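The general idea of k-mer based masking can be sketched in Python as follows. This is a simplified stand-in, not the primer3_masker algorithm: the statistical model described in the abstract is replaced by a plain frequency cutoff, and `kmer_counts`, `max_count` and the lowercase masking convention are assumptions made for illustration.

```python
from typing import Dict


def mask_template(template: str, kmer_counts: Dict[str, int], k: int, max_count: int) -> str:
    """Lowercase every position covered by a k-mer that is too frequent in the genome.

    kmer_counts maps k-mers to their (hypothetical) genome-wide frequencies;
    a simple cutoff stands in for the statistical model used by the real tool.
    """
    upper = template.upper()
    masked = list(upper)
    for i in range(len(upper) - k + 1):
        if kmer_counts.get(upper[i:i + k], 0) > max_count:
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)
```

A primer design step could then skip candidate primers that overlap lowercase positions.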


Subject(s)
DNA Primers , Polymerase Chain Reaction/methods , Repetitive Sequences, Nucleic Acid , Software , Animals , Humans , Plants/genetics
4.
PLoS Biol ; 14(12): e2000322, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27923039

ABSTRACT

Plant gas exchange is regulated by guard cells that form stomatal pores. Stomatal adjustments are crucial for plant survival; they regulate uptake of CO2 for photosynthesis, loss of water, and entrance of air pollutants such as ozone. We mapped ozone hypersensitivity, more open stomata, and stomatal CO2-insensitivity phenotypes of the Arabidopsis thaliana accession Cvi-0 to a single amino acid substitution in MITOGEN-ACTIVATED PROTEIN (MAP) KINASE 12 (MPK12). In parallel, we showed that stomatal CO2-insensitivity phenotypes of a mutant cis (CO2-insensitive) were caused by a deletion of MPK12. Lack of MPK12 impaired bicarbonate-induced activation of S-type anion channels. We demonstrated that MPK12 interacted with the protein kinase HIGH LEAF TEMPERATURE 1 (HT1), a central node in guard cell CO2 signaling, and that MPK12 functions as an inhibitor of HT1. These data provide a new function for plant MPKs as protein kinase inhibitors and suggest a mechanism through which guard cell CO2 signaling controls plant water management.


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/physiology , Carbon Dioxide/metabolism , Genetic Variation , Mitogen-Activated Protein Kinases/metabolism , Signal Transduction , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Chromosome Mapping , Ozone/metabolism , Photosynthesis , Quantitative Trait Loci , Water
5.
PLoS Comput Biol ; 14(10): e1006434, 2018 10.
Article in English | MEDLINE | ID: mdl-30346947

ABSTRACT

We have developed an easy-to-use and memory-efficient method called PhenotypeSeeker that (a) identifies phenotype-specific k-mers, (b) generates a k-mer-based statistical model for predicting a given phenotype and (c) predicts the phenotype from the sequencing data of a given bacterial isolate. The method was validated on 167 Klebsiella pneumoniae isolates (virulence), 200 Pseudomonas aeruginosa isolates (ciprofloxacin resistance) and 459 Clostridium difficile isolates (azithromycin resistance). The phenotype prediction models trained from these datasets obtained the F1-measure of 0.88 on the K. pneumoniae test set, 0.88 on the P. aeruginosa test set and 0.97 on the C. difficile test set. The F1-measures were the same for assembled sequences and raw sequencing data; however, building the model from assembled genomes is significantly faster. On these datasets, the model building on a mid-range Linux server takes approximately 3 to 5 hours per phenotype if assembled genomes are used and 10 hours per phenotype if raw sequencing data are used. The phenotype prediction from assembled genomes takes less than one second per isolate. Thus, PhenotypeSeeker should be well-suited for predicting phenotypes from large sequencing datasets. PhenotypeSeeker is implemented in Python programming language, is open-source software and is available at GitHub (https://github.com/bioinfo-ut/PhenotypeSeeker/).
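Step (a) and the F1-measure reported above can be illustrated with a toy Python sketch. It is not PhenotypeSeeker's code: the selection rule (k-mers present in every positive isolate and in no negative isolate) is a deliberately crude substitute for statistical testing, and all names are hypothetical.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def phenotype_specific_kmers(genomes: Dict[str, str], labels: Dict[str, int], k: int) -> Set[str]:
    """Toy criterion: k-mers shared by all positive isolates and absent from all negatives."""
    pos = [kmer_set(seq, k) for name, seq in genomes.items() if labels[name] == 1]
    neg = [kmer_set(seq, k) for name, seq in genomes.items() if labels[name] == 0]
    shared_by_pos = set.intersection(*pos) if pos else set()
    seen_in_neg = set.union(*neg) if neg else set()
    return shared_by_pos - seen_in_neg


def f1_measure(true_labels: List[int], predicted: List[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a real setting the selected k-mers would become features of a statistical model, and the F1-measure would be computed on a held-out test set, as the abstract describes.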


Subject(s)
Algorithms , Bacteria/genetics , DNA, Bacterial/genetics , Genome, Bacterial/genetics , Genomics/methods , Bacteria/metabolism , DNA, Bacterial/physiology , Genetic Markers/genetics , Genome, Bacterial/physiology , Phenotype , Sequence Alignment , Sequence Analysis, DNA , Software
6.
BMC Infect Dis ; 18(1): 513, 2018 Oct 11.
Article in English | MEDLINE | ID: mdl-30309321

ABSTRACT

BACKGROUND: We aimed to identify the main spreading clones, describe the resistance mechanisms associated with carbapenem- and/or multidrug-resistant P. aeruginosa and characterize patients at risk of acquiring these strains in Estonian hospitals. METHODS: Ninety-two non-duplicated carbapenem- and/or multidrug-resistant P. aeruginosa strains were collected between 27th March 2012 and 30th April 2013. Clinical data of the patients was obtained retrospectively from the medical charts. Clonal relationships of the strains were determined by whole genome sequencing and analyzed by multi-locus sequence typing. The presence of resistance genes and beta-lactamases and their origin was determined. Combined-disk method and PCR were used to evaluate carbapenemase and metallo-beta-lactamase production. RESULTS: Forty-three strains were carbapenem-resistant, 11 were multidrug-resistant and 38 were both carbapenem- and multidrug-resistant. Most strains (54%) were isolated from respiratory secretions and caused an infection (74%). Over half of the patients (57%) were ≥ 65 years old and 85% had ≥1 co-morbidity; 96% had contacts with healthcare and/or had received antimicrobial treatment in the previous 90 days. Clinically relevant beta-lactamases (OXA-101, OXA-2 and GES-5) were found in 12% of strains, 27% of which were located in plasmids. No Ambler class B beta-lactamases were detected. Aminoglycoside modifying enzymes were found in 15% of the strains. OprD was defective in 13% of the strains (all with CR phenotype); carbapenem resistance triggering mutations (F170L, W277X, S403P) were present in 29% of the strains. Ciprofloxacin resistance correlated well with mutations in topoisomerase genes gyrA (T83I, D87N) and parC (S87L). Almost all strains (97%) with these mutations showed ciprofloxacin-resistant phenotype. Multi-locus sequence type analysis indicated high diversity at the strain level, with 36 different sequence types detected. Two sequence types (ST108 (n = 23) and ST260 (n = 18)) predominated. Whereas ST108 was associated with localized spread in one hospital and mostly carbapenem-resistant phenotype, ST260 strains occurred in all hospitals, mostly with multi-resistant phenotype and carried different resistance genotype/machinery. CONCLUSIONS: Diverse spread of local rather than international P. aeruginosa strains harboring multiple chromosomal mutations, but not plasmid-mediated Ambler class B beta-lactamases, was found in Estonian hospitals. TRIAL REGISTRATION: This trial was registered retrospectively in ClinicalTrials.gov ( NCT03343119 ).


Subject(s)
Anti-Bacterial Agents/therapeutic use , Drug Resistance, Multiple, Bacterial/genetics , Pseudomonas Infections/drug therapy , Pseudomonas aeruginosa/genetics , Aged , Ciprofloxacin/therapeutic use , DNA, Bacterial/chemistry , DNA, Bacterial/isolation & purification , DNA, Bacterial/metabolism , Disease Outbreaks , Estonia/epidemiology , Female , Hospitals , Humans , Male , Middle Aged , Multilocus Sequence Typing , Pseudomonas Infections/epidemiology , Pseudomonas Infections/microbiology , Pseudomonas aeruginosa/isolation & purification , Retrospective Studies , Whole Genome Sequencing , beta-Lactamases/genetics
7.
Proc Natl Acad Sci U S A ; 111(27): 9804-9, 2014 Jul 08.
Article in English | MEDLINE | ID: mdl-24961372

ABSTRACT

Translation arrest directed by nascent peptides and small cofactors controls expression of important bacterial and eukaryotic genes, including antibiotic resistance genes, activated by binding of macrolide drugs to the ribosome. Previous studies suggested that specific interactions between the nascent peptide and the antibiotic in the ribosomal exit tunnel play a central role in triggering ribosome stalling. However, here we show that macrolides arrest translation of the truncated ErmDL regulatory peptide when the nascent chain is only three amino acids and therefore is too short to be juxtaposed with the antibiotic. Biochemical probing and molecular dynamics simulations of erythromycin-bound ribosomes showed that the antibiotic in the tunnel allosterically alters the properties of the catalytic center, thereby predisposing the ribosome for halting translation of specific sequences. Our findings offer a new view on the role of small cofactors in the mechanism of translation arrest and reveal an allosteric link between the tunnel and the catalytic center of the ribosome.


Subject(s)
Anti-Bacterial Agents/pharmacology , Macrolides/pharmacology , Protein Biosynthesis/drug effects , Ribosomes/drug effects , Allosteric Regulation , Cell-Free System , Molecular Conformation , Molecular Dynamics Simulation , Ribosomes/genetics
8.
Mycorrhiza ; 27(8): 761-773, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28730541

ABSTRACT

The arrival of 454 sequencing represented a major breakthrough by allowing deeper sequencing of environmental samples than was possible with existing Sanger approaches. Illumina MiSeq provides a further increase in sequencing depth but shorter read length compared with 454 sequencing. We explored whether Illumina sequencing improves estimates of arbuscular mycorrhizal (AM) fungal richness in plant root samples, compared with 454 sequencing. We identified AM fungi in root samples by sequencing amplicons of the SSU rRNA gene with 454 and Illumina MiSeq paired-end sequencing. In addition, we sequenced metagenomic DNA without prior PCR amplification. Amplicon-based Illumina sequencing yielded two orders of magnitude higher sequencing depth per sample than 454 sequencing. Initial analysis with minimal quality control recorded five times higher AM fungal richness per sample with Illumina sequencing. Additional quality control of Illumina samples, including restriction of the marker region to the most variable amplicon fragment, revealed AM fungal richness values close to those produced by 454 sequencing. Furthermore, AM fungal richness estimates were not correlated with sequencing depth between 300 and 30,000 reads per sample, suggesting that the lower end of this range is sufficient for adequate description of AM fungal communities. By contrast, metagenomic Illumina sequencing yielded very few AM fungal reads and taxa and was dominated by plant DNA, suggesting that AM fungal DNA is present at prohibitively low abundance in colonised root samples. In conclusion, Illumina MiSeq sequencing yielded higher sequencing depth, but similar richness of AM fungi in root samples, compared with 454 sequencing.


Subject(s)
Biodiversity , DNA, Fungal/genetics , High-Throughput Nucleotide Sequencing/methods , Mycorrhizae/genetics
9.
Antimicrob Agents Chemother ; 60(11): 6933-6936, 2016 11.
Article in English | MEDLINE | ID: mdl-27572412

ABSTRACT

A plasmid carrying the colistin resistance gene mcr-1 was isolated from a pig slurry sample in Estonia. The gene was present on a 33,311-bp plasmid of the IncX4 group. mcr-1 is the only antibiotic resistance gene on the plasmid, with the other genes mainly coding for proteins involved in conjugative DNA transfer (taxA, taxB, taxC, trbM, and the pilX operon). The plasmid pESTMCR was present in three phylogenetically very different Escherichia coli strains, suggesting that it has high potential for horizontal transfer.


Subject(s)
Colistin/pharmacology , Drug Resistance, Bacterial/genetics , Escherichia coli Proteins/genetics , Escherichia coli/drug effects , Escherichia coli/genetics , beta-Lactamases/genetics , Animals , Anti-Bacterial Agents/pharmacology , Drug Resistance, Bacterial/drug effects , Escherichia coli/isolation & purification , Estonia , Farms , Female , Manure/microbiology , Microbial Sensitivity Tests , Plasmids/genetics , Swine/microbiology
10.
Hum Mutat ; 35(8): 972-82, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24827138

ABSTRACT

Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe genome-wide profile of CNVs and identify common rearrangements modulating risk to RM. Genome-wide screening of Estonian RM patients and fertile controls identified excessive cumulative burden of CNVs (5.4 and 6.1 Mb per genome) in two RM cases possibly increasing their individual disease risk. Functional profiling of all rearranged genes within RM study group revealed significant enrichment of loci related to innate immunity and immunoregulatory pathways essential for immune tolerance at fetomaternal interface. As a major finding, we report a multicopy duplication (61.6 kb) at 5p13.3 conferring increased maternal risk to RM in Estonia and Denmark (meta-analysis, n = 309/205, odds ratio = 4.82, P = 0.012). Comparison to Estonian population-based cohort (total, n = 1000) confirmed the risk for Estonian female cases (P = 7.9 × 10⁻⁴). Datasets of four cohorts from the Database of Genomic Variants (total, n = 5,846 subjects) exhibited similar low duplication prevalence worldwide (0.7%-1.2%) compared to RM cases of this study (6.6%-7.5%). The CNV disrupts PDZD2 and GOLPH3 genes predominantly expressed in placenta and it may represent a novel risk factor for pregnancy complications.


Subject(s)
Abortion, Habitual/genetics , Adaptor Proteins, Signal Transducing/genetics , DNA Copy Number Variations , Genome, Human , Membrane Proteins/genetics , Neoplasm Proteins/genetics , Abortion, Habitual/pathology , Base Sequence , Cell Adhesion Molecules , Chromosome Duplication , Databases, Genetic , Denmark , Estonia , Female , Fetus , Genetic Loci , Genetic Predisposition to Disease , Humans , Immune Tolerance/genetics , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Placenta/metabolism , Placenta/pathology , Polymorphism, Single Nucleotide , Pregnancy , Risk Factors
11.
Biochim Biophys Acta ; 1834(4): 717-24, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23352837

ABSTRACT

Classified into 16 superfamilies, conopeptides are the main component of cone snail venoms that attract growing interest in pharmacology and drug discovery. The conventional approach to assigning a conopeptide to a superfamily is based on a consensus signal peptide of the precursor sequence. While this information is available at the genomic or transcriptomic levels, it is not present in amino acid sequences of mature bioactives generated by proteomic studies. As the number of conopeptide sequences is increasing exponentially with the improvement in sequencing techniques, there is a growing need for automating superfamily elucidation. To face this challenge we have defined distinct models of the signal sequence, propeptide region and mature peptides for each of the superfamilies containing more than 5 members (14 out of 16). These models rely on two robust techniques namely, Position-Specific Scoring Matrices (PSSM, also named generalized profiles) and hidden Markov models (HMM). A total of 50 PSSMs and 47 HMM profiles were generated. We confirm that propeptide and mature regions can be used to efficiently classify conopeptides lacking a signal sequence. Furthermore, the combination of all three-region models demonstrated improvement in the classification rates and results emphasise how PSSM and HMM approaches complement each other for superfamily determination. The 97 models were validated and offer a straightforward method applicable to large sequence datasets.


Subject(s)
Amino Acids , Conus Snail , Peptides , Sequence Analysis, Protein , Amino Acids/genetics , Amino Acids/metabolism , Animals , Computational Biology , Conus Snail/chemistry , Conus Snail/genetics , Markov Chains , Peptides/classification , Peptides/genetics , Peptides/metabolism , Venoms/chemistry
12.
Am J Hum Genet ; 89(6): 731-44, 2011 Dec 09.
Article in English | MEDLINE | ID: mdl-22152676

ABSTRACT

South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.


Subject(s)
Genome-Wide Association Study , Selection, Genetic , Asia , Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease , Haplotypes , Heredity , Humans , Lipid Metabolism/genetics , Models, Genetic , Phylogeography , Polymorphism, Single Nucleotide , Principal Component Analysis
13.
Nucleic Acids Res ; 40(Web Server issue): W238-41, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22661581

ABSTRACT

ConoDictor is a tool that enables fast and accurate classification of conopeptides into superfamilies based on their amino acid sequence. ConoDictor combines predictions from two complementary approaches: profile hidden Markov models and generalized profiles. Results appear in a browser as tables that can be downloaded in various formats. This application is particularly valuable in view of the exponentially increasing number of conopeptides that are being identified. ConoDictor was written in Perl using the common gateway interface module with a php submission page. Sequence matching is performed with hmmsearch from HMMER 3 and ps_scan.pl from the pftools 2.3 package. ConoDictor is freely accessible at http://conco.ebc.ee.


Subject(s)
Conotoxins/classification , Software , Conotoxins/chemistry , Internet , Markov Chains , Sequence Analysis, Protein , User-Computer Interface
14.
Nucleic Acids Res ; 40(15): e115, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22730293

ABSTRACT

Polymerase chain reaction (PCR) is a basic molecular biology technique with a multiplicity of uses, including deoxyribonucleic acid cloning and sequencing, functional analysis of genes, diagnosis of diseases, genotyping and discovery of genetic variants. Reliable primer design is crucial for successful PCR, and for over a decade, the open-source Primer3 software has been widely used for primer design, often in high-throughput genomics applications. It has also been incorporated into numerous publicly available software packages and web services. During this period, we have greatly expanded Primer3's functionality. In this article, we describe Primer3's current capabilities, emphasizing recent improvements. The most notable enhancements incorporate more accurate thermodynamic models in the primer design process, both to improve melting temperature prediction and to reduce the likelihood that primers will form hairpins or dimers. Additional enhancements include more precise control of primer placement, a change motivated partly by opportunities to use whole-genome sequences to improve primer specificity. We also added features to increase ease of use, including the ability to save and re-use parameter settings and the ability to require that individual primers not be used in more than one primer pair. We have made the core code more modular and provided cleaner programming interfaces to further ease integration with other software. These improvements position Primer3 for continued use with genome-scale data in the decade ahead.


Subject(s)
DNA Primers/chemistry , Polymerase Chain Reaction , Software , Algorithms , Internet , Thermodynamics , User-Computer Interface
15.
Biochim Biophys Acta ; 1824(3): 488-92, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22244925

ABSTRACT

Conopeptides are small toxins produced by predatory marine snails of the genus Conus. They are studied with increasing intensity due to their potential in neurosciences and pharmacology. The number of existing conopeptides is estimated to be 1 million, but only about 1000 have been described to date. Thanks to new high-throughput sequencing technologies the number of known conopeptides is likely to increase exponentially in the near future. There is therefore a need for a fast and accurate computational method for identification and classification of the novel conopeptides in large data sets. 62 profile Hidden Markov Models (pHMMs) were built for prediction and classification of all described conopeptide superfamilies and families, based on the different parts of the corresponding protein sequences. These models showed very high specificity in detection of new peptides. 56 out of 62 models do not give a single false positive in a test with the entire UniProtKB/Swiss-Prot protein sequence database. Our study demonstrates the usefulness of mature peptide models for automatic classification with accuracy of 96% for the mature peptide models and 100% for the pro- and signal peptide models. Our conopeptide profile HMMs can be used for finding and annotation of new conopeptides from large datasets generated by transcriptome or genome sequencing. To our knowledge this is the first time this kind of computational method has been applied to predict all known conopeptide superfamilies and some conopeptide families.


Subject(s)
Conotoxins/classification , Conus Snail/chemistry , Neurotoxins/classification , Protein Precursors/classification , Transcriptome , Amino Acid Sequence , Animals , Conotoxins/chemistry , Conotoxins/isolation & purification , Conus Snail/genetics , Databases, Protein , Markov Chains , Molecular Sequence Data , Neurotoxins/chemistry , Neurotoxins/isolation & purification , Phylogeny , Protein Precursors/chemistry , Protein Precursors/isolation & purification , Protein Sorting Signals/physiology , Sequence Analysis, Protein , Terminology as Topic
16.
J Virol ; 86(1): 348-57, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22031941

ABSTRACT

Papillomavirus E2 protein is required for the replication and maintenance of viral genomes and transcriptional regulation of viral genes. E2 functions through sequence-specific binding to 12-bp DNA motifs, the E2 binding sites (E2BS), in the virus genome. Papillomaviruses are able to establish persistent infection in their host and have developed a long-term relationship with the host cell in order to guarantee the propagation of the virus. In this study, we have analyzed the occurrence and functionality of E2BSs in the human genome. Our computational analysis indicates that most E2BSs in the human genome are found in repetitive DNA regions and have G/C-rich spacer sequences. Using a chromatin immunoprecipitation approach, we show that human papillomavirus type 11 (HPV11) E2 interacts with a subset of cellular E2BSs located in active chromatin regions. Two E2 activities, sequence-specific DNA binding and interaction with cellular Brd4 protein, are important for E2 binding to consensus sites. E2 binding to cellular E2BSs has a moderate or no effect on cellular transcription. We suggest that the preference of HPV E2 proteins for E2BSs with A/T-rich spacers, which are present in the viral genomes and underrepresented in the human genome, ensures E2 binding to specific binding sites in the virus genome and may help to prevent extensive and possibly detrimental changes in cellular transcription in response to the viral protein.


Subject(s)
Genome, Human , Human papillomavirus 11/metabolism , Papillomavirus Infections/virology , Viral Proteins/metabolism , Binding Sites , Cell Cycle Proteins , Cell Line , Chromatin/genetics , Chromatin/metabolism , Human papillomavirus 11/chemistry , Human papillomavirus 11/genetics , Humans , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Papillomavirus Infections/genetics , Papillomavirus Infections/metabolism , Protein Binding , Repetitive Sequences, Nucleic Acid , Transcription Factors/genetics , Transcription Factors/metabolism , Viral Proteins/chemistry , Viral Proteins/genetics
17.
Sci Rep ; 13(1): 17765, 2023 10 18.
Article in English | MEDLINE | ID: mdl-37853040

ABSTRACT

Genomes exhibit large regions with segmental copy number variation, many of which include entire genes and are multiallelic. We have developed a computational method GeneToCN that counts the frequencies of gene-specific k-mers in FASTQ files and uses this information to infer copy number of the gene. We validated the copy number predictions for amylase genes (AMY1, AMY2A, AMY2B) using experimental data from digital droplet PCR (ddPCR) on 39 individuals and observed a strong correlation (R = 0.99) between GeneToCN predictions and experimentally determined copy numbers. An additional validation on FCGR3 genes showed a higher concordance for FCGR3A compared to two other methods, but reduced accuracy for FCGR3B. We further tested the method on three different genomic regions (SMN, NPY4R, and LPA Kringle IV-2 domain). Predicted copy number distributions of these genes in a set of 500 individuals from the Estonian Biobank were in good agreement with the previously published studies. In addition, we investigated the possibility to use GeneToCN on sequencing data generated by different technologies by comparing copy number predictions from Illumina, PacBio, and Oxford Nanopore data of the same sample. Despite the differences in variability of k-mer frequencies, all three sequencing technologies give similar predictions with GeneToCN.
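The counting idea described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GeneToCN implementation: it assumes a set of reference k-mers known to be present at the diploid copy number is available for normalization, and the function names and the use of medians are choices made for this sketch.

```python
from collections import Counter
from statistics import median
from typing import Iterable, Set


def count_kmers(reads: Iterable[str], k: int, wanted: Set[str]) -> Counter:
    """Count occurrences of the wanted k-mers across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in wanted:
                counts[kmer] += 1
    return counts


def estimate_copy_number(reads: Iterable[str], gene_kmers: Set[str],
                         reference_kmers: Set[str], k: int, ploidy: int = 2) -> float:
    """Scale the median gene k-mer frequency by the median reference k-mer frequency.

    reference_kmers are assumed (hypothetically) to occur at `ploidy` copies per genome.
    """
    counts = count_kmers(reads, k, gene_kmers | reference_kmers)
    gene_depth = median(counts[kmer] for kmer in gene_kmers)
    ref_depth = median(counts[kmer] for kmer in reference_kmers)
    if ref_depth == 0:
        raise ValueError("reference k-mers were not observed in the reads")
    return ploidy * gene_depth / ref_depth
```

Using the median rather than the mean keeps a few k-mers with unusually high or low counts from dominating the estimate.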


Subject(s)
DNA Copy Number Variations , Genome , Humans , DNA Copy Number Variations/genetics , Gene Dosage , Polymerase Chain Reaction/methods , High-Throughput Nucleotide Sequencing
18.
Bioinform Adv ; 3(1): vbad084, 2023.
Article in English | MEDLINE | ID: mdl-37641716

ABSTRACT

Motivation: Accurate estimation of next-generation sequencing depth of coverage is needed for detecting the copy number of repeated elements in the human genome. The common methods for estimating sequencing depth are based on counting the number of reads mapped to the genome or subgenomic regions. Such methods are sensitive to the mapping quality. The presence of contamination or the large deviance of an individual genome from the reference may introduce bias in depth estimation. Results: Here, we present an algorithm and implementation for estimating both the sequencing depth and error rate from unmapped reads using a uniquely filtered k-mer set. On simulated reads with 20× coverage, the margin of error was less than 0.01%. At 0.01× coverage and the presence of 10-fold contamination, the precision was within 2% for depth and within 10% for error rate. Availability and implementation: DOCEST program and database can be downloaded from https://bioinfo.ut.ee/docest/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
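The abstract does not spell out the calculation, but the textbook relation between k-mer coverage and per-base sequencing depth shows why depth and error rate can be estimated together from k-mer counts. The Python sketch below applies that standard relation; it is not taken from DOCEST, and the read length, k and error-rate values used with it are hypothetical.

```python
def kmer_coverage_to_depth(kmer_coverage: float, read_length: int, k: int,
                           error_rate: float = 0.0) -> float:
    """Convert mean k-mer coverage into per-base sequencing depth.

    A read of length L contains L - k + 1 k-mers, and a k-mer is observed intact
    only if all k of its bases are error-free, so
        kmer_coverage = depth * (L - k + 1) / L * (1 - error_rate) ** k
    """
    if read_length < k:
        raise ValueError("reads must be at least k bases long")
    positional = (read_length - k + 1) / read_length
    error_free = (1.0 - error_rate) ** k
    return kmer_coverage / (positional * error_free)
```

For example, with hypothetical 100-base reads and k = 26, a mean k-mer coverage of 15 corresponds to a depth of 20× when errors are ignored; a nonzero error rate raises the inferred depth, which is why the two quantities have to be estimated jointly.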

19.
Mol Biol Evol ; 28(2): 1013-24, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20978040

ABSTRACT

The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in south and southeast Asia, remains disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in southeast Asia with a later dispersal to south Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from south Asia. To test the two alternative models, this study combines the analysis of uniparentally inherited markers with 610,000 common single nucleotide polymorphism loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17-28 thousand years ago) in southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and "structure-like" analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterized by two ancestral components, one represented in the pattern of Y chromosomal and EDAR results and the other by mitochondrial DNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from southeast Asia, followed by extensive sex-specific admixture with local Indian populations.


Subject(s)
Emigration and Immigration , Genetic Variation , Genetics, Population , Language , Asia, Southeastern , Chromosomes, Human, Y , DNA, Mitochondrial/genetics , Humans , India
20.
Mol Microbiol ; 80(1): 54-67, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21320180

ABSTRACT

Inhibitors of protein synthesis cause defects in the assembly of ribosomal subunits. In response to treatment with the antibiotics erythromycin or chloramphenicol, precursors of both large and small ribosomal subunits accumulate. We have used a pulse-labelling approach to demonstrate that the accumulating subribosomal particles maturate into functional 70S ribosomes. The protein content of the precursor particles is heterogeneous and does not correspond with known assembly intermediates. Mass spectrometry indicates that production of ribosomal proteins in the presence of the antibiotics correlates with the amounts of the individual ribosomal proteins within the precursor particles. Thus, treatment of cells with chloramphenicol or erythromycin leads to an unbalanced synthesis of ribosomal proteins, providing the explanation for formation of assembly-defective particles. The operons for ribosomal proteins show a characteristic pattern of antibiotic inhibition where synthesis of the first proteins is inhibited weakly but gradually increases for the subsequent proteins in the operon. This phenomenon most likely reflects translational coupling and allows us to identify other putative coupled non-ribosomal operons in the Escherichia coli chromosome.


Subject(s)
Anti-Bacterial Agents/pharmacology , Ribosomal Proteins/metabolism , Ribosomes/drug effects , Ribosomes/metabolism , Chloramphenicol/pharmacology , Erythromycin/pharmacology , Escherichia coli/genetics , Escherichia coli/metabolism , Ribosomal Proteins/genetics , Ribosome Subunits/drug effects , Ribosome Subunits/metabolism , Ribosomes/genetics , Tandem Mass Spectrometry