Results 1 - 20 of 85
1.
PLoS Genet ; 16(8): e1008981, 2020 08.
Article in English | MEDLINE | ID: mdl-32745133

ABSTRACT

Tribbles homolog 3 (TRIB3) is a pseudokinase involved in intracellular regulatory processes and has been implicated in several diseases. In this article, we report that the human TRIB3 promoter contains a 33-bp variable number tandem repeat (VNTR) and characterize the heterogeneity and function of this genetic element. Analysis of human populations around the world uncovered the existence of alleles ranging from 1 to 5 copies of the repeat, with 2-, 3- and 5-copy alleles being the most common but displaying considerable geographical differences in frequency. The repeated sequence overlaps a C/EBP-ATF transcriptional regulatory element and is highly conserved, but not repeated, in various mammalian species, including great apes. The repeat is, however, evident in Neanderthal and Denisovan genomes. Reporter plasmid experiments in human cell culture reveal that an increased copy number of the TRIB3 promoter 33-bp repeat results in increased transcriptional activity. In line with this, analysis of whole genome sequencing and RNA-Seq data from human cohorts demonstrates that the copy number of TRIB3 promoter 33-bp repeats is positively correlated with TRIB3 mRNA expression level in many tissues throughout the body. Moreover, the copy number of the TRIB3 33-bp repeat appears to be linked to known TRIB3 eQTL SNPs as well as TRIB3 SNPs reported in genetic association studies. Taken together, the results indicate that the promoter 33-bp VNTR constitutes a causal variant for TRIB3 expression variation between individuals and could underlie the results of SNP-based genetic studies.
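
As a rough illustration of what a copy-number allele of a tandem repeat means, the sketch below counts the longest run of back-to-back exact copies of a repeat unit in a sequence. It is a toy and not the authors' genotyping procedure: the function name `max_tandem_copies` is hypothetical, the 6-bp unit in the example stands in for the real 33-bp unit (which the abstract does not give), and real VNTR calling would have to tolerate imperfect copies and sequencing errors.

```python
def max_tandem_copies(sequence: str, unit: str) -> int:
    """Return the longest run of consecutive exact copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    step = len(unit)
    # Try every start position, because a run may begin at any offset.
    for start in range(len(sequence) - step + 1):
        copies = 0
        position = start
        while sequence.startswith(unit, position):
            copies += 1
            position += step
        best = max(best, copies)
    return best


# Toy example: a made-up 6-bp unit repeated three times, then twice.
unit = "GATTCA"
promoter = "CC" + unit * 3 + "TT" + unit * 2 + "GG"
print(max_tandem_copies(promoter, unit))
```

The function reports only the longest run, so a sequence carrying separate 3-copy and 2-copy stretches is scored as 3.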


Subjects
Cell Cycle Proteins/genetics , Genetic Heterogeneity , Population Genetics , Minisatellite Repeats/genetics , Protein Serine-Threonine Kinases/antagonists & inhibitors , Repressor Proteins/genetics , Estonia/epidemiology , Female , Gene Expression Regulation/genetics , Genotype , Humans , Male , Genetic Promoter Regions , Protein Serine-Threonine Kinases/genetics , RNA-Seq , Whole Genome Sequencing
2.
Hum Mutat ; 42(6): 777-786, 2021 06.
Article in English | MEDLINE | ID: mdl-33715282

ABSTRACT

KATK is a fast and accurate software tool for calling variants directly from raw next-generation sequencing reads. It uses predefined k-mers to retrieve only the reads of interest from the FASTQ file and calls genotypes by aligning retrieved reads locally. KATK does not use data about known polymorphisms and has NC (no call) as the default genotype. The reference or variant allele is called only if there is sufficient evidence for their presence in data. Thus it is not biased against rare variants or de-novo mutations. With simulated datasets, we achieved a false-negative rate of 0.23% (sensitivity 99.77%) and a false discovery rate of 0.19%. Calling all human exonic regions with KATK requires 1-2 h, depending on sequencing coverage.
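
The read-retrieval idea in this abstract (keep only reads that contain a predefined k-mer) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not KATK's code: the function names `fastq_sequences` and `select_reads` are hypothetical, reverse-complement matches are ignored, and the later steps (local alignment and genotype calling with NC as the default) are left out.

```python
from typing import Iterable, Iterator, Set


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:  # line 2 of every record holds the bases
            yield line.strip()


def select_reads(sequences: Iterable[str], target_kmers: Set[str], k: int) -> Iterator[str]:
    """Yield only the reads that contain at least one of the target k-mers."""
    for read in sequences:
        if any(read[i:i + k] in target_kmers for i in range(len(read) - k + 1)):
            yield read


# Toy FASTQ content with two reads; only the first contains the k-mer "GTAC".
fastq_lines = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "TTTTTTTT", "+", "IIIIIIII",
]
kept = list(select_reads(fastq_sequences(fastq_lines), {"GTAC"}, k=4))
print(kept)
```

Storing the target k-mers in a set keeps each lookup constant-time, so the cost per read grows with read length rather than with the number of targets.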


Subjects
DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Alleles , Chromosome Mapping/methods , Datasets as Topic , Female , Human Genome , Genotype , Humans , Male , Single Nucleotide Polymorphism , Reproducibility of Results , DNA Sequence Analysis/methods
3.
Bioinformatics ; 34(11): 1937-1938, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29360956

ABSTRACT

Summary: Designing PCR primers for amplifying regions of eukaryotic genomes is a complicated task because the genomes contain a large number of repeat sequences and other regions unsuitable for amplification by PCR. We have developed a novel k-mer based masking method that uses a statistical model to detect and mask failure-prone regions on the DNA template prior to primer design. We implemented the software as a standalone software primer3_masker and integrated it into the primer design program Primer3. Availability and implementation: The standalone version of primer3_masker is implemented in C. The source code is freely available at https://github.com/bioinfo-ut/primer3_masker/ (standalone version for Linux and macOS) and at https://github.com/primer3-org/primer3/ (integrated version). Primer3 web application that allows masking sequences of 196 animal and plant genomes is available at http://primer3.ut.ee/. Contact: maido.remm@ut.ee. Supplementary information: Supplementary data are available at Bioinformatics online.
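
To make the masking step concrete, the sketch below lowercases every template position covered by a k-mer that is too frequent in a genome-wide count table. It deliberately swaps in a plain count threshold for the statistical model the abstract mentions, so it should be read as an assumption-laden illustration and not as how primer3_masker works; the function name `mask_frequent_kmers`, the `kmer_counts` table and the `max_count` cutoff are all hypothetical.

```python
from typing import Dict


def mask_frequent_kmers(template: str, kmer_counts: Dict[str, int], k: int, max_count: int) -> str:
    """Lowercase positions covered by any k-mer whose genome count exceeds `max_count`."""
    masked = list(template)
    for start in range(len(template) - k + 1):
        kmer = template[start:start + k]
        if kmer_counts.get(kmer, 0) > max_count:
            for position in range(start, start + k):
                masked[position] = masked[position].lower()
    return "".join(masked)


# Toy example: "TTTT" is treated as a highly repeated 4-mer.
counts = {"TTTT": 100, "ACGT": 1}
print(mask_frequent_kmers("ACGTTTTTACG", counts, k=4, max_count=10))
```

Soft-masking with lowercase letters keeps the sequence intact, so a downstream primer picker can still read the bases while avoiding the flagged region.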


Subjects
DNA Primers , Polymerase Chain Reaction/methods , Repetitive Nucleic Acid Sequences , Software , Animals , Humans , Plants/genetics
4.
PLoS Biol ; 14(12): e2000322, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27923039

ABSTRACT

Plant gas exchange is regulated by guard cells that form stomatal pores. Stomatal adjustments are crucial for plant survival; they regulate uptake of CO2 for photosynthesis, loss of water, and entrance of air pollutants such as ozone. We mapped ozone hypersensitivity, more open stomata, and stomatal CO2-insensitivity phenotypes of the Arabidopsis thaliana accession Cvi-0 to a single amino acid substitution in MITOGEN-ACTIVATED PROTEIN (MAP) KINASE 12 (MPK12). In parallel, we showed that stomatal CO2-insensitivity phenotypes of a mutant cis (CO2-insensitive) were caused by a deletion of MPK12. Lack of MPK12 impaired bicarbonate-induced activation of S-type anion channels. We demonstrated that MPK12 interacted with the protein kinase HIGH LEAF TEMPERATURE 1 (HT1), a central node in guard cell CO2 signaling, and that MPK12 functions as an inhibitor of HT1. These data provide a new function for plant MPKs as protein kinase inhibitors and suggest a mechanism through which guard cell CO2 signaling controls plant water management.


Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/physiology , Carbon Dioxide/metabolism , Genetic Variation , Mitogen-Activated Protein Kinases/metabolism , Signal Transduction , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Chromosome Mapping , Ozone/metabolism , Photosynthesis , Quantitative Trait Loci , Water
5.
PLoS Comput Biol ; 14(10): e1006434, 2018 10.
Article in English | MEDLINE | ID: mdl-30346947

ABSTRACT

We have developed an easy-to-use and memory-efficient method called PhenotypeSeeker that (a) identifies phenotype-specific k-mers, (b) generates a k-mer-based statistical model for predicting a given phenotype and (c) predicts the phenotype from the sequencing data of a given bacterial isolate. The method was validated on 167 Klebsiella pneumoniae isolates (virulence), 200 Pseudomonas aeruginosa isolates (ciprofloxacin resistance) and 459 Clostridium difficile isolates (azithromycin resistance). The phenotype prediction models trained from these datasets obtained an F1-measure of 0.88 on the K. pneumoniae test set, 0.88 on the P. aeruginosa test set and 0.97 on the C. difficile test set. The F1-measures were the same for assembled sequences and raw sequencing data; however, building the model from assembled genomes is significantly faster. On these datasets, the model building on a mid-range Linux server takes approximately 3 to 5 hours per phenotype if assembled genomes are used and 10 hours per phenotype if raw sequencing data are used. The phenotype prediction from assembled genomes takes less than one second per isolate. Thus, PhenotypeSeeker should be well-suited for predicting phenotypes from large sequencing datasets. PhenotypeSeeker is implemented in the Python programming language, is open-source software and is available at GitHub (https://github.com/bioinfo-ut/PhenotypeSeeker/).
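
For readers unfamiliar with the F1-measure reported above, it is the harmonic mean of precision and recall, which reduces to 2TP / (2TP + FP + FN). The sketch below computes it from confusion-matrix counts; the function name `f1_from_counts` and the counts in the example are hypothetical and are not taken from the paper.

```python
def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-measure, the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0  # no positives predicted or present; report 0 by convention
    return 2 * true_pos / denominator


# Made-up counts: 8 resistant isolates predicted correctly, 2 false alarms, 2 missed.
print(f1_from_counts(true_pos=8, false_pos=2, false_neg=2))
```

Because true negatives do not enter the formula, the F1-measure stays informative when one phenotype class is much rarer than the other.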


Subjects
Algorithms , Bacteria/genetics , Bacterial DNA/genetics , Bacterial Genome/genetics , Genomics/methods , Bacteria/metabolism , Bacterial DNA/physiology , Genetic Markers/genetics , Bacterial Genome/physiology , Phenotype , Sequence Alignment , DNA Sequence Analysis , Software
6.
BMC Infect Dis ; 18(1): 513, 2018 Oct 11.
Article in English | MEDLINE | ID: mdl-30309321

ABSTRACT

BACKGROUND: We aimed to identify the main spreading clones, describe the resistance mechanisms associated with carbapenem- and/or multidrug-resistant P. aeruginosa and characterize patients at risk of acquiring these strains in Estonian hospitals. METHODS: Ninety-two non-duplicated carbapenem- and/or multidrug-resistant P. aeruginosa strains were collected between 27th March 2012 and 30th April 2013. Clinical data of the patients were obtained retrospectively from the medical charts. Clonal relationships of the strains were determined by whole genome sequencing and analyzed by multi-locus sequence typing. The presence of resistance genes and beta-lactamases and their origin was determined. The combined-disk method and PCR were used to evaluate carbapenemase and metallo-beta-lactamase production. RESULTS: Forty-three strains were carbapenem-resistant, 11 were multidrug-resistant and 38 were both carbapenem- and multidrug-resistant. Most strains (54%) were isolated from respiratory secretions and caused an infection (74%). Over half of the patients (57%) were ≥ 65 years old and 85% had ≥ 1 co-morbidity; 96% had contacts with healthcare and/or had received antimicrobial treatment in the previous 90 days. Clinically relevant beta-lactamases (OXA-101, OXA-2 and GES-5) were found in 12% of strains, 27% of which were located in plasmids. No Ambler class B beta-lactamases were detected. Aminoglycoside modifying enzymes were found in 15% of the strains. OprD was defective in 13% of the strains (all with CR phenotype); carbapenem resistance triggering mutations (F170L, W277X, S403P) were present in 29% of the strains. Ciprofloxacin resistance correlated well with mutations in topoisomerase genes gyrA (T83I, D87N) and parC (S87L). Almost all strains (97%) with these mutations showed a ciprofloxacin-resistant phenotype. Multi-locus sequence type analysis indicated high diversity at the strain level, with 36 different sequence types detected. Two sequence types (ST108 (n = 23) and ST260 (n = 18)) predominated. Whereas ST108 was associated with localized spread in one hospital and a mostly carbapenem-resistant phenotype, ST260 strains occurred in all hospitals, mostly with a multi-resistant phenotype, and carried different resistance genotype/machinery. CONCLUSIONS: Diverse spread of local rather than international P. aeruginosa strains harboring multiple chromosomal mutations, but not plasmid-mediated Ambler class B beta-lactamases, was found in Estonian hospitals. TRIAL REGISTRATION: This trial was registered retrospectively in ClinicalTrials.gov (NCT03343119).


Subjects
Anti-Bacterial Agents/therapeutic use , Multiple Bacterial Drug Resistance/genetics , Pseudomonas Infections/drug therapy , Pseudomonas aeruginosa/genetics , Aged , Ciprofloxacin/therapeutic use , Bacterial DNA/chemistry , Bacterial DNA/isolation & purification , Bacterial DNA/metabolism , Disease Outbreaks , Estonia/epidemiology , Female , Hospitals , Humans , Male , Middle Aged , Multilocus Sequence Typing , Pseudomonas Infections/epidemiology , Pseudomonas Infections/microbiology , Pseudomonas aeruginosa/isolation & purification , Retrospective Studies , Whole Genome Sequencing , beta-Lactamases/genetics
7.
Proc Natl Acad Sci U S A ; 111(27): 9804-9, 2014 Jul 08.
Article in English | MEDLINE | ID: mdl-24961372

ABSTRACT

Translation arrest directed by nascent peptides and small cofactors controls expression of important bacterial and eukaryotic genes, including antibiotic resistance genes, activated by binding of macrolide drugs to the ribosome. Previous studies suggested that specific interactions between the nascent peptide and the antibiotic in the ribosomal exit tunnel play a central role in triggering ribosome stalling. However, here we show that macrolides arrest translation of the truncated ErmDL regulatory peptide when the nascent chain is only three amino acids and therefore is too short to be juxtaposed with the antibiotic. Biochemical probing and molecular dynamics simulations of erythromycin-bound ribosomes showed that the antibiotic in the tunnel allosterically alters the properties of the catalytic center, thereby predisposing the ribosome for halting translation of specific sequences. Our findings offer a new view on the role of small cofactors in the mechanism of translation arrest and reveal an allosteric link between the tunnel and the catalytic center of the ribosome.


Subjects
Anti-Bacterial Agents/pharmacology , Macrolides/pharmacology , Protein Biosynthesis/drug effects , Ribosomes/drug effects , Allosteric Regulation , Cell-Free System , Molecular Conformation , Molecular Dynamics Simulation , Ribosomes/genetics
8.
Mycorrhiza ; 27(8): 761-773, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28730541

ABSTRACT

The arrival of 454 sequencing represented a major breakthrough by allowing deeper sequencing of environmental samples than was possible with existing Sanger approaches. Illumina MiSeq provides a further increase in sequencing depth but shorter read length compared with 454 sequencing. We explored whether Illumina sequencing improves estimates of arbuscular mycorrhizal (AM) fungal richness in plant root samples, compared with 454 sequencing. We identified AM fungi in root samples by sequencing amplicons of the SSU rRNA gene with 454 and Illumina MiSeq paired-end sequencing. In addition, we sequenced metagenomic DNA without prior PCR amplification. Amplicon-based Illumina sequencing yielded two orders of magnitude higher sequencing depth per sample than 454 sequencing. Initial analysis with minimal quality control recorded five times higher AM fungal richness per sample with Illumina sequencing. Additional quality control of Illumina samples, including restriction of the marker region to the most variable amplicon fragment, revealed AM fungal richness values close to those produced by 454 sequencing. Furthermore, AM fungal richness estimates were not correlated with sequencing depth between 300 and 30,000 reads per sample, suggesting that the lower end of this range is sufficient for adequate description of AM fungal communities. By contrast, metagenomic Illumina sequencing yielded very few AM fungal reads and taxa and was dominated by plant DNA, suggesting that AM fungal DNA is present at prohibitively low abundance in colonised root samples. In conclusion, Illumina MiSeq sequencing yielded higher sequencing depth, but similar richness of AM fungi in root samples, compared with 454 sequencing.


Subjects
Biodiversity , Fungal DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Mycorrhizae/genetics
9.
Antimicrob Agents Chemother ; 60(11): 6933-6936, 2016 11.
Article in English | MEDLINE | ID: mdl-27572412

ABSTRACT

A plasmid carrying the colistin resistance gene mcr-1 was isolated from a pig slurry sample in Estonia. The gene was present on a 33,311-bp plasmid of the IncX4 group. mcr-1 is the only antibiotic resistance gene on the plasmid, with the other genes mainly coding for proteins involved in conjugative DNA transfer (taxA, taxB, taxC, trbM, and the pilX operon). The plasmid pESTMCR was present in three phylogenetically very different Escherichia coli strains, suggesting that it has high potential for horizontal transfer.


Subjects
Colistin/pharmacology , Bacterial Drug Resistance/genetics , Escherichia coli Proteins/genetics , Escherichia coli/drug effects , Escherichia coli/genetics , beta-Lactamases/genetics , Animals , Anti-Bacterial Agents/pharmacology , Bacterial Drug Resistance/drug effects , Escherichia coli/isolation & purification , Estonia , Farms , Female , Manure/microbiology , Microbial Sensitivity Tests , Plasmids/genetics , Swine/microbiology
10.
Hum Mutat ; 35(8): 972-82, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24827138

ABSTRACT

Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe the genome-wide profile of CNVs and identify common rearrangements modulating risk of RM. Genome-wide screening of Estonian RM patients and fertile controls identified an excessive cumulative burden of CNVs (5.4 and 6.1 Mb per genome) in two RM cases, possibly increasing their individual disease risk. Functional profiling of all rearranged genes within the RM study group revealed significant enrichment of loci related to innate immunity and immunoregulatory pathways essential for immune tolerance at the fetomaternal interface. As a major finding, we report a multicopy duplication (61.6 kb) at 5p13.3 conferring increased maternal risk of RM in Estonia and Denmark (meta-analysis, n = 309/205, odds ratio = 4.82, P = 0.012). Comparison to an Estonian population-based cohort (total, n = 1000) confirmed the risk for Estonian female cases (P = 7.9 × 10^-4). Datasets of four cohorts from the Database of Genomic Variants (total, n = 5,846 subjects) exhibited similarly low duplication prevalence worldwide (0.7%-1.2%) compared to RM cases of this study (6.6%-7.5%). The CNV disrupts the PDZD2 and GOLPH3 genes, which are predominantly expressed in placenta, and it may represent a novel risk factor for pregnancy complications.


Subjects
Habitual Abortion/genetics , Signal Transducing Adaptor Proteins/genetics , DNA Copy Number Variations , Human Genome , Membrane Proteins/genetics , Neoplasm Proteins/genetics , Habitual Abortion/pathology , Base Sequence , Cell Adhesion Molecules , Chromosome Duplication , Genetic Databases , Denmark , Estonia , Female , Fetus , Genetic Loci , Genetic Predisposition to Disease , Humans , Immune Tolerance/genetics , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Placenta/metabolism , Placenta/pathology , Single Nucleotide Polymorphism , Pregnancy , Risk Factors
11.
Biochim Biophys Acta ; 1834(4): 717-24, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23352837

ABSTRACT

Classified into 16 superfamilies, conopeptides are the main component of cone snail venoms that attract growing interest in pharmacology and drug discovery. The conventional approach to assigning a conopeptide to a superfamily is based on a consensus signal peptide of the precursor sequence. While this information is available at the genomic or transcriptomic levels, it is not present in amino acid sequences of mature bioactives generated by proteomic studies. As the number of conopeptide sequences is increasing exponentially with the improvement in sequencing techniques, there is a growing need for automating superfamily elucidation. To face this challenge we have defined distinct models of the signal sequence, propeptide region and mature peptides for each of the superfamilies containing more than 5 members (14 out of 16). These models rely on two robust techniques namely, Position-Specific Scoring Matrices (PSSM, also named generalized profiles) and hidden Markov models (HMM). A total of 50 PSSMs and 47 HMM profiles were generated. We confirm that propeptide and mature regions can be used to efficiently classify conopeptides lacking a signal sequence. Furthermore, the combination of all three-region models demonstrated improvement in the classification rates and results emphasise how PSSM and HMM approaches complement each other for superfamily determination. The 97 models were validated and offer a straightforward method applicable to large sequence datasets.


Subjects
Amino Acids , Conus Snail , Peptides , Protein Sequence Analysis , Amino Acids/genetics , Amino Acids/metabolism , Animals , Computational Biology , Conus Snail/chemistry , Conus Snail/genetics , Markov Chains , Peptides/classification , Peptides/genetics , Peptides/metabolism , Venoms/chemistry
12.
Am J Hum Genet ; 89(6): 731-44, 2011 Dec 09.
Article in English | MEDLINE | ID: mdl-22152676

ABSTRACT

South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.


Subjects
Genome-Wide Association Study , Genetic Selection , Asia , Type 2 Diabetes Mellitus/genetics , Genetic Predisposition to Disease , Haplotypes , Heredity , Humans , Lipid Metabolism/genetics , Genetic Models , Phylogeography , Single Nucleotide Polymorphism , Principal Component Analysis
13.
Nucleic Acids Res ; 40(Web Server issue): W238-41, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22661581

ABSTRACT

ConoDictor is a tool that enables fast and accurate classification of conopeptides into superfamilies based on their amino acid sequence. ConoDictor combines predictions from two complementary approaches: profile hidden Markov models and generalized profiles. Results appear in a browser as tables that can be downloaded in various formats. This application is particularly valuable in view of the exponentially increasing number of conopeptides that are being identified. ConoDictor was written in Perl using the common gateway interface module with a PHP submission page. Sequence matching is performed with hmmsearch from HMMER 3 and ps_scan.pl from the pftools 2.3 package. ConoDictor is freely accessible at http://conco.ebc.ee.


Subjects
Conotoxins/classification , Software , Conotoxins/chemistry , Internet , Markov Chains , Protein Sequence Analysis , User-Computer Interface
14.
Nucleic Acids Res ; 40(15): e115, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22730293

ABSTRACT

Polymerase chain reaction (PCR) is a basic molecular biology technique with a multiplicity of uses, including deoxyribonucleic acid cloning and sequencing, functional analysis of genes, diagnosis of diseases, genotyping and discovery of genetic variants. Reliable primer design is crucial for successful PCR, and for over a decade, the open-source Primer3 software has been widely used for primer design, often in high-throughput genomics applications. It has also been incorporated into numerous publicly available software packages and web services. During this period, we have greatly expanded Primer3's functionality. In this article, we describe Primer3's current capabilities, emphasizing recent improvements. The most notable enhancements incorporate more accurate thermodynamic models in the primer design process, both to improve melting temperature prediction and to reduce the likelihood that primers will form hairpins or dimers. Additional enhancements include more precise control of primer placement, a change motivated partly by opportunities to use whole-genome sequences to improve primer specificity. We also added features to increase ease of use, including the ability to save and re-use parameter settings and the ability to require that individual primers not be used in more than one primer pair. We have made the core code more modular and provided cleaner programming interfaces to further ease integration with other software. These improvements position Primer3 for continued use with genome-scale data in the decade ahead.


Subjects
DNA Primers/chemistry , Polymerase Chain Reaction , Software , Algorithms , Internet , Thermodynamics , User-Computer Interface
15.
Biochim Biophys Acta ; 1824(3): 488-92, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22244925

ABSTRACT

Conopeptides are small toxins produced by predatory marine snails of the genus Conus. They are studied with increasing intensity due to their potential in neurosciences and pharmacology. The number of existing conopeptides is estimated to be 1 million, but only about 1000 have been described to date. Thanks to new high-throughput sequencing technologies the number of known conopeptides is likely to increase exponentially in the near future. There is therefore a need for a fast and accurate computational method for identification and classification of the novel conopeptides in large data sets. 62 profile Hidden Markov Models (pHMMs) were built for prediction and classification of all described conopeptide superfamilies and families, based on the different parts of the corresponding protein sequences. These models showed very high specificity in detection of new peptides. 56 out of 62 models do not give a single false positive in a test with the entire UniProtKB/Swiss-Prot protein sequence database. Our study demonstrates the usefulness of mature peptide models for automatic classification with accuracy of 96% for the mature peptide models and 100% for the pro- and signal peptide models. Our conopeptide profile HMMs can be used for finding and annotation of new conopeptides from large datasets generated by transcriptome or genome sequencing. To our knowledge this is the first time this kind of computational method has been applied to predict all known conopeptide superfamilies and some conopeptide families.


Subjects
Conotoxins/classification , Conus Snail/chemistry , Neurotoxins/classification , Protein Precursors/classification , Transcriptome , Amino Acid Sequence , Animals , Conotoxins/chemistry , Conotoxins/isolation & purification , Conus Snail/genetics , Protein Databases , Markov Chains , Molecular Sequence Data , Neurotoxins/chemistry , Neurotoxins/isolation & purification , Phylogeny , Protein Precursors/chemistry , Protein Precursors/isolation & purification , Protein Sorting Signals/physiology , Protein Sequence Analysis , Terminology as Topic
16.
J Virol ; 86(1): 348-57, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22031941

ABSTRACT

Papillomavirus E2 protein is required for the replication and maintenance of viral genomes and transcriptional regulation of viral genes. E2 functions through sequence-specific binding to 12-bp DNA motifs, the E2 binding sites (E2BS), in the virus genome. Papillomaviruses are able to establish persistent infection in their host and have developed a long-term relationship with the host cell in order to guarantee the propagation of the virus. In this study, we have analyzed the occurrence and functionality of E2BSs in the human genome. Our computational analysis indicates that most E2BSs in the human genome are found in repetitive DNA regions and have G/C-rich spacer sequences. Using a chromatin immunoprecipitation approach, we show that human papillomavirus type 11 (HPV11) E2 interacts with a subset of cellular E2BSs located in active chromatin regions. Two E2 activities, sequence-specific DNA binding and interaction with cellular Brd4 protein, are important for E2 binding to consensus sites. E2 binding to cellular E2BSs has a moderate or no effect on cellular transcription. We suggest that the preference of HPV E2 proteins for E2BSs with A/T-rich spacers, which are present in the viral genomes and underrepresented in the human genome, ensures E2 binding to specific binding sites in the virus genome and may help to prevent extensive and possibly detrimental changes in cellular transcription in response to the viral protein.


Subjects
Human Genome , Human Papillomavirus 11/metabolism , Papillomavirus Infections/virology , Viral Proteins/metabolism , Binding Sites , Cell Cycle Proteins , Cell Line , Chromatin/genetics , Chromatin/metabolism , Human Papillomavirus 11/chemistry , Human Papillomavirus 11/genetics , Humans , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Papillomavirus Infections/genetics , Papillomavirus Infections/metabolism , Protein Binding , Repetitive Nucleic Acid Sequences , Transcription Factors/genetics , Transcription Factors/metabolism , Viral Proteins/chemistry , Viral Proteins/genetics
17.
Sci Rep ; 13(1): 17765, 2023 10 18.
Article in English | MEDLINE | ID: mdl-37853040

ABSTRACT

Genomes exhibit large regions with segmental copy number variation, many of which include entire genes and are multiallelic. We have developed a computational method GeneToCN that counts the frequencies of gene-specific k-mers in FASTQ files and uses this information to infer copy number of the gene. We validated the copy number predictions for amylase genes (AMY1, AMY2A, AMY2B) using experimental data from digital droplet PCR (ddPCR) on 39 individuals and observed a strong correlation (R = 0.99) between GeneToCN predictions and experimentally determined copy numbers. An additional validation on FCGR3 genes showed a higher concordance for FCGR3A compared to two other methods, but reduced accuracy for FCGR3B. We further tested the method on three different genomic regions (SMN, NPY4R, and LPA Kringle IV-2 domain). Predicted copy number distributions of these genes in a set of 500 individuals from the Estonian Biobank were in good agreement with the previously published studies. In addition, we investigated the possibility to use GeneToCN on sequencing data generated by different technologies by comparing copy number predictions from Illumina, PacBio, and Oxford Nanopore data of the same sample. Despite the differences in variability of k-mer frequencies, all three sequencing technologies give similar predictions with GeneToCN.
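
One simple way to turn k-mer frequencies into a copy number is to compare the typical count of gene-specific k-mers with the typical count of k-mers from a region assumed to be present at exactly two copies. The sketch below does that with medians. The abstract does not describe the normalization GeneToCN uses, so the reference-region idea, the diploid scaling and the function name `estimate_copy_number` are assumptions made for illustration only.

```python
from statistics import median
from typing import Sequence


def estimate_copy_number(
    gene_kmer_counts: Sequence[float],
    reference_kmer_counts: Sequence[float],
    reference_copies: int = 2,
) -> float:
    """Scale the median gene k-mer count by the median count of a known-copy reference."""
    reference_median = median(reference_kmer_counts)
    if reference_median <= 0:
        raise ValueError("reference k-mers must have a positive median count")
    return reference_copies * median(gene_kmer_counts) / reference_median


# Made-up counts: gene k-mers are seen about three times as often as diploid reference k-mers.
gene_counts = [88, 92, 90]
reference_counts = [29, 30, 31]
print(estimate_copy_number(gene_counts, reference_counts))
```

Medians are used so that a few k-mers inflated by repeats or deflated by sequencing errors do not dominate the estimate.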


Subjects
DNA Copy Number Variations , Genome , Humans , DNA Copy Number Variations/genetics , Gene Dosage , Polymerase Chain Reaction/methods , High-Throughput Nucleotide Sequencing
18.
Bioinform Adv ; 3(1): vbad084, 2023.
Article in English | MEDLINE | ID: mdl-37641716

ABSTRACT

Motivation: Accurate estimation of next-generation sequencing depth of coverage is needed for detecting the copy number of repeated elements in the human genome. The common methods for estimating sequencing depth are based on counting the number of reads mapped to the genome or subgenomic regions. Such methods are sensitive to the mapping quality. The presence of contamination or the large deviance of an individual genome from the reference may introduce bias in depth estimation. Results: Here, we present an algorithm and implementation for estimating both the sequencing depth and error rate from unmapped reads using a uniquely filtered k-mer set. On simulated reads with 20× coverage, the margin of error was less than 0.01%. At 0.01× coverage and the presence of 10-fold contamination, the precision was within 2% for depth and within 10% for error rate. Availability and implementation: DOCEST program and database can be downloaded from https://bioinfo.ut.ee/docest/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
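
The link between k-mer counts and sequencing depth can be illustrated with a widely used approximation: a read of length L contains L - k + 1 k-mers, and a k-mer survives intact only if none of its k bases is miscalled, so the mean count of a unique k-mer is roughly depth × (L - k + 1) / L × (1 - e)^k for a per-base error rate e. The sketch below inverts that relation. It is a textbook approximation and not the DOCEST algorithm, and the function name `depth_from_kmer_coverage` and the numbers in the example are hypothetical.

```python
def depth_from_kmer_coverage(
    mean_kmer_count: float,
    read_length: int,
    k: int,
    error_rate: float = 0.0,
) -> float:
    """Estimate per-base sequencing depth from the mean count of unique k-mers."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    kmers_per_base = (read_length - k + 1) / read_length  # k-mers contributed per sequenced base
    error_free = (1.0 - error_rate) ** k  # chance that all k bases are correct
    return mean_kmer_count / (kmers_per_base * error_free)


# Made-up input: 100-bp reads, 21-mers, mean unique k-mer count of 16, no errors.
print(depth_from_kmer_coverage(16.0, read_length=100, k=21))
```

With a non-zero error rate the same k-mer count implies a higher depth, because some reads covering the locus no longer contribute an intact k-mer.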

19.
Mol Biol Evol ; 28(2): 1013-24, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20978040

ABSTRACT

The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in south and southeast Asia, remain disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in southeast Asia with a later dispersal to south Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from south Asia. To test the two alternative models, this study combines the analysis of uniparentally inherited markers with 610,000 common single nucleotide polymorphism loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17-28 thousand years ago) in southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and "structure-like" analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterized by two ancestral components: one represented in the pattern of Y chromosomal and EDAR results and the other by mitochondrial DNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from southeast Asia, followed by extensive sex-specific admixture with local Indian populations.


Subjects
Emigration and Immigration , Genetic Variation , Population Genetics , Language , Southeast Asia , Human Y Chromosomes , Mitochondrial DNA/genetics , Humans , India
20.
Mol Microbiol ; 80(1): 54-67, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21320180

ABSTRACT

Inhibitors of protein synthesis cause defects in the assembly of ribosomal subunits. In response to treatment with the antibiotics erythromycin or chloramphenicol, precursors of both large and small ribosomal subunits accumulate. We have used a pulse-labelling approach to demonstrate that the accumulating subribosomal particles mature into functional 70S ribosomes. The protein content of the precursor particles is heterogeneous and does not correspond with known assembly intermediates. Mass spectrometry indicates that production of ribosomal proteins in the presence of the antibiotics correlates with the amounts of the individual ribosomal proteins within the precursor particles. Thus, treatment of cells with chloramphenicol or erythromycin leads to an unbalanced synthesis of ribosomal proteins, providing the explanation for the formation of assembly-defective particles. The operons for ribosomal proteins show a characteristic pattern of antibiotic inhibition in which synthesis of the first proteins is inhibited weakly, but inhibition gradually increases for the subsequent proteins in the operon. This phenomenon most likely reflects translational coupling and allows us to identify other putative coupled non-ribosomal operons in the Escherichia coli chromosome.


Subjects
Anti-Bacterial Agents/pharmacology , Ribosomal Proteins/metabolism , Ribosomes/drug effects , Ribosomes/metabolism , Chloramphenicol/pharmacology , Erythromycin/pharmacology , Escherichia coli/genetics , Escherichia coli/metabolism , Ribosomal Proteins/genetics , Ribosome Subunits/drug effects , Ribosome Subunits/metabolism , Ribosomes/genetics , Tandem Mass Spectrometry