Search | BVS Regional Portal

Extracellular vesicle-associated repetitive element DNAs as candidate osteosarcoma biomarkers.

Cambier, Linda; Stachelek, Kevin; Triska, Martin; Jubran, Rima; Huang, Manyu; Li, Wuyin; Zhang, Jianying; Li, Jitian; Cobrinik, David.

Sci Rep ; 11(1): 94, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33420117

ABSTRACT

Osteosarcoma (OS) is the most common malignant bone tumor in children and young adults. Although high-risk factors have been identified, no test for early detection is available. This study aimed to identify circulating nucleic acid sequences associated with serum extracellular vesicle (EV) preparations at the time of OS diagnosis, as a step towards an OS early detection assay. Sequencing of small nucleic acids extracted from serum EV preparations revealed increased representation of diverse repetitive element sequences in OS patient versus control sera. Analysis of a validation cohort using qPCR of PEG-precipitated EV preparations revealed the over-representation of HSATI, HSATII, LINE1-P1, and Charlie 3 at the DNA but not RNA level, with receiver operating characteristic (ROC) area under the curve (AUC) ≥ 0.90. HSATI and HSATII DNAs co-purified with EVs prepared by precipitation and size exclusion chromatography but not by exosome immunocapture, indicative of packaging in a non-exosomal complex. The consistent over-representation of EV-associated repetitive element DNA sequences suggests their potential utility as biomarkers for OS and perhaps other cancers.
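
The ROC AUC quoted in the abstract can be read as the probability that a randomly chosen patient sample scores higher than a randomly chosen control sample. The following is a minimal sketch of that calculation, not the authors' analysis code; the function name `roc_auc` and all abundance values are hypothetical and chosen only for illustration.

```python
def roc_auc(case_scores, control_scores):
    """Return the AUC as the fraction of (case, control) pairs in which
    the case scores higher; tied pairs count as half a win."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical relative abundances of one repetitive element (not real data).
os_patients = [5.1, 4.8, 6.0, 3.9, 5.5]
controls = [2.0, 3.1, 4.0, 2.7, 3.5]

print(roc_auc(os_patients, controls))
```

An AUC of 0.5 corresponds to no separation between the groups and 1.0 to perfect separation, so the reported threshold of 0.90 indicates that cases outrank controls in at least 90% of pairings.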

Subjects

Biomarkers/metabolism; Bone Neoplasms/metabolism; DNA/metabolism; Extracellular Vesicles/metabolism; Osteosarcoma/metabolism; Adolescent; Adult; Bone Neoplasms/diagnosis; Bone Neoplasms/genetics; Child; Child, Preschool; DNA/genetics; Extracellular Vesicles/genetics; Female; Humans; Male; Osteosarcoma/diagnosis; Osteosarcoma/genetics; Repetitive Sequences, Nucleic Acid; Young Adult

Nucleotide patterns aiding in prediction of eukaryotic promoters.

Triska, Martin; Solovyev, Victor; Baranova, Ancha; Kel, Alexander; Tatarinova, Tatiana V.

PLoS One ; 12(11): e0187243, 2017.

Article in English | MEDLINE | ID: mdl-29141011

ABSTRACT

Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into "promoters" and "non-promoters" even in the absence of full-length cDNA sequences. These models may be built upon maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, and transcription factor binding sites, as well as the relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors (TFs): those that bind preferentially to the [-500,0] region (188 "promoter-specific" TFs), those that bind preferentially to the [0,500] region (282 "5' UTR-specific" TFs), and 207 "promiscuous" TFs with little or no location preference with respect to the transcription start site (TSS). For the most informative motifs, positional preferences are conserved between dicots and monocots.
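
The three positional classes can be illustrated with a small sketch that bins binding-site positions relative to the TSS. This is an illustration only: the abstract does not state the statistical criterion used, so the `min_fraction` cutoff, the function name `classify_tf`, and the choice to assign position 0 to the downstream window are all assumptions.

```python
def classify_tf(site_positions, min_fraction=0.7):
    """Classify a transcription factor by where its binding sites fall
    relative to the TSS (position 0).

    min_fraction is a hypothetical cutoff, not a value from the paper.
    """
    # Upstream window [-500, 0); position 0 is counted downstream by assumption.
    upstream = sum(1 for p in site_positions if -500 <= p < 0)
    # Downstream window [0, 500].
    downstream = sum(1 for p in site_positions if 0 <= p <= 500)
    total = upstream + downstream
    if total == 0:
        return "no sites in window"
    if upstream / total >= min_fraction:
        return "promoter-specific"
    if downstream / total >= min_fraction:
        return "5' UTR-specific"
    return "promiscuous"


# Hypothetical binding-site positions for three factors.
print(classify_tf([-400, -300, -20, 10]))
print(classify_tf([50, 60, 70]))
print(classify_tf([-100, 100]))
```

A real analysis would replace the fixed cutoff with a statistical test of positional enrichment against a background distribution.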

Subjects

Eukaryota/genetics; Nucleotides/metabolism; Promoter Regions, Genetic; Algorithms; Binding Sites; DNA Methylation; Evolution, Molecular; Oryza/genetics; Transcription Factors/metabolism

Evidence-based gene models for structural and functional annotations of the oil palm genome.

Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie.

Biol Direct ; 12(1): 21, 2017 09 08.

Article in English | MEDLINE | ID: mdl-28886750

ABSTRACT

BACKGROUND: Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. RESULTS: Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon), with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.
CONCLUSIONS: We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops. REVIEWERS: This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.
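
GC3, as defined in the abstract, is the fraction of cytosine and guanine in the third position of a codon. A minimal sketch of the calculation for a single in-frame coding sequence follows; the function names and the example sequences are hypothetical, while the 0.75286 cutoff is the one quoted in the abstract.

```python
GC3_RICH_CUTOFF = 0.75286  # threshold quoted in the abstract


def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    # Indices 2, 5, 8, ... are third codon positions; an incomplete
    # trailing codon has no third position and is skipped automatically.
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    gc_count = sum(1 for base in third_positions if base in "GC")
    return gc_count / len(third_positions)


def is_gc3_rich(cds):
    """True when the sequence meets the GC3-rich cutoff."""
    return gc3(cds) >= GC3_RICH_CUTOFF


# Hypothetical toy sequences, three codons each.
print(gc3("ATGGCCGCG"))          # third positions: G, C, G
print(gc3("ATGAAATTT"))          # third positions: G, A, T
print(is_gc3_rich("ATGGCCGCG"))
```

The sketch assumes the input starts at the first base of a codon; a sequence in a different reading frame would need to be trimmed first.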

Subjects

Arecaceae/genetics; Genome, Plant; Models, Genetic; Molecular Sequence Annotation; Computational Biology/methods; Genes, Plant; Software

Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer.

Triska, Martin; Ivliev, Alexander; Nikolsky, Yuri; Tatarinova, Tatiana V.

Methods Mol Biol ; 1613: 291-310, 2017.

Article in English | MEDLINE | ID: mdl-28849565

ABSTRACT

Analysis of gene co-expression networks is a powerful "data-driven" tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of "data-driven" co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson's correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, realized as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We discovered that although co-expression modules are populated with different sets of genes, they share distinct stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely in accordance with cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top-scoring motifs are typically shared between several tissues; they define sets of target genes responsible for particular functions in carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available and accessible for further studies.
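
The Pearson correlation step mentioned in the abstract can be sketched as follows. This is not the WGCNA workflow itself, which is an R package and involves further steps beyond pairwise correlation; the sketch only shows how gene pairs with strongly correlated expression profiles might be picked out. The gene names, expression values, and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Return gene pairs whose expression profiles correlate at or above threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
        if pearson(expression[g1], expression[g2]) >= threshold
    ]


# Hypothetical expression values across four samples (not real data).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

print(coexpressed_pairs(expression))
```

Here only geneA and geneB rise together across samples, so they form the single co-expressed pair; geneC is anti-correlated with both.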

Subjects

Computational Biology/methods; Gene Regulatory Networks; Neoplasms/genetics; Algorithms; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Humans; Oligonucleotide Array Sequence Analysis/methods; Response Elements

Between Lake Baikal and the Baltic Sea: genomic history of the gateway to Europe.

Triska, Petr; Chekanov, Nikolay; Stepanov, Vadim; Khusnutdinova, Elza K; Kumar, Ganesh Prasad Arun; Akhmetova, Vita; Babalyan, Konstantin; Boulygina, Eugenia; Kharkov, Vladimir; Gubina, Marina; Khidiyatova, Irina; Khitrinskaya, Irina; Khrameeva, Ekaterina E; Khusainova, Rita; Konovalova, Natalia; Litvinov, Sergey; Marusin, Andrey; Mazur, Alexandr M; Puzyrev, Valery; Ivanoshchuk, Dinara; Spiridonova, Maria; Teslyuk, Anton; Tsygankova, Svetlana; Triska, Martin; Trofimova, Natalya; Vajda, Edward; Balanovsky, Oleg; Baranova, Ancha; Skryabin, Konstantin; Tatarinova, Tatiana V; Prokhortchouk, Egor.

BMC Genet ; 18(Suppl 1): 110, 2017 12 28.

Article in English | MEDLINE | ID: mdl-29297395

ABSTRACT

BACKGROUND: The history of the human populations occupying the plains and mountain ridges that separate Europe from Asia has been eventful, as these natural obstacles were crossed westward by multiple waves of Turkic- and Uralic-speaking migrants and eastward by Europeans. Unfortunately, the material records of this region's history are not dense enough to reconstruct the details of its population history. These considerations have stimulated growing interest in obtaining a genetic picture of the demographic history of migrations and admixture in Northern Eurasia. RESULTS: We genotyped and analyzed 1076 individuals from 30 populations, with geographical coverage spanning from the Baltic Sea to Lake Baikal. Our dense sampling allowed us to describe the population structure in detail, provide insight into the genomic history of numerous European and Asian populations, and significantly increase the quantity of genetic data available for modern populations of Northern Eurasia. Our study doubles the number of genome-wide profiles available for this region. We detected an unusually high amount of shared identical-by-descent (IBD) genomic segments between several Siberian populations, such as Khanty and Ket, providing evidence of genetic relatedness across vast geographic distances and between speakers of different language families. Additionally, we observed excessive IBD sharing between Khanty and Bashkir, a group of Turkic speakers from the Southern Urals region. While adding some weight to the "Finno-Ugric" origin of the Bashkir, our studies highlighted that the Bashkir gene pool lacks a main "core", being a multi-layered amalgamation of Turkic, Ugric, Finnish and Indo-European contributions, which points to the intricacy of the genetic interface between Turkic and Uralic populations. Comparison of the genetic structure of Siberian ethnicities with the geography of the region they inhabit points to the existence of a "Great Siberian Vortex" directing genetic exchanges among populations across the Siberian part of Asia. Slavic speakers of Eastern Europe are, in general, very similar in their genetic composition. Ukrainians, Belarusians and Russians have almost identical proportions of Caucasus and Northern European components and have virtually no Asian influence. We capitalized on the wide geographic span of our sampling to address the intriguing question of the place of origin of the Russian Starovers, an enigmatic Eastern Orthodox Old Believers religious group relocated to Siberia in the seventeenth century. A comparative reAdmix analysis, complemented by IBD sharing, placed their roots in the region of the Northern European Plain, occupied by North Russians and the Finno-Ugric Komi and Karelian people. Russians from Novosibirsk and Russian Starovers exhibit ancestral proportions close to those of European Eastern Slavs; however, they also include between 5% and 10% Central Siberian ancestry, which is not present at this level in their European counterparts. CONCLUSIONS: Our project has patched the hole in the genetic map of Eurasia: we demonstrated the complexity of the genetic structure of Northern Eurasians and the existence of East-West and North-South genetic gradients, and assessed the different inputs of ancient populations into modern populations.

Subjects

Emigration and Immigration/history; Ethnicity/genetics; Genetics, Population; Algorithms; Asia; DNA; Datasets as Topic; Europe; Female; Genetic Variation; Genotyping Techniques; History, 15th Century; History, 16th Century; History, 17th Century; History, 18th Century; History, 19th Century; History, 20th Century; History, 21st Century; History, Ancient; History, Medieval; Humans; Male; Russia

Genomic study of the Ket: a Paleo-Eskimo-related ethnic group with significant ancient North Eurasian ancestry.

Flegontov, Pavel; Changmai, Piya; Zidkova, Anastassiya; Logacheva, Maria D; Altinisik, N Ezgi; Flegontova, Olga; Gelfand, Mikhail S; Gerasimov, Evgeny S; Khrameeva, Ekaterina E; Konovalova, Olga P; Neretina, Tatiana; Nikolsky, Yuri V; Starostin, George; Stepanova, Vita V; Travinsky, Igor V; Tríska, Martin; Tríska, Petr; Tatarinova, Tatiana V.

Sci Rep ; 6: 20768, 2016 Feb 11.

Article in English | MEDLINE | ID: mdl-26865217

ABSTRACT

The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and the Ket language has no transparent affiliation with any language family. We investigated connections between the Kets and Siberian and North American populations, with emphasis on the Mal'ta and Paleo-Eskimo ancient genomes, using original data from 46 unrelated samples of Kets and 42 samples of their neighboring ethnic groups (Uralic-speaking Nganasans, Enets, and Selkups). We genotyped over 130,000 autosomal SNPs, identified mitochondrial and Y-chromosomal haplogroups, and performed high-coverage genome sequencing of two Ket individuals. We established that Nganasans, Kets, Selkups, and Yukaghirs form a cluster of populations most closely related to Paleo-Eskimos in Siberia (not considering indigenous populations of Chukotka and Kamchatka). Kets are closely related to modern Selkups and to some Bronze and Iron Age populations of the Altai region, with all these groups sharing a high degree of Mal'ta ancestry. Implications of these findings for the linguistic hypothesis uniting Ket and Na-Dene languages into a language macrofamily are discussed.

Subjects

DNA, Mitochondrial/genetics; Ethnicity/genetics; Genome, Human; Inuit/genetics; Phylogeny; Polymorphism, Single Nucleotide; Chromosomes, Human, Y; Genetic Variation; Haplotypes; Human Migration; Humans; Language; Phylogeography; Siberia

Differential Evolution approach to detect recent admixture.

Kozlov, Konstantin; Chebotarev, Dmitri; Hassan, Mehedi; Triska, Martin; Triska, Petr; Flegontov, Pavel; Tatarinova, Tatiana V.

BMC Genomics ; 16 Suppl 8: S9, 2015.

Article in English | MEDLINE | ID: mdl-26111206

ABSTRACT

The genetic structure of human populations is extraordinarily complex and of fundamental importance to studies of anthropology, evolution, and medicine. As increasingly many individuals are of mixed origin, there is an unmet need for tools that can infer multiple origins. Misclassification of such individuals can lead to incorrect and costly misinterpretations of genomic data, primarily in disease studies and drug trials. We present an advanced tool to infer ancestry that can identify the biogeographic origins of highly mixed individuals. reAdmix can incorporate an individual's knowledge of their ancestors (e.g. having some ancestors from Turkey or a Scottish grandmother). reAdmix is an online tool available at http://chcb.saban-chla.usc.edu/reAdmix/.

Subjects

Biological Evolution; Computational Biology; Ethnicity/genetics; Genetics, Medical/methods; Animals; Humans; Software

cisExpress: motif detection in DNA sequences.

Triska, Martin; Grocutt, David; Southern, James; Murphy, Denis J; Tatarinova, Tatiana.

Bioinformatics ; 29(17): 2203-5, 2013 Sep 01.

Article in English | MEDLINE | ID: mdl-23793750

ABSTRACT

MOTIVATION: One of the major challenges for contemporary bioinformatics is the analysis and accurate annotation of genomic datasets to enable extraction of useful information about the functional role of DNA sequences. This article describes a novel genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. This new tool, cisExpress, is especially designed for use with large datasets, such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We demonstrate the robust nature and validity of the proposed method. It is applicable for use with a wide range of genomic databases for any species of interest. AVAILABILITY: cisExpress is available at www.cisexpress.org.
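
Task farming, as mentioned in the abstract, means handing independent units of work to a pool of workers as they become free. The sketch below illustrates the pattern only and is not cisExpress's implementation: the function names, the toy promoter sequences, and the choice to treat each candidate motif as one task are assumptions. Threads are used here for simplicity; for CPU-bound work in Python, `concurrent.futures.ProcessPoolExecutor` is the usual substitute for spreading tasks across cores.

```python
from concurrent.futures import ThreadPoolExecutor


def count_promoters_with_motif(motif, promoters):
    """Count how many promoter sequences contain the motif at least once."""
    return sum(1 for seq in promoters if motif in seq)


def farm_motif_counts(motifs, promoters, workers=4):
    """Farm out one task per motif to a pool of workers and collect the counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands tasks to idle workers and yields results in input order.
        counts = pool.map(
            lambda motif: count_promoters_with_motif(motif, promoters), motifs
        )
        return dict(zip(motifs, counts))


# Hypothetical toy promoter sequences and candidate motifs.
promoters = ["TATAAAGC", "GGCTATAA", "CCGCGCGC"]
motifs = ["TATAA", "GCGC"]

print(farm_motif_counts(motifs, promoters))
```

Because each motif is scored independently of the others, the tasks need no coordination beyond collecting the results, which is what makes the problem suitable for this pattern.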

Subjects

DNA/chemistry; Promoter Regions, Genetic; Sequence Analysis, DNA/methods; Software; Algorithms; Arabidopsis/genetics; Genomics; Nucleotide Motifs

NPEST: a nonparametric method and a database for transcription start site prediction.

Tatarinova, Tatiana; Kryshchenko, Alona; Triska, Martin; Hassan, Mehedi; Murphy, Denis; Neely, Michael; Schumitzky, Alan.

Quant Biol ; 1(4): 261-271, 2013 Dec.

Article in English | MEDLINE | ID: mdl-25197613

ABSTRACT

In this paper we present NPEST, a novel tool for the analysis of expressed sequence tag (EST) distributions and transcription start site (TSS) prediction. This method estimates an unknown probability distribution of ESTs using a maximum likelihood (ML) approach, which is then used to predict positions of TSSs. Accurate identification of TSSs is an important genomics task, since the position of regulatory elements with respect to the TSS can have large effects on gene regulation, and the performance of promoter motif-finding methods depends on correct identification of TSSs. Our probabilistic approach expands recognition capabilities to multiple TSSs per locus, which may be useful for enhancing the understanding of alternative splicing mechanisms. This paper presents an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool we analyzed 16520 loci and developed a database of TSSs, which is now publicly available at www.glacombio.net/NPEST.
