Search | BVS (Virtual Health Library) Search Portal

1.

Enrichment of ovine gonadotropes via adenovirus gene targeting enhances assessment of transcriptional changes in response to estradiol-17 beta.

Murtazina, Dilyara A; Arreguin-Arevalo, Jesus Alejandro; Cantlon, Jeremy D; Ebrahimpour-Boroojeny, Ali; Shrestha, Akash; Hicks, Jennifer A; Magee, Christianne; Kirkley, Kelly; Jones, Kenneth; Nett, Terry M; Chitsaz, Hamidreza; Clay, Colin M.

Biol Reprod ; 102(1): 156-169, 2020 02 12.

Article in English | MEDLINE | ID: mdl-31504222

ABSTRACT

Gonadotropes represent approximately 5-15% of the total endocrine cell population in the mammalian anterior pituitary. Therefore, the effects of experimental manipulation on virtually any parameter of gonadotrope biology are difficult to detect and parse from background noise. In non-rodent species, applying techniques such as high-throughput ribonucleic acid (RNA) sequencing is problematic due to difficulty in isolating and analyzing individual endocrine cell populations. Herein, we exploited cell-specific properties inherent to the proximal promoter of the human glycoprotein hormone alpha subunit gene (CGA) to genetically target the expression of a fluorescent reporter (green fluorescent protein [GFP]) selectively to ovine gonadotropes. Dissociated ovine pituitary cells were cultured and infected with an adenoviral reporter vector (Ad-hαCGA-eGFP). We established efficient gene targeting by successfully enriching dispersed GFP-positive cells with flow cytometry. Confirming enrichment of gonadotropes specifically, we detected elevated levels of luteinizing hormone (LH) but not thyroid-stimulating hormone (TSH) in GFP-positive cell populations compared to GFP-negative populations. Subsequently, we used next-generation sequencing to obtain the transcriptional profile of GFP-positive ovine gonadotropes in the presence or absence of estradiol 17-beta (E2), a key modulator of gonadotrope function. Compared to non-sorted cells, enriched GFP-positive cells revealed a distinct transcriptional profile consistent with established patterns of gonadotrope gene expression. Importantly, we also detected nearly 200 E2-responsive genes in enriched gonadotropes, which were not apparent in parallel experiments on non-enriched cell populations. From these data, we conclude that CGA-targeted adenoviral gene transfer is an effective means for selectively labeling and enriching ovine gonadotropes suitable for investigation by numerous experimental approaches.

Subjects

Estradiol/pharmacology; Gonadotrophs/drug effects; Pituitary Gland, Anterior/drug effects; Adenoviridae; Animals; Gonadotrophs/metabolism; Luteinizing Hormone/metabolism; Pituitary Gland, Anterior/metabolism; Sheep; Thyrotropin/metabolism

2.

DNA methylation regulates discrimination of enhancers from promoters through a H3K4me1-H3K4me3 seesaw mechanism.

Sharifi-Zarchi, Ali; Gerovska, Daniela; Adachi, Kenjiro; Totonchi, Mehdi; Pezeshk, Hamid; Taft, Ryan J; Schöler, Hans R; Chitsaz, Hamidreza; Sadeghi, Mehdi; Baharvand, Hossein; Araúzo-Bravo, Marcos J.

BMC Genomics ; 18(1): 964, 2017 Dec 12.

Article in English | MEDLINE | ID: mdl-29233090

ABSTRACT

BACKGROUND: DNA methylation at promoters is largely correlated with inhibition of gene expression. However, the role of DNA methylation at enhancers is not fully understood, although a crosstalk with chromatin marks is expected. Actually, there exist contradictory reports about positive and negative correlations between DNA methylation and H3K4me1, a chromatin hallmark of enhancers. RESULTS: We investigated the relationship between DNA methylation and active chromatin marks through genome-wide correlations, and found anti-correlation between H3K4me1 and H3K4me3 enrichment at low and intermediate DNA methylation loci. We hypothesized "seesaw" dynamics between H3K4me1 and H3K4me3 in the low and intermediate DNA methylation range, in which DNA methylation discriminates between enhancers and promoters, marked by H3K4me1 and H3K4me3, respectively. Low methylated regions are H3K4me3 enriched, while those with intermediate DNA methylation levels are progressively H3K4me1 enriched. Additionally, the enrichment of H3K27ac, distinguishing active from primed enhancers, follows a plateau in the lower range of the intermediate DNA methylation level, corresponding to active enhancers, and decreases linearly in the higher range of the intermediate DNA methylation. Thus, the decrease of the DNA methylation switches smoothly the state of the enhancers from a primed to an active state. We summarize these observations into a rule of thumb of one-out-of-three methylation marks: "In each genomic region only one out of these three methylation marks {DNA methylation, H3K4me1, H3K4me3} is high. If it is the DNA methylation, the region is inactive. If it is H3K4me1, the region is an enhancer, and if it is H3K4me3, the region is a promoter". To test our model, we used available genome-wide datasets of H3K4 methyltransferases knockouts. 
Our analysis suggests that CXXC proteins, as readers of non-methylated CpGs, would regulate the "seesaw" mechanism that focuses H3K4me3 to unmethylated sites, while being repulsed from H3K4me1 decorated enhancers and CpG island shores. CONCLUSIONS: Our results show that DNA methylation discriminates promoters from enhancers through an H3K4me1-H3K4me3 seesaw mechanism, and suggest its possible function in the inheritance of chromatin marks after cell division. Our analyses suggest aberrant formation of promoter-like regions and ectopic transcription of hypomethylated regions of DNA. Such a mechanism can have important implications for biological processes in which abnormal DNA methylation status has been reported, such as cancer and aging.
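
The one-out-of-three rule of thumb quoted in this abstract can be sketched as a tiny classifier. This is an illustration only: the function name, the labels, and the assumption that the three signals have already been normalized to a comparable scale (for example 0 to 1) are hypothetical and do not come from the paper.

```python
def classify_region(dna_methylation, h3k4me1, h3k4me3):
    """Label a genomic region by whichever of the three marks is highest.

    Assumes all three inputs are on a comparable, pre-normalized scale
    (an assumption of this sketch, not something the abstract specifies).
    """
    marks = {
        "inactive": dna_methylation,  # DNA methylation dominates
        "enhancer": h3k4me1,          # H3K4me1 dominates
        "promoter": h3k4me3,          # H3K4me3 dominates
    }
    return max(marks, key=marks.get)


print(classify_region(0.9, 0.1, 0.05))  # highly methylated region
print(classify_region(0.4, 0.8, 0.1))   # intermediate methylation, H3K4me1 high
print(classify_region(0.05, 0.2, 0.9))  # low methylation, H3K4me3 high
```

Real data would need a normalization step and some handling of ties and of regions where no mark is clearly high, which the rule of thumb leaves open.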

Subjects

DNA Methylation; Enhancer Elements, Genetic; Histone Code; Promoter Regions, Genetic; Animals; Cytosine/metabolism; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/metabolism; Gene Expression; Histones/metabolism; Mice; Protein Domains

3.

Candidate phylum TM6 genome recovered from a hospital sink biofilm provides genomic insights into this uncultivated phylum.

McLean, Jeffrey S; Lombardo, Mary-Jane; Badger, Jonathan H; Edlund, Anna; Novotny, Mark; Yee-Greenbaum, Joyclyn; Vyahhi, Nikolay; Hall, Adam P; Yang, Youngik; Dupont, Christopher L; Ziegler, Michael G; Chitsaz, Hamidreza; Allen, Andrew E; Yooseph, Shibu; Tesler, Glenn; Pevzner, Pavel A; Friedman, Robert M; Nealson, Kenneth H; Venter, J Craig; Lasken, Roger S.

Proc Natl Acad Sci U S A ; 110(26): E2390-9, 2013 Jun 25.

Article in English | MEDLINE | ID: mdl-23754396

ABSTRACT

The "dark matter of life" describes microbes and even entire divisions of bacterial phyla that have evaded cultivation and have yet to be sequenced. We present a genome from the globally distributed but elusive candidate phylum TM6 and uncover its metabolic potential. TM6 was detected in a biofilm from a sink drain within a hospital restroom by analyzing cells using a highly automated single-cell genomics platform. We developed an approach for increasing throughput and effectively improving the likelihood of sampling rare events based on forming small random pools of single-flow-sorted cells, amplifying their DNA by multiple displacement amplification and sequencing all cells in the pool, creating a "mini-metagenome." A recently developed single-cell assembler, SPAdes, in combination with contig binning methods, allowed the reconstruction of genomes from these mini-metagenomes. A total of 1.07 Mb was recovered in seven contigs for this member of TM6 (JCVI TM6SC1), estimated to represent 90% of its genome. High nucleotide identity between a total of three TM6 genome drafts generated from pools that were independently captured, amplified, and assembled provided strong confirmation of a correct genomic sequence. TM6 is likely a Gram-negative organism and possibly a symbiont of an unknown host (nonfree living) in part based on its small genome, low-GC content, and lack of biosynthesis pathways for most amino acids and vitamins. Phylogenomic analysis of conserved single-copy genes confirms that TM6SC1 is a deeply branching phylum.

Subjects

Biofilms; Hospitals; Metagenome; Sanitary Engineering; Water Microbiology; Bacteria/classification; Bacteria/genetics; Bacteria/isolation & purification; DNA, Bacterial/genetics; DNA, Bacterial/isolation & purification; DNA, Bacterial/metabolism; Evolution, Molecular; Genome, Bacterial; Humans; Metabolic Networks and Pathways; Metagenomics/methods; Molecular Sequence Data; Phylogeny; Water Supply

4.

ARYANA: Aligning Reads by Yet Another Approach.

Gholami, Milad; Arbabi, Aryan; Sharifi-Zarchi, Ali; Chitsaz, Hamidreza; Sadeghi, Mehdi.

BMC Bioinformatics ; 15 Suppl 9: S12, 2014.

Article in English | MEDLINE | ID: mdl-25252881

ABSTRACT

MOTIVATION: Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $10^6 prize for the InnoCentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-consensus algorithms which depend on fast and accurate alignment. CONTRIBUTION: We introduce ARYANA, a fast gapped read aligner, built on the BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners: Bowtie2, BWA and SeqAlto, with comparable generality and accuracy. Instead of the time-consuming backtracking procedures for handling mismatches, ARYANA comes with the seed-and-extend algorithmic framework and a significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases, ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in perfect harmony with the read length trend as the sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using the ARYANA engine. AVAILABILITY: ARYANA with complete source code can be obtained from http://github.com/aryana-aligner.
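
For readers unfamiliar with the seed-and-extend framework mentioned in this abstract, the following is a deliberately minimal sketch of the general idea. It is not ARYANA's engine: it uses a plain k-mer hash index instead of the BWA index, fixed-length seeds instead of dynamic seed selection, and ungapped mismatch counting instead of gap-filling dynamic programming. All function names and parameters are hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) of the best ungapped placement, or None."""
    best = None
    tried = set()
    for offset in range(len(read) - k + 1):
        # Seed: an exact k-mer match between the read and the reference.
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start in tried or start < 0 or start + len(read) > len(reference):
                continue
            tried.add(start)
            # Extend: compare the whole read against the implied window.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_index(reference, 4)
print(align("TAGCCGTTAGG", reference, index, 4))  # (8, 1)
```

The read in the usage line differs from the reference window starting at position 8 by a single substitution, so the seeds on either side of the mismatch both vote for the same placement.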

Subjects

Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Genome, Human; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Alignment/economics; Sequence Analysis, DNA/economics

5.

HyDA-Vista: towards optimal guided selection of k-mer size for sequence assembly.

Shariat, Basir; Movahedi, Narjes Sadat; Chitsaz, Hamidreza; Boucher, Christina.

BMC Genomics ; 15 Suppl 10: S9, 2014.

Article in English | MEDLINE | ID: mdl-25558875

ABSTRACT

MOTIVATION: Intimately tied to assembly quality is the complexity of the de Bruijn graph built by the assembler. Thus, there have been many paradigms developed to decrease the complexity of the de Bruijn graph. One obvious combinatorial paradigm for this is to allow the value of k to vary; having a larger value of k where the graph is more complex and a smaller value of k where the graph would likely contain fewer spurious edges and vertices. One open problem that affects the practicality of this method is how to predict the value of k prior to building the de Bruijn graph. We show that optimal values of k can be predicted prior to assembly by using the information contained in a phylogenetically close genome, and therefore help make the use of multiple values of k practical for genome assembly. RESULTS: We present HyDA-Vista, which is a genome assembler that uses homology information to choose a value of k for each read prior to the de Bruijn graph construction. The chosen k is optimal if there are no sequencing errors and the coverage is sufficient. Fundamental to our method is the construction of the maximal sequence landscape, which is a data structure that stores, for each position in the input string, the largest repeated substring containing that position. In particular, we show the maximal sequence landscape can be constructed in O(n+n log n)-time and O(n)-space. HyDA-Vista first constructs the maximal sequence landscape for a homologous genome. The reads are then aligned to this reference genome, and values of k are assigned to each read using the maximal sequence landscape and the alignments. Eventually, all the reads are assembled by an iterative de Bruijn graph construction method. Our results and comparison to other assemblers demonstrate that HyDA-Vista achieves the best assembly of E. coli before repeat resolution or scaffolding. AVAILABILITY: HyDA-Vista is freely available. 
The code for constructing the maximal sequence landscape and choosing the optimal value of k for each read is also separately available on the website and could be incorporated into any genome assembler.
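
To make the maximal sequence landscape described in this abstract concrete, here is a naive sketch that stores, for each position, the length of the longest repeated substring containing it. It is a brute-force illustration with roughly cubic running time, not the authors' construction, and it assumes that two occurrences of a repeat may overlap; the function names are hypothetical.

```python
def longest_repeat_from(text, start):
    """Length of the longest substring starting at `start` that also
    starts at some other position of `text` (occurrences may overlap)."""
    best = 0
    for other in range(len(text)):
        if other == start:
            continue
        length = 0
        while (start + length < len(text)
               and other + length < len(text)
               and text[start + length] == text[other + length]):
            length += 1
        best = max(best, length)
    return best


def maximal_sequence_landscape(text):
    """For each position, the length of the longest repeat covering it."""
    landscape = [0] * len(text)
    for start in range(len(text)):
        length = longest_repeat_from(text, start)
        # Every position inside this repeat is covered by it.
        for pos in range(start, start + length):
            landscape[pos] = max(landscape[pos], length)
    return landscape


print(maximal_sequence_landscape("banana"))  # [0, 3, 3, 3, 3, 3]
print(maximal_sequence_landscape("abcab"))   # [2, 2, 0, 2, 2]
```

Intuitively, a read aligned to a region with large landscape values sits in a long repeat and would call for a larger k, while a region with small values tolerates a smaller k. How the landscape and alignments are combined into a per-read k is described in the paper, not in this sketch.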

Subjects

Algorithms; Sequence Analysis, DNA/methods; Computer Simulation; Escherichia coli/genetics; Genome; Humans; Sequence Homology, Nucleic Acid

6.

The RNA Newton polytope and learnability of energy parameters.

Forouzmand, Elmirasadat; Chitsaz, Hamidreza.

Bioinformatics ; 29(13): i300-7, 2013 Jul 01.

Article in English | MEDLINE | ID: mdl-23812998

ABSTRACT

MOTIVATION: Computational RNA structure prediction is a mature and important problem that has received a new wave of attention with the discovery of regulatory non-coding RNAs and the advent of high-throughput transcriptome sequencing. Despite nearly two score years of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of the state-of-the-art algorithms is still far from satisfactory. So far, researchers have proposed increasingly complex energy models and improved parameter estimation methods, experimental and/or computational, in anticipation of endowing their methods with enough power to solve the problem. The output has disappointingly been only modest improvements, not matching the expectations. Even recent massively featured machine learning approaches were not able to break the barrier. Why is that? APPROACH: The first step toward high-accuracy structure prediction is to pick an energy model that is inherently capable of predicting each and every one of the known structures to date. In this article, we introduce the notion of learnability of the parameters of an energy model as a measure of such an inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one set of such parameters that renders every known RNA structure to date the minimum free energy structure. We derive a necessary condition for the learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. To the best of our knowledge, this is the first approach toward computing the RNA Newton polytope and a systematic assessment of the inherent capabilities of an energy model. The worst case complexity of our algorithm is exponential in the number of features. 
However, dimensionality reduction techniques can provide approximate solutions to avoid the curse of dimensionality. RESULTS: We demonstrated the application of our theory to a simple energy model consisting of a weighted count of A-U, C-G and G-U base pairs. Our results show that this simple energy model satisfies the necessary condition for more than half of the input unpseudoknotted sequence-structure pairs (55%) chosen from the RNA STRAND v2.0 database and severely violates the condition for ~ 13%, which provide a set of hard cases that require further investigation. From 1350 RNA strands, the observed 3D feature vector for 749 strands is on the surface of the computed polytope. For 289 RNA strands, the observed feature vector is not on the boundary of the polytope but its distance from the boundary is not more than one. A distance of one essentially means one base pair difference between the observed structure and the closest point on the boundary of the polytope, which need not be the feature vector of a structure. For 171 sequences, this distance is larger than two, and for only 11 sequences, this distance is larger than five. AVAILABILITY: The source code is available on http://compbio.cs.wayne.edu/software/rna-newton-polytope.
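
The simple three-feature energy model in this abstract (a weighted count of A-U, C-G and G-U base pairs) lends itself to a small brute-force illustration. The sketch below enumerates the set of feature vectors of all pseudoknot-free structures of a short sequence, rather than the convex hull that the authors' algorithm computes, and then checks whether an observed structure is a minimum-energy structure for one given weight vector. The minimum hairpin loop length of 3 and all names are assumptions of this sketch.

```python
from functools import lru_cache

# Feature order: (A-U pairs, C-G pairs, G-U pairs).
PAIR_FEATURE = {
    frozenset("AU"): (1, 0, 0),
    frozenset("CG"): (0, 1, 0),
    frozenset("GU"): (0, 0, 1),
}


def observed_features(seq, dot_bracket):
    """Count A-U, C-G and G-U pairs in a pseudoknot-free dot-bracket string.

    Raises KeyError if the structure contains a non-canonical pair.
    """
    counts = [0, 0, 0]
    stack = []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            i = stack.pop()
            feature = PAIR_FEATURE[frozenset((seq[i].upper(), seq[j].upper()))]
            counts = [c + f for c, f in zip(counts, feature)]
    return tuple(counts)


def feature_vectors(seq, min_loop=3):
    """Set of feature vectors over all nested structures of `seq`."""
    seq = seq.upper()

    @lru_cache(maxsize=None)
    def vectors(i, j):
        if j <= i:  # empty interval or a single base: only the empty structure
            return frozenset({(0, 0, 0)})
        found = set(vectors(i + 1, j))  # base i left unpaired
        for k in range(i + min_loop + 1, j + 1):  # base i paired with base k
            feature = PAIR_FEATURE.get(frozenset((seq[i], seq[k])))
            if feature is None:
                continue
            for inner in vectors(i + 1, k - 1):
                for outer in vectors(k + 1, j):
                    found.add(tuple(a + b + c for a, b, c in zip(feature, inner, outer)))
        return frozenset(found)

    return vectors(0, len(seq) - 1)


def is_minimum_energy(observed, vectors, weights):
    """True if `observed` attains the minimum of weights . features."""
    def energy(vector):
        return sum(w * x for w, x in zip(weights, vector))
    return energy(observed) <= min(energy(v) for v in vectors)


vectors = feature_vectors("GAAAC")
observed = observed_features("GAAAC", "(...)")
print(sorted(vectors))                                      # [(0, 0, 0), (0, 1, 0)]
print(is_minimum_energy(observed, vectors, (-2, -3, -1)))   # True
```

Learnability in the abstract's sense asks whether some weight vector makes every known structure a minimum simultaneously, which is why the observed feature vector must lie on the boundary of the polytope. This sketch only tests one weight vector on one sequence.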

Subjects

Algorithms; RNA/chemistry; Artificial Intelligence; Base Pairing; Models, Molecular; Nucleic Acid Conformation

7.

Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.

Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza.

Bioinformatics ; 29(19): 2395-401, 2013 Oct 01.

Article in English | MEDLINE | ID: mdl-23918251

ABSTRACT

MOTIVATION: Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. RESULTS: Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. AVAILABILITY: Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.

Subjects

Genome, Microbial; Sequence Analysis, DNA/methods; Algorithms; Base Sequence; Humans; Intestines/microbiology; Sequence Homology, Nucleic Acid

8.

SEQuel: improving the accuracy of genome assemblies.

Ronen, Roy; Boucher, Christina; Chitsaz, Hamidreza; Pevzner, Pavel.

Bioinformatics ; 28(12): i188-96, 2012 Jun 15.

Article in English | MEDLINE | ID: mdl-22689760

ABSTRACT

MOTIVATION: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model. RESULTS: SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels, and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly. AVAILABILITY: SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/.

Subjects

Computational Biology/methods; Genome, Bacterial; Sequence Analysis, DNA/methods; Algorithms; Contig Mapping; Escherichia coli/genetics; INDEL Mutation

9.

Pan-cancer analysis of microRNA expression profiles highlights microRNAs enriched in normal body cells as effective suppressors of multiple tumor types: A study based on TCGA database.

Moradi, Sharif; Kamal, Aryan; Aboulkheyr Es, Hamidreza; Farhadi, Farnoosh; Ebrahimi, Marzieh; Chitsaz, Hamidreza; Sharifi-Zarchi, Ali; Baharvand, Hossein.

PLoS One ; 17(4): e0267291, 2022.

Article in English | MEDLINE | ID: mdl-35476804

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are frequently deregulated in various types of cancer. While antisense oligonucleotides are used to block oncomiRs, delivery of tumour-suppressive miRNAs holds great potential as a potent anti-cancer strategy. Here, we aim to determine, and functionally analyse, miRNAs that are lowly expressed in various types of tumour but abundantly expressed in multiple normal tissues. METHODS: The miRNA sequencing data of 14 cancer types were downloaded from the TCGA dataset. Significant differences in miRNA expression between tumor and normal samples were calculated using limma package (R programming). An adjusted p value < 0.05 was used to compare normal versus tumor miRNA expression profiles. The predicted gene targets were obtained using TargetScan, miRanda, and miRDB and then subjected to gene ontology analysis using Enrichr. Only GO terms with an adjusted p < 0.05 were considered statistically significant. All data from wet-lab experiments (cell viability assays and flow cytometry) were expressed as means ± SEM, and their differences were analyzed using GraphPad Prism software (Student's t test, p < 0.05). RESULTS: By compiling all publicly available miRNA profiling data from The Cancer Genome Atlas (TCGA) Pan-Cancer Project, we reveal a small set of tumour-suppressing miRNAs (which we designate as 'normomiRs') that are highly expressed in 14 types of normal tissues but poorly expressed in corresponding tumour tissues. Interestingly, muscle-enriched miRNAs (e.g. miR-133a/b and miR-206) and miRNAs from DLK1-DIO3 locus (e.g. miR-381 and miR-411) constitute a large fraction of the normomiRs. Moreover, we define that the CCCGU motif is absent in the oncomiRs' seed sequences but present in a fraction of tumour-suppressive miRNAs. Finally, the gain of function of candidate normomiRs across several cancer cell types indicates that miR-206 and miR-381 exert the most potent inhibition on multiple cancer types in vitro. 
CONCLUSION: Our results reveal a pan-cancer set of tumour-suppressing miRNAs and highlight the potential of miRNA-replacement therapies for targeting multiple types of tumour.
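
The METHODS in this abstract rely on an adjusted p value threshold of 0.05. The abstract does not say which adjustment was applied; the sketch below assumes the Benjamini-Hochberg procedure, which is the default adjustment in limma's `topTable`, and shows how such adjusted values are obtained. The function name and the example p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted)
print(significant)
```

With these four made-up p values every adjusted value stays below 0.05, so all four tests would pass the threshold used in the abstract.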

Subjects

MicroRNAs; Neoplasms; Databases, Factual; Gene Expression Regulation, Neoplastic; Gene Ontology; Humans; MicroRNAs/genetics; MicroRNAs/metabolism; Neoplasms/genetics

10.

A partition function algorithm for interacting nucleic acid strands.

Chitsaz, Hamidreza; Salari, Raheleh; Sahinalp, S Cenk; Backofen, Rolf.

Bioinformatics ; 25(12): i365-73, 2009 Jun 15.

Article in English | MEDLINE | ID: mdl-19478011

ABSTRACT

UNLABELLED: Recent interests, such as RNA interference and antisense RNA regulation, strongly motivate the problem of predicting whether two nucleic acid strands interact. MOTIVATION: Regulatory non-coding RNAs (ncRNAs) such as microRNAs play an important role in gene regulation. Studies on both prokaryotic and eukaryotic cells show that such ncRNAs usually bind to their target mRNA to regulate the translation of corresponding genes. The specificity of these interactions depends on the stability of intermolecular and intramolecular base pairing. While methods like deep sequencing allow the discovery of an ever increasing set of ncRNAs, there are no high-throughput methods available to detect their associated targets. Hence, there is an increasing need for precise computational target prediction. In order to predict base-pairing probability of any two bases in interacting nucleic acids, it is necessary to compute the interaction partition function over the whole ensemble. The partition function is a scalar value from which various thermodynamic quantities can be derived. For example, the equilibrium concentration of each complex nucleic acid species and also the melting temperature of interacting nucleic acids can be calculated based on the partition function of the complex. RESULTS: We present a model for analyzing the thermodynamics of two interacting nucleic acid strands considering the most general type of interactions studied in the literature. We also present a corresponding dynamic programming algorithm that computes the partition function over (almost) all physically possible joint secondary structures formed by two interacting nucleic acids in O(n^6) time. We verify the predictive power of our algorithm by computing (i) the melting temperature for interacting RNA pairs studied in the literature and (ii) the equilibrium concentration for several variants of the OxyS-fhlA complex. In both experiments, our algorithm shows high accuracy and outperforms competitors. 
AVAILABILITY: Software and a web server are available at http://compbio.cs.sfu.ca/taverna/pirna/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
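
To illustrate what a partition function over a structure ensemble is, the sketch below computes one for a single strand under a toy energy model in which every canonical base pair contributes the same energy. This is far simpler than the authors' O(n^6) algorithm for two interacting strands; the energy value, the RT constant of about 0.6 kcal/mol, the minimum loop length of 3, and all names are assumptions of this sketch.

```python
import math
from functools import lru_cache

CANONICAL = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def partition_function(seq, pair_energy=-1.0, rt=0.6, min_loop=3):
    """Sum of Boltzmann weights over all nested structures of one strand."""
    seq = seq.upper()
    weight = math.exp(-pair_energy / rt)  # Boltzmann factor of one base pair

    @lru_cache(maxsize=None)
    def z(i, j):
        if j <= i:  # empty interval or single base: only the empty structure
            return 1.0
        total = z(i + 1, j)  # base i unpaired
        for k in range(i + min_loop + 1, j + 1):  # base i paired with base k
            if frozenset((seq[i], seq[k])) in CANONICAL:
                total += z(i + 1, k - 1) * weight * z(k + 1, j)
        return total

    return z(0, len(seq) - 1)


z_total = partition_function("GAAAC")
# Probability that the single possible pair (first G with last C) forms.
pair_probability = math.exp(1.0 / 0.6) / z_total
print(z_total, pair_probability)
```

Setting `pair_energy=0.0` makes every structure weigh 1, so the function then simply counts structures, which is a convenient sanity check of the recursion.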

Subjects

Algorithms; RNA/chemistry; Computational Biology/methods; Databases, Genetic; Nucleic Acid Conformation; RNA, Antisense; RNA, Untranslated/chemistry; Sequence Analysis, RNA

11.

PyGTED: Python Application for Computing Graph Traversal Edit Distance.

Ebrahimpour Boroojeny, Ali; Shrestha, Akash; Sharifi-Zarchi, Ali; Gallagher, Suzanne Renick; Sahinalp, Süleyman Cenk; Chitsaz, Hamidreza.

J Comput Biol ; 27(3): 436-439, 2020 03.

Article in English | MEDLINE | ID: mdl-32160033

ABSTRACT

Graph Traversal Edit Distance (GTED) is a previously introduced measure of distance (or dissimilarity) between two graphs. This measure is based on the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. GTED was motivated by and provides the first mathematical formalism for sequence coassembly and de novo variation detection in bioinformatics. Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we introduce a tool, PyGTED, to compute GTED. It implements the polynomial time algorithm devised for GTED by the authors.

Subjects

Computational Biology/methods; Data Mining; Machine Learning; Programming, Linear; Software

12.

Graph Traversal Edit Distance and Extensions.

Ebrahimpour Boroojeny, Ali; Shrestha, Akash; Sharifi-Zarchi, Ali; Gallagher, Suzanne Renick; Sahinalp, S Cenk; Chitsaz, Hamidreza.

J Comput Biol ; 27(3): 317-329, 2020 03.

Article in English | MEDLINE | ID: mdl-32058803

ABSTRACT

Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of a support vector machine (SVM) classifier on a few data sets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested data sets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads.
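
The informal definition of GTED in this abstract (the minimum edit distance between strings spelled by Eulerian traversals of two graphs) can be illustrated with a brute-force sketch. The code below computes the classic edit distance and minimizes it over candidate traversal strings that the caller supplies. It does not enumerate Eulerian traversals and it is not the authors' linear-programming algorithm, so it is only practical for tiny hand-built examples. All names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # match or substitute
            ))
        previous = current
    return previous[-1]


def gted_over_candidates(traversals_a, traversals_b):
    """Minimum edit distance over all pairs of supplied traversal strings."""
    return min(edit_distance(s, t) for s in traversals_a for t in traversals_b)


# Two traversal strings of one small graph versus a single string of another.
print(gted_over_candidates(["ACGT", "CGTA"], ["CGTT"]))  # 1
```

The number of Eulerian traversals can grow exponentially with graph size, which is why a polynomial time formulation such as the one described in the abstract matters.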

Subjects

Computational Biology/methods; Programming, Linear; Algorithms; Animals; Data Mining; High-Throughput Nucleotide Sequencing; Humans; Support Vector Machine

13.

DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome.

Azarkhalili, Behrooz; Saberi, Ali; Chitsaz, Hamidreza; Sharifi-Zarchi, Ali.

Sci Rep ; 9(1): 16526, 2019 11 11.

Article in English | MEDLINE | ID: mdl-31712594

ABSTRACT

Despite great advances, molecular cancer pathology is often limited to the use of a small number of biomarkers rather than the whole transcriptome, partly due to computational challenges. Here, we introduce a novel architecture of Deep Neural Networks (DNNs) that is capable of simultaneous inference of various properties of biological samples, through multi-task and transfer learning. It encodes the whole transcription profile into a strikingly low-dimensional latent vector of size 8, and then recovers mRNA and miRNA expression profiles, tissue and disease type from this vector. This latent space is significantly better than the original gene expression profiles for discriminating samples based on their tissue and disease. We employed this architecture on mRNA transcription profiles of 10750 clinical samples from 34 classes (one healthy and 33 different types of cancer) from 27 tissues. Our method significantly outperforms prior works and classical machine learning approaches in predicting tissue-of-origin, normal or disease state and cancer type of each sample. For tissues with more than one type of cancer, it reaches 99.4% accuracy in identifying the correct cancer subtype. We also show this system is very robust against noise and missing values. Collectively, our results highlight applications of artificial intelligence in molecular cancer pathology and oncological research. DeePathology is freely available at https://github.com/SharifBioinf/DeePathology .

Subjects

Computational Biology; Deep Learning; Gene Expression Profiling; Neoplasms/genetics; Neoplasms/pathology; Transcriptome; Algorithms; Computational Biology/methods; Data Mining; Disease Susceptibility; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Humans; Neural Networks, Computer; Organ Specificity/genetics; Pathology, Molecular/methods; Reproducibility of Results

14.

Cell Identity Codes: Understanding Cell Identity from Gene Expression Profiles using Deep Neural Networks.

Abdolhosseini, Farzad; Azarkhalili, Behrooz; Maazallahi, Abbas; Kamal, Aryan; Motahari, Seyed Abolfazl; Sharifi-Zarchi, Ali; Chitsaz, Hamidreza.

Sci Rep ; 9(1): 2342, 2019 02 20.

Article in English | MEDLINE | ID: mdl-30787315

ABSTRACT

Understanding cell identity is an important task in many biomedical areas. Expression patterns of specific marker genes have been used to characterize some limited cell types, but exclusive markers are not available for many cell types. A second approach is to use machine learning to discriminate cell types based on the whole gene expression profiles (GEPs). The accuracies of simple classification algorithms such as linear discriminators or support vector machines are limited due to the complexity of biological systems. We used deep neural networks to analyze 1040 GEPs from 16 different human tissues and cell types. After comparing different architectures, we identified a specific structure of deep autoencoders that can encode a GEP into a vector of 30 numeric values, which we call the cell identity code (CIC). The original GEP can be reproduced from the CIC with an accuracy comparable to technical replicates of the same experiment. Although we use an unsupervised approach to train the autoencoder, we show different values of the CIC are connected to different biological aspects of the cell, such as different pathways or biological processes. This network can use CIC to reproduce the GEP of the cell types it has never seen during the training. It also can resist some noise in the measurement of the GEP. Furthermore, we introduce classifier autoencoder, an architecture that can accurately identify cell type based on the GEP or the CIC.

Subjects

Cells/metabolism; Deep Learning; Gene Expression Profiling; Neural Networks, Computer; Algorithms; Cell Compartmentation; Humans; Organ Specificity/genetics

15.

Draft genome of Dugesia japonica provides insights into conserved regulatory elements of the brain restriction gene nou-darake in planarians.

An, Yang; Kawaguchi, Akane; Zhao, Chen; Toyoda, Atsushi; Sharifi-Zarchi, Ali; Mousavi, Seyed Ahmad; Bagherzadeh, Reza; Inoue, Takeshi; Ogino, Hajime; Fujiyama, Asao; Chitsaz, Hamidreza; Baharvand, Hossein; Agata, Kiyokazu.

Zoological Lett ; 4: 24, 2018.

Article in English | MEDLINE | ID: mdl-30181897

ABSTRACT

BACKGROUND: Planarians are non-parasitic Platyhelminthes (flatworms) famous for their regeneration ability and for having a well-organized brain. Dugesia japonica is a typical planarian species that is widely distributed in East Asia. Extensive cellular and molecular experimental methods have been developed to identify the functions of thousands of genes in this species, making this planarian a good experimental model for regeneration biology and neurobiology. However, no genome-level information is available for D. japonica, and few gene regulatory networks have been identified thus far. RESULTS: To obtain whole-genome information on this species and to study its gene regulatory networks, we extracted genomic DNA from 200 planarians derived from a laboratory-bred asexual clonal strain, and sequenced 476 Gb of data by second-generation sequencing. K-mer frequency graphing and fosmid sequence analysis indicated a complex genome that would be difficult to assemble using second-generation sequencing short reads. To address this challenge, we developed a new assembly strategy and improved the de novo genome assembly, producing a 1.56 Gb genome sequence (DjGenome ver1.0, including 202,925 scaffolds and N50 length 27,741 bp) that covers 99.4% of all 19,543 genes in the assembled transcriptome, although the genome is fragmented as 80% of the genome consists of repeated sequences (genomic frequency ≥ 2). By genome comparison between two planarian genera, we identified conserved non-coding elements (CNEs), which are indicative of gene regulatory elements. Transgenic experiments using Xenopus laevis indicated that one of the CNEs in the Djndk gene may be a regulatory element, suggesting that the regulation of the ndk gene and the brain formation mechanism may be conserved between vertebrates and invertebrates. CONCLUSION: This draft genome and CNE analysis will contribute to resolving gene regulatory networks in planarians. The genome database is available at: http://www.planarian.jp.
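
The abstract reports an N50 scaffold length for the assembly. As a minimal sketch of what that statistic measures, assuming the usual definition (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size), the following computes N50 for a list of lengths. The function name and the toy lengths are illustrative only and are not taken from the DjGenome data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length at which the running total first reaches half of the sum.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, for illustration only).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150 is half of the 300 total
```

A fragmented assembly with many short scaffolds yields a small N50 relative to genome size, which is consistent with the abstract's remark that the repeat-rich genome remains fragmented.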

16.

Genomewide Analysis of Clp1 Function in Transcription in Budding Yeast.

Al-Husini, Nadra; Sharifi, Ali; Mousavi, Seyed Ahmad; Chitsaz, Hamidreza; Ansari, Athar.

Sci Rep ; 7(1): 6894, 2017 07 31.

Article in English | MEDLINE | ID: mdl-28761171

ABSTRACT

In budding yeast, the 3' end processing of mRNA and the coupled termination of transcription by RNAPII require the CF IA complex. We have earlier demonstrated a role for the Clp1 subunit of this complex in termination and promoter-associated transcription of CHA1. To assess the generality of the observed function of Clp1 in transcription, we tested the effect of Clp1 on transcription on a genomewide scale using the Global Run-On-Seq (GRO-Seq) approach. GRO-Seq analysis showed the polymerase reading through the termination signal in the downstream region of highly transcribed genes in a temperature-sensitive mutant of Clp1 at elevated temperature. No such terminator readthrough was observed in the mutant at the permissive temperature. The poly(A)-independent termination of transcription of snoRNAs, however, remained unaffected in the absence of Clp1 activity. These results strongly suggest a role for Clp1 in poly(A)-coupled termination of transcription. Furthermore, the density of antisense transcribing polymerase upstream of the promoter region increased in the absence of Clp1 activity, thus implicating Clp1 in promoter directionality. Overall, these results indicate that Clp1 plays a general role in poly(A)-coupled termination of RNAPII transcription and in enhancing promoter directionality in budding yeast.

Subjects

RNA, Messenger/metabolism; Saccharomycetales/metabolism; mRNA Cleavage and Polyadenylation Factors/genetics; mRNA Cleavage and Polyadenylation Factors/metabolism; Fungal Proteins/genetics; Fungal Proteins/metabolism; Mutation; Polyadenylation; Promoter Regions, Genetic; RNA Polymerase II/metabolism; RNA, Fungal/genetics; RNA, Fungal/metabolism; RNA, Messenger/genetics; Saccharomycetales/genetics; Sequence Analysis, RNA; Transcription, Genetic

17.

SNPPhenA: a corpus for extracting ranked associations of single-nucleotide polymorphisms and phenotypes from literature.

Bokharaeian, Behrouz; Diaz, Alberto; Taghizadeh, Nasrin; Chitsaz, Hamidreza; Chavoshinejad, Ramyar.

J Biomed Semantics ; 8(1): 14, 2017 Apr 07.

Article in English | MEDLINE | ID: mdl-28388928

ABSTRACT

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variations influencing common diseases and phenotypes. Recently, some corpora and methods have been developed with the purpose of extracting mutations and diseases from texts. However, there is no available corpus for extracting associations from texts that is annotated with linguistic-based negation, modality markers, neutral candidates, and confidence level of associations. METHOD: In this research, different steps were presented so as to produce the SNPPhenA corpus. They include automatic Named Entity Recognition (NER) followed by the manual annotation of SNP and phenotype names, annotation of the SNP-phenotype associations and their level of confidence, as well as modality markers. Moreover, the produced corpus was annotated with negation scopes and cues as well as neutral candidates, which play a crucial role in handling negation and modality in extraction tasks. RESULT: The agreement between annotators was measured by Cohen's Kappa coefficient, and the resulting scores indicated the reliability of the corpus. The Kappa score was 0.79 for annotating the associations and 0.80 for the confidence degree of associations. Further presented were the basic statistics of the annotated features of the corpus in addition to the results of our first experiments related to the extraction of ranked SNP-Phenotype associations. The prepared guideline documents make the corpus easier to use. The corpus, guidelines and inter-annotator agreement analysis are available on the website of the corpus: http://nil.fdi.ucm.es/?q=node/639 . CONCLUSION: Specifying the confidence degree of SNP-phenotype associations from articles helps identify the strength of associations that could in turn assist genomics scientists in determining phenotypic plasticity and the importance of environmental factors. What is more, our first experiments with the corpus show that linguistic-based confidence alongside other non-linguistic features can be utilized in order to estimate the strength of the observed SNP-phenotype associations. TRIAL REGISTRATION: Not Applicable.
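
The abstract reports inter-annotator agreement as Cohen's Kappa. As a minimal sketch of how that coefficient is computed for two annotators, assuming the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies, the following may help. The function name and the toy labels are hypothetical and do not come from the SNPPhenA corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy annotations (hypothetical): 1 = association, 0 = no association.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohens_kappa(annotator_1, annotator_2))  # prints 0.5
```

In the toy case the annotators agree on 3 of 4 items (p_o = 0.75) while chance agreement is 0.5, so kappa is 0.5. The scores of 0.79 and 0.80 reported above correspond to considerably stronger agreement.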

Subjects

Gene Ontology; Information Storage and Retrieval/methods; Phenotype; Polymorphism, Single Nucleotide; Mutation; Semantics

18.

Machine Learning and Network Analysis of Molecular Dynamics Trajectories Reveal Two Chains of Red/Ox-specific Residue Interactions in Human Protein Disulfide Isomerase.

Karamzadeh, Razieh; Karimi-Jafari, Mohammad Hossein; Sharifi-Zarchi, Ali; Chitsaz, Hamidreza; Salekdeh, Ghasem Hosseini; Moosavi-Movahedi, Ali Akbar.

Sci Rep ; 7(1): 3666, 2017 06 16.

Article in English | MEDLINE | ID: mdl-28623339

ABSTRACT

The human protein disulfide isomerase (hPDI) is an essential four-domain multifunctional enzyme. As a result of disulfide shuffling in its terminal domains, hPDI exists in two oxidation states with different conformational preferences, which are important for substrate binding and functional activities. Here, we address the redox-dependent conformational dynamics of hPDI through molecular dynamics (MD) simulations. Collective domain motions are identified by principal component analysis of MD trajectories, and redox-dependent opening-closing structure variations are highlighted on projected free energy landscapes. Then, important structural features that exhibit considerable differences in the dynamics of the redox states are extracted by statistical machine learning methods. Mapping the structural variations to time series of residue interaction networks also provides a holistic representation of the dynamical redox differences. With an emphasis on persistent, long-lasting interactions, an approach is proposed that compiles these time series networks into a single dynamic residue interaction network (DRIN). Differential comparison of DRIN in oxidized and reduced states reveals chains of residue interactions that represent potential allosteric paths between catalytic and ligand binding sites of hPDI.

Subjects

Machine Learning; Molecular Dynamics Simulation; Protein Disulfide-Isomerases/chemistry; Protein Interaction Domains and Motifs; Humans; Oxidation-Reduction; Protein Conformation; Protein Disulfide-Isomerases/metabolism; Protein Interaction Mapping; Protein Interaction Maps

19.

Enhancing Extraction of Drug-Drug Interaction from Literature Using Neutral Candidates, Negation, and Clause Dependency.

Bokharaeian, Behrouz; Diaz, Alberto; Chitsaz, Hamidreza.

PLoS One ; 11(10): e0163480, 2016.

Article in English | MEDLINE | ID: mdl-27695078

ABSTRACT

MOTIVATION: Supervised biomedical relation extraction plays an important role in biomedical natural language processing, endeavoring to obtain the relations between biomedical entities. Drug-drug interactions, which are investigated in the present paper, are notably among the critical biomedical relations. Thus far many methods have been developed with the aim of extracting DDI relations. However, unfortunately there has been a scarcity of comprehensive studies on the effects of negation, complex sentences, clause dependency, and neutral candidates in the course of DDI extraction from biomedical articles. RESULTS: Our study proposes clause dependency features and a number of features for identifying neutral candidates as well as negation cues and scopes. Furthermore, our experiments indicate that the proposed features significantly improve the performance of the relation extraction task combined with other kernel methods. We characterize the contribution of each category of features and finally conclude that neutral candidate features have the most prominent role among all of the three categories.

Subjects

Biomedical Research; Data Mining; Drug Interactions; Publications; Algorithms; Artificial Intelligence; Humans; Natural Language Processing

20.

Efficient Synergistic Single-Cell Genome Assembly.

Movahedi, Narjes S; Embree, Mallory; Nagarajan, Harish; Zengler, Karsten; Chitsaz, Hamidreza.

Front Bioeng Biotechnol ; 4: 42, 2016.

Article in English | MEDLINE | ID: mdl-27243002

ABSTRACT

As the vast majority of all microbes are unculturable, single-cell sequencing has become a significant method to gain insight into microbial physiology. Single-cell sequencing methods, currently powered by multiple displacement genome amplification (MDA), have passed important milestones such as finishing and closing the genome of a prokaryote. However, the quality and reliability of genome assemblies from single cells are still unsatisfactory due to uneven coverage depth and the absence of scattered chunks of the genome in the final collection of reads caused by MDA bias. In this work, our new algorithm Hybrid De novo Assembler (HyDA) demonstrates the power of coassembly of multiple single-cell genomic data sets through significant improvement of the assembly quality in terms of predicted functional elements and length statistics. Coassemblies contain significantly more base pairs and protein coding genes, cover more subsystems, and consist of longer contigs compared to individual assemblies by the same algorithm as well as state-of-the-art single-cell assemblers SPAdes and IDBA-UD. Hybrid De novo Assembler (HyDA) is also able to avoid chimeric assemblies by detecting and separating shared and exclusive pieces of sequence for input data sets. By replacing one deep single-cell sequencing experiment with a few single-cell sequencing experiments of lower depth, the coassembly method can hedge against the risk of failure and loss of the sample, without significantly increasing sequencing cost. Application of the single-cell coassembler HyDA to the study of three uncultured members of an alkane-degrading methanogenic community validated the usefulness of the coassembly concept. HyDA is open source and publicly available at http://chitsazlab.org/software.html, and the raw reads are available at http://chitsazlab.org/research.html.
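
The abstract mentions detecting and separating shared and exclusive pieces of sequence across input data sets. The toy sketch below illustrates that idea only at the level of k-mer membership; it is not HyDA's implementation, and the function names, data set names, and reads are hypothetical. It also ignores reverse complements and sequencing errors, which a real assembler would have to handle.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition_kmers(datasets, k):
    """Split k-mers into those shared by several data sets and those
    exclusive to one.

    datasets maps a data set name to a list of reads (strings).
    Returns (shared, exclusive), where exclusive maps each name to its
    private k-mers.
    """
    membership = {}
    for name, reads in datasets.items():
        for read in reads:
            for kmer in kmers(read, k):
                membership.setdefault(kmer, set()).add(name)
    shared = {km for km, names in membership.items() if len(names) > 1}
    exclusive = {name: set() for name in datasets}
    for km, names in membership.items():
        if len(names) == 1:
            exclusive[next(iter(names))].add(km)
    return shared, exclusive


# Two hypothetical single-cell read sets with an overlapping region.
toy = {"cell_A": ["ACGTAC"], "cell_B": ["CGTACG"]}
shared, exclusive = partition_kmers(toy, k=4)
print(sorted(shared))               # ['CGTA', 'GTAC']
print(sorted(exclusive["cell_A"]))  # ['ACGT']
print(sorted(exclusive["cell_B"]))  # ['TACG']
```

Tracking which data sets contribute each k-mer is one simple way to keep sequence that is private to one cell from being merged with another cell's sequence during a joint assembly.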
