Results 1 - 20 of 43
1.
PLoS Comput Biol ; 20(5): e1012094, 2024 May.
Article in English | MEDLINE | ID: mdl-38723024

ABSTRACT

Cell lineage tree reconstruction methods are developed for various tasks, such as investigating development, differentiation, and cancer progression. Single-cell sequencing technologies enable more thorough analysis with higher resolution. We present Scuphr, a distance-based cell lineage tree reconstruction method using bulk and single-cell DNA sequencing data from healthy tissues. Common challenges of single-cell DNA sequencing, such as allelic dropouts and amplification errors, are modeled in Scuphr. Scuphr computes the distance between cell pairs and reconstructs the lineage tree using the neighbor-joining algorithm. With its embarrassingly parallel design, Scuphr can perform faster analysis than the state-of-the-art methods while obtaining better accuracy. The method's robustness is investigated using various synthetic datasets and a biological dataset of 18 cells.


Subjects
Algorithms; Cell Lineage; Computational Biology; Single-Cell Analysis; Single-Cell Analysis/methods; Cell Lineage/genetics; Humans; Computational Biology/methods; Sequence Analysis, DNA/methods; Software; Models, Statistical
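Illustrative sketch for record 1 (not taken from the Scuphr paper or its code): the abstract states that the lineage tree is built from pairwise cell distances with the neighbor-joining algorithm. The Python below is a minimal, generic neighbor-joining implementation under stated assumptions. The function name neighbor_joining, the frozenset-keyed distance dictionary, and the five-label toy distances are all hypothetical; in Scuphr the distances would come from its own model of the sequencing data.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Generic neighbor joining (hypothetical helper, not Scuphr's code).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to every other active node.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Pick the pair that minimises the neighbor-joining Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        d_ab = d[frozenset((a, b))]
        u = f"internal{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d_ab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to all remaining nodes.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((u, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d_ab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Toy, hand-written distances between five hypothetical cells.
toy_pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy = {frozenset(pair): value for pair, value in toy_pairs.items()}
edges = neighbor_joining(["a", "b", "c", "d", "e"], toy)
for node, neighbor, length in edges:
    print(node, neighbor, length)
```

Each pair's distance can be computed independently of every other pair, which is why a distance-based design of this kind parallelizes easily; only the joining loop itself is sequential.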
2.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38676578

ABSTRACT

MOTIVATION: Copy number variations (CNVs) are common genetic alterations in tumour cells. The delineation of CNVs holds promise for enhancing our comprehension of cancer progression. Moreover, accurate inference of CNVs from single-cell sequencing data is essential for unravelling intratumoral heterogeneity. However, existing inference methods face limitations in resolution and sensitivity. RESULTS: To address these challenges, we present CopyVAE, a deep learning framework based on a variational autoencoder architecture. Through experiments, we demonstrated that CopyVAE can accurately and reliably detect CNVs from data obtained using single-cell RNA sequencing. CopyVAE surpasses existing methods in terms of sensitivity and specificity. We also discussed CopyVAE's potential to advance our understanding of genetic alterations and their impact on disease advancement. AVAILABILITY AND IMPLEMENTATION: CopyVAE is implemented and freely available under MIT license at https://github.com/kurtsemih/copyVAE.


Subjects
DNA Copy Number Variations; Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Deep Learning; Software; Transcriptome/genetics; Sequence Analysis, RNA/methods; Neoplasms/genetics
3.
Cell Syst ; 15(2): 149-165.e10, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38340731

ABSTRACT

Cell types can be classified according to shared patterns of transcription. Non-genetic variability among individual cells of the same type has been ascribed to stochastic transcriptional bursting and transient cell states. Using high-coverage single-cell RNA profiling, we asked whether long-term, heritable differences in gene expression can impart diversity within cells of the same type. Studying clonal human lymphocytes and mouse brain cells, we uncovered a vast diversity of heritable gene expression patterns among different clones of cells of the same type in vivo. We combined chromatin accessibility and RNA profiling on different lymphocyte clones to reveal thousands of regulatory regions exhibiting interclonal variation, which could be directly linked to interclonal variation in gene expression. Our findings identify a source of cellular diversity, which may have important implications for how cellular populations are shaped by selective processes in development, aging, and disease. A record of this paper's transparent peer review process is included in the supplemental information.


Subjects
Chromatin; RNA; Humans; Mice; Animals; Aging; Gene Expression
4.
Science ; 382(6675): eadf8486, 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38060664

ABSTRACT

The spatial distribution of lymphocyte clones within tissues is critical to their development, selection, and expansion. We have developed spatial transcriptomics of variable, diversity, and joining (VDJ) sequences (Spatial VDJ), a method that maps B cell and T cell receptor sequences in human tissue sections. Spatial VDJ captures lymphocyte clones that match canonical B and T cell distributions and amplifies clonal sequences confirmed by orthogonal methods. We found spatial congruency between paired receptor chains, developed a computational framework to predict receptor pairs, and linked the expansion of distinct B cell clones to different tumor-associated gene expression programs. Spatial VDJ delineates B cell clonal diversity and lineage trajectories within their anatomical niche. Thus, Spatial VDJ captures lymphocyte spatial clonal architecture across tissues, providing a platform to harness clonal sequences for therapy.


Subjects
B-Lymphocytes; Pre-B Cell Receptors; Receptors, Antigen, T-Cell; T-Lymphocytes; Humans; B-Lymphocytes/metabolism; Clone Cells/metabolism; Gene Expression Profiling/methods; Pre-B Cell Receptors/genetics; Receptors, Antigen, T-Cell/genetics; T-Lymphocytes/metabolism
5.
Genome Biol ; 24(1): 120, 2023 May 17.
Article in English | MEDLINE | ID: mdl-37198601

ABSTRACT

Spatial transcriptomics maps gene expression across tissues, posing the challenge of determining the spatial arrangement of different cell types. However, spatial transcriptomics spots contain multiple cells. Therefore, the observed signal comes from mixtures of cells of different types. Here, we propose an innovative probabilistic model, Celloscope, that utilizes established prior knowledge on marker genes for cell type deconvolution from spatial transcriptomics data. Celloscope outperforms other methods on simulated data, successfully indicates known brain structures and spatially distinguishes between inhibitory and excitatory neuron types in mouse brain tissue, and dissects large heterogeneity of immune infiltrate composition in prostate gland tissue.


Subjects
Gene Expression Profiling; Transcriptome; Male; Animals; Mice; Neurons; Brain; Models, Statistical
6.
Nat Commun ; 14(1): 982, 2023 Feb 22.
Article in English | MEDLINE | ID: mdl-36813776

ABSTRACT

Functional characterization of cancer clones can shed light on the evolutionary mechanisms driving cancer proliferation and relapse. Single-cell RNA sequencing data provide grounds for understanding the functional state of cancer as a whole; however, much research remains to identify and reconstruct clonal relationships toward characterizing the changes in functions of individual clones. We present PhylEx, which integrates bulk genomics data with co-occurrences of mutations from single-cell RNA sequencing data to reconstruct high-fidelity clonal trees. We evaluate PhylEx on synthetic and well-characterized high-grade serous ovarian cancer cell line datasets. PhylEx outperforms the state-of-the-art methods both when comparing capacity for clonal tree reconstruction and for identifying clones. We analyze high-grade serous ovarian cancer and breast cancer data to show that PhylEx exploits clonal expression profiles beyond what is possible with expression-based clustering methods and clears the way for accurate inference of clonal trees and robust phylo-phenotypic analysis of cancer.


Subjects
Ovarian Neoplasms; Trees; Female; Humans; Trees/genetics; Transcriptome; Clonal Evolution; Neoplasm Recurrence, Local; Ovarian Neoplasms/genetics; Clone Cells; Single-Cell Analysis/methods
7.
Proc Natl Acad Sci U S A ; 120(1): e2209856120, 2023 Jan 03.
Article in English | MEDLINE | ID: mdl-36574653

ABSTRACT

Breast cancer (BC) is a complex disease comprising multiple distinct subtypes with different genetic features and pathological characteristics. Although a large number of antineoplastic compounds have been approved for clinical use, patient-to-patient variability in drug response is frequently observed, highlighting the need for efficient treatment prediction for individualized therapy. Several patient-derived models have been established lately for the prediction of drug response. However, each of these models has its limitations that impede their clinical application. Here, we report that the whole-tumor cell culture (WTC) ex vivo model could be stably established from all breast tumors with a high success rate (98 out of 116), and it could reassemble the parental tumors with the endogenous microenvironment. We observed strong clinical associations and predictive values from the investigation of a broad range of BC therapies with WTCs derived from a patient cohort. The accuracy was further supported by the correlation between WTC-based test results and patients' clinical responses in a separate validation study, where the neoadjuvant treatment regimens of 15 BC patients were mimicked. Collectively, the WTC model allows us to accomplish personalized drug testing within 10 d, even for small-sized tumors, highlighting its potential for individualized BC therapy. Furthermore, coupled with genomic and transcriptomic analyses, WTC-based testing can also help to stratify specific patient groups for assignment into appropriate clinical trials, as well as validate potential biomarkers during drug development.


Subjects
Antineoplastic Agents; Breast Neoplasms; Humans; Female; Breast Neoplasms/drug therapy; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Gene Expression Profiling; Biomarkers; Cell Culture Techniques; Tumor Microenvironment
8.
PLoS Comput Biol ; 18(12): e1010732, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36469540

ABSTRACT

Identifying the interrelations among cancer driver genes and the patterns in which the driver genes get mutated is critical for understanding cancer. In this paper, we study cross-sectional data from cohorts of tumors to identify the cancer-type (or subtype) specific process in which the cancer driver genes accumulate critical mutations. We model this mutation accumulation process using a tree, where each node includes a driver gene or a set of driver genes. A mutation in each node enables its children to have a chance of mutating. This model simultaneously explains the mutual exclusivity patterns observed in mutations in specific cancer genes (by its nodes) and the temporal order of events (by its edges). We introduce a computationally efficient dynamic programming procedure for calculating the likelihood of our noisy datasets and use it to build our Markov Chain Monte Carlo (MCMC) inference algorithm, ToMExO. Together with a set of engineered MCMC moves, our fast likelihood calculations enable us to work with datasets with hundreds of genes and thousands of tumors, which cannot be dealt with using available cancer progression analysis methods. We demonstrate our method's performance on several synthetic datasets covering various scenarios for cancer progression dynamics. Then, a comparison against two state-of-the-art methods on a moderate-size biological dataset shows the merits of our algorithm in identifying significant and valid patterns. Finally, we present our analyses of several large biological datasets, including colorectal cancer, glioblastoma, and pancreatic cancer. In all the analyses, we validate the results using a set of method-independent metrics testing the causality and significance of the relations identified by ToMExO or competing methods.


Subjects
Glioblastoma; Neoplasms; Child; Humans; Cross-Sectional Studies; Neoplasms/genetics; Neoplasms/pathology; Neoplastic Processes; Algorithms; Monte Carlo Method; Mutation; Glioblastoma/genetics
9.
Bioinformatics ; 38(5): 1235-1243, 2022 Feb 07.
Article in English | MEDLINE | ID: mdl-34718417

ABSTRACT

MOTIVATION: DNA methylation plays a key role in a variety of biological processes. Recently, Nanopore long-read sequencing has enabled direct detection of these modifications. As a consequence, a range of computational methods have been developed to exploit Nanopore data for methylation detection. However, current approaches rely on a human-defined threshold to detect the methylation status of a genomic position and are not optimized to detect sites methylated at low frequency. Furthermore, most methods use either the Nanopore signals or the basecalling errors as the model input and do not take advantage of their combination. RESULTS: Here, we present DeepMP, a convolutional neural network-based model that takes information from Nanopore signals and basecalling errors to detect whether a given motif in a read is methylated or not. In addition, DeepMP introduces a threshold-free position modification calling model sensitive to sites methylated at low frequency across cells. We comprehensively benchmarked DeepMP against state-of-the-art methods on Escherichia coli, human and pUC19 datasets. DeepMP outperforms current approaches at read-based and position-based methylation detection across sites methylated at different frequencies in the three datasets. AVAILABILITY AND IMPLEMENTATION: DeepMP is implemented and freely available under MIT license at https://github.com/pepebonet/DeepMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning; Nanopore Sequencing; Nanopores; Humans; Software; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing/methods; Escherichia coli/genetics; DNA/genetics
10.
PLoS Comput Biol ; 16(10): e1008183, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33035204

ABSTRACT

Identification of mutations of the genes that give cancer a selective advantage is an important step towards research and clinical objectives. As such, there has been a growing interest in developing methods for identification of driver genes and their temporal order within a single patient (intra-tumor) as well as across a cohort of patients (inter-tumor). In this paper, we develop a probabilistic model for tumor progression, in which the driver genes are clustered into several ordered driver pathways. We develop an efficient inference algorithm that exhibits favorable scalability to the number of genes and samples compared to a previously introduced ILP-based method. Adopting a probabilistic approach also allows principled approaches to model selection and uncertainty quantification. Using a large set of experiments on synthetic datasets, we demonstrate our superior performance compared to the ILP-based method. We also analyze two biological datasets of colorectal and glioblastoma cancers. We emphasize that while the ILP-based method puts many seemingly passenger genes in the driver pathways, our algorithm stays focused on true driver genes and outputs more accurate models for cancer progression.


Subjects
Genes, Neoplasm/genetics; Models, Statistical; Neoplasms/genetics; Neoplasms/pathology; Algorithms; Computational Biology; Databases, Genetic; Disease Progression; Humans; Mutation/genetics
11.
Nat Commun ; 9(1): 2419, 2018 Jun 20.
Article in English | MEDLINE | ID: mdl-29925878

ABSTRACT

Intra-tumor heterogeneity is one of the biggest challenges in cancer treatment today. Here we investigate tissue-wide gene expression heterogeneity throughout a multifocal prostate cancer using the spatial transcriptomics (ST) technology. Utilizing a novel approach for deconvolution, we analyze the transcriptomes of nearly 6750 tissue regions and extract distinct expression profiles for the different tissue components, such as stroma, normal and PIN glands, immune cells and cancer. We distinguish healthy and diseased areas and thereby provide insight into gene expression changes during the progression of prostate cancer. Compared to pathologist annotations, we delineate the extent of cancer foci more accurately, interestingly without link to histological changes. We identify gene expression gradients in stroma adjacent to tumor regions that allow for re-stratification of the tumor microenvironment. The establishment of these profiles is the first step towards an unbiased view of prostate cancer and can serve as a dictionary for future studies.


Subjects
Adenocarcinoma/genetics; Gene Expression Regulation, Neoplastic; Prostatic Neoplasms/genetics; Transcriptome/genetics; Adenocarcinoma/pathology; Adenocarcinoma/surgery; Computational Biology; Disease Progression; Gene Expression Profiling; Humans; Male; Prostate/cytology; Prostate/pathology; Prostate/surgery; Prostatectomy; Prostatic Neoplasms/pathology; Prostatic Neoplasms/surgery; RNA, Messenger/genetics; Stromal Cells/pathology; Tumor Microenvironment/genetics
12.
J Clin Invest ; 128(4): 1355-1370, 2018 Apr 02.
Article in English | MEDLINE | ID: mdl-29480816

ABSTRACT

Metastatic breast cancers are still incurable. Characterizing the evolutionary landscape of these cancers, including the role of metastatic axillary lymph nodes (ALNs) in seeding distant organ metastasis, can provide a rational basis for effective treatments. Here, we have described the genomic analyses of the primary tumors and metastatic lesions from 99 samples obtained from 20 patients with breast cancer. Our evolutionary analyses revealed diverse spreading and seeding patterns that govern tumor progression. Although linear evolution to successive metastatic sites was common, parallel evolution from the primary tumor to multiple distant sites was also evident. Metastatic spreading was frequently coupled with polyclonal seeding, in which multiple metastatic subclones originated from the primary tumor and/or other distant metastases. Synchronous ALN metastasis, a well-established prognosticator of breast cancer, was not involved in seeding the distant metastasis, suggesting a hematogenous route for cancer dissemination. Clonal evolution coincided frequently with emerging driver alterations and evolving mutational processes, notably an increase in apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like-associated (APOBEC-associated) mutagenesis. Our data provide genomic evidence for a role of ALN metastasis in seeding distant organ metastasis and elucidate the evolving mutational landscape during cancer progression.


Subjects
Breast Neoplasms/genetics; Evolution, Molecular; Mutation; Breast Neoplasms/mortality; Breast Neoplasms/pathology; Female; Humans; Lymph Nodes/metabolism; Lymph Nodes/pathology; Lymphatic Metastasis; Neoplasm Metastasis
13.
Semin Cell Dev Biol ; 79: 123-130, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29146145

ABSTRACT

Cancer arises when pathways that control cell functions such as proliferation and migration are dysregulated to such an extent that cells start to divide uncontrollably and eventually spread throughout the body, ultimately endangering the survival of an affected individual. It is well established that somatic mutations are important in cancer initiation and progression as well as in creation of tumor diversity. Now modifications of the transcriptome are also emerging as a significant force during the transition from normal cell to malignant tumor. Editing of adenosine (A) to inosine (I) in double-stranded RNA, catalyzed by adenosine deaminases acting on RNA (ADARs), is one dynamic modification that in a combinatorial manner can give rise to a very diverse transcriptome. Since the cell interprets inosine as guanosine (G), editing can result in non-synonymous codon changes in transcripts as well as yield alternative splicing, but also affect targeting and disrupt maturation of microRNA. ADAR editing is essential for survival in mammals but its dysregulation can lead to cancer. ADAR1 is, for instance, overexpressed in lung cancer, liver cancer, esophageal cancer and chronic myelogenous leukemia, which with few exceptions promotes cancer progression. In contrast, ADAR2 is lowly expressed in, e.g., glioblastoma, where the lower levels of ADAR2 editing lead to malignant phenotypes. Altogether, RNA editing by the ADAR enzymes is a powerful regulatory mechanism during tumorigenesis. Depending on the cell type, cancer progression seems to mainly be induced by ADAR1 upregulation or ADAR2 downregulation, although in a few cases ADAR1 is instead downregulated. In this review, we discuss how aberrant editing of specific substrates contributes to malignancy.


Subjects
Adenosine Deaminase/metabolism; Neoplasms/genetics; RNA Editing; RNA, Double-Stranded/genetics; RNA-Binding Proteins/metabolism; Animals; Disease Progression; Gene Expression Regulation, Neoplastic; Humans; Neoplasms/metabolism; Neoplasms/pathology; RNA Isoforms/genetics; RNA Isoforms/metabolism; RNA, Double-Stranded/metabolism
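Illustrative sketch for record 13 (not from the review itself): the abstract notes that the cell reads inosine as guanosine, so A-to-I editing can change a codon's meaning. The Python below shows that effect with the standard genetic code. The helper names edit_a_to_i and translate_codon are hypothetical, and the example codons were chosen only for illustration.

```python
from itertools import product

# Standard genetic code, RNA alphabet, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def edit_a_to_i(codon, position):
    """Return the codon with the adenosine at `position` (0-based) edited to inosine."""
    if codon[position] != "A":
        raise ValueError("only adenosine can be edited to inosine")
    return codon[:position] + "I" + codon[position + 1:]


def translate_codon(codon):
    """Translate one codon, reading inosine (I) as guanosine (G)."""
    return CODON_TABLE[codon.replace("I", "G")]


# Non-synonymous example: editing the middle A of a glutamine codon.
before = translate_codon("CAG")                  # glutamine (Q)
after = translate_codon(edit_a_to_i("CAG", 1))   # CIG is read as CGG, arginine (R)
print(before, "->", after)

# Synonymous example: editing the third A of CAA still gives glutamine.
print(translate_codon("CAA"), "->", translate_codon(edit_a_to_i("CAA", 2)))
```

Whether an edit is synonymous depends on the codon position and the surrounding bases, which is one reason the same enzyme can have very different downstream effects on different transcripts.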
14.
PLoS Comput Biol ; 13(6): e1005556, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28586362

ABSTRACT

A complex disease has, by definition, multiple genetic causes. In theory, these causes could be identified individually, but their identification will likely benefit from informed use of anticipated interactions between causes. In addition, characterizing and understanding interactions must be considered key to revealing the etiology of any complex disease. Large-scale collaborative efforts are now paving the way for comprehensive studies of interaction. As a consequence, there is a need for methods with a computational efficiency sufficient for modern data sets as well as for improvements of statistical accuracy and power. Another issue is that, currently, the relation between different methods for interaction inference is in many cases not transparent, complicating the comparison and interpretation of results between different interaction studies. In this paper we present computationally efficient tests of interaction for the complete family of generalized linear models (GLMs). The tests can be applied for inference of single or multiple interaction parameters, but we show, by simulation, that jointly testing the full set of interaction parameters yields superior power and control of false positive rate. Based on these tests we also describe how to combine results from multiple independent studies of interaction in a meta-analysis. We investigate the impact of several assumptions commonly made when modeling interactions. We also show that, across the important class of models with a full set of interaction parameters, jointly testing the interaction parameters yields identical results. Further, we apply our method to genetic data for cardiovascular disease. This allowed us to identify a putative interaction involved in Lp(a) plasma levels between two 'tag' variants in the LPA locus (p = 2.42 × 10^-9) as well as replicate the interaction (p = 6.97 × 10^-7). Finally, our meta-analysis method is used in a small (N = 16,181) study of interactions in myocardial infarction.


Subjects
Chromosome Mapping/methods; Epistasis, Genetic/genetics; Genetic Association Studies/methods; Genome-Wide Association Study/methods; Linear Models; Models, Genetic; Algorithms; Animals; Humans; Models, Theoretical
16.
BMC Bioinformatics ; 17(Suppl 14): 431, 2016 Nov 11.
Article in English | MEDLINE | ID: mdl-28185583

ABSTRACT

BACKGROUND: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge. RESULTS: In this paper, we propose a probabilistic method that samples reconciliations of the gene tree with a dated species tree and computes maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT, which can be used to, e.g., identify "highways" of LGT. CONCLUSIONS: Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges on the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples.


Subjects
Gene Transfer, Horizontal/genetics; Models, Genetic; Biological Evolution; Entomoplasmataceae/classification; Entomoplasmataceae/genetics; Phylogeny
17.
BMC Genomics ; 16 Suppl 10: S12, 2015.
Article in English | MEDLINE | ID: mdl-26449131

ABSTRACT

Over the last decade, methods have been developed for the reconstruction of gene trees that take into account the species tree. Many of these methods have been based on the probabilistic duplication-loss model, which describes how a gene tree evolves over a species tree with respect to duplications and losses, as well as extensions of this model, e.g., the DLRS (Duplication, Loss, Rate and Sequence evolution) model that also includes sequence evolution under a relaxed molecular clock. A disjoint, almost as recent, and very important line of research has been focused on non-protein-coding yet functional DNA. For instance, DNA sequences that are pseudogenes in the sense that they are not translated may still be transcribed, and the RNA thereby produced may be functional.


Subjects
DNA/genetics; Evolution, Molecular; Phylogeny; Pseudogenes/genetics; Gene Duplication
18.
PLoS Genet ; 11(9): e1005502, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26402789

ABSTRACT

Despite the success of genome-wide association studies in medical genetics, the underlying genetics of many complex diseases remains enigmatic. One plausible reason for this could be the failure to account for the presence of genetic interactions in current analyses. Exhaustive investigations of interactions are typically infeasible because the vast number of possible interactions imposes hard statistical and computational challenges. There is, therefore, a need for computationally efficient methods that build on models appropriately capturing interaction. We introduce a new methodology where we augment the interaction hypothesis with a set of simpler hypotheses that are tested, in order of their complexity, against a saturated alternative hypothesis representing interaction. This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. We devise two different methods, one that relies on a priori estimated numbers of marginally associated variants to correct for multiple tests, and a second that does this adaptively. We show that our methodology in general has an improved statistical power in comparison to seven other methods, and, using the idea of closed testing, that it controls the family-wise error rate. We apply our methodology to genetic data from the PROCARDIS coronary artery disease case/control cohort and discover three distinct interactions. While analyses on simulated data suggest that the statistical power may suffice for an exhaustive search of all variant pairs in ideal cases, we explore strategies for a priori selecting subsets of variant pairs to test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.


Subjects
Epistasis, Genetic; Genome-Wide Association Study; Likelihood Functions; Humans; Models, Theoretical
19.
Syst Biol ; 64(6): 969-82, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26130236

ABSTRACT

Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs - stemming from a speciation - or paralogs - stemming from a gene duplication - is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the typically strong signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts.


Subjects
Classification/methods; Phylogeny; Algorithms; Computer Simulation; Sequence Homology; Software
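Illustrative sketch for record 19 (this is the baseline idea the abstract contrasts against, not DLRSOrthology itself): in the most parsimonious reconciliation, each gene-tree node is mapped to the lowest common ancestor (LCA) of its children's species, and a node that maps to the same species node as one of its children is labeled a duplication. Two genes are then orthologs if their last common gene-tree ancestor is a speciation. The tree encodings, function names, and the toy species and genes below are all assumptions made for illustration.

```python
def ancestors(species_parent, species):
    """Path from a species-tree node up to the root, starting with the node itself."""
    path = [species]
    while species in species_parent:
        species = species_parent[species]
        path.append(species)
    return path


def lca(species_parent, a, b):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(species_parent, a))
    for node in ancestors(species_parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same species tree")


def reconcile(node, species_parent, pair_events):
    """LCA-map a gene tree and label each gene pair by the event that separates it.

    A leaf is (gene_name, species_name); an internal node is (left, right).
    Returns (mapped species node, list of gene names below this node).
    """
    if isinstance(node[0], str):  # leaf
        gene, species = node
        return species, [gene]
    left, right = node
    s_left, genes_left = reconcile(left, species_parent, pair_events)
    s_right, genes_right = reconcile(right, species_parent, pair_events)
    s_here = lca(species_parent, s_left, s_right)
    event = "duplication" if s_here in (s_left, s_right) else "speciation"
    for g in genes_left:
        for h in genes_right:
            pair_events[frozenset((g, h))] = event
    return s_here, genes_left + genes_right


# Toy species tree given as child -> parent (hypothetical example).
species_parent = {
    "human": "mammal", "mouse": "mammal",
    "mammal": "amniote", "chicken": "amniote",
}
# Toy gene tree: a duplication in the mammal ancestor, plus one chicken gene.
gene_tree = (
    (
        (("h1", "human"), ("m1", "mouse")),
        (("h2", "human"), ("m2", "mouse")),
    ),
    ("c1", "chicken"),
)
pair_events = {}
root_species, genes = reconcile(gene_tree, species_parent, pair_events)
for pair, event in sorted(pair_events.items(), key=lambda kv: sorted(kv[0])):
    kind = "orthologs" if event == "speciation" else "paralogs"
    print(sorted(pair), event, kind)
```

This sketch commits to a single fixed gene tree and a single reconciliation, which is the limitation the abstract points out; a Bayesian approach would instead average such labels over sampled gene trees and reconciliations.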
20.
Biochimie ; 117: 22-7, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26051678

ABSTRACT

It has long been known that repetitive elements, particularly Alu sequences in human, are edited by the adenosine deaminases acting on RNA (ADAR) family. The functional interpretation of these events has been even more difficult than that of editing events in coding sequences, but today there is an emerging understanding of their downstream effects. A surprisingly large fraction of the human transcriptome contains inverted Alu repeats, often forming long double-stranded structures in RNA transcripts, typically occurring in introns and UTRs of protein-coding genes. Alu repeats are also common in other primates, and similar inverted repeats can frequently be found in non-primates, although the latter are less prone to duplex formation. In human, as many as 700,000 Alu elements have been identified as substrates for RNA editing, of which many are edited at several sites. In fact, recent advancements in transcriptome sequencing techniques and bioinformatics have revealed that the human editome comprises at least a hundred million adenosine to inosine (A-to-I) editing sites in Alu sequences. Although substantial additional efforts are required in order to map the editome, existing knowledge provides an excellent starting point for studying cis-regulation of editing. In this review, we will focus on editing of long stem loop structures in the human transcriptome and how it can affect gene expression.


Subjects
Alu Elements/genetics; Gene Expression Regulation; RNA Editing; RNA, Untranslated/genetics; Transcriptome/genetics; Animals; Humans; Introns/genetics; Models, Genetic; Primates