Results 1 - 18 of 18
1.
Microb Genom ; 10(10), 2024 Oct.
Article in English | MEDLINE | ID: mdl-39401061

ABSTRACT

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform the public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This portal has been coupled with other resources, such as Viral AI, and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this portal (https://virusseq-dataportal.ca/), including its contextual data not available elsewhere, and the Duotang (https://covarr-net.github.io/duotang/duotang.html), a web platform that presents key genomic epidemiology and modelling analyses on circulating and emerging SARS-CoV-2 variants in Canada. 
Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the portal (COVID-MVP, CoVizu), are all open source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.


Subjects
COVID-19; Genome, Viral; SARS-CoV-2; Canada/epidemiology; SARS-CoV-2/genetics; Humans; COVID-19/epidemiology; COVID-19/virology; Genomics/methods; Pandemics; Databases, Genetic
2.
mBio ; 15(8): e0090724, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-38953636

ABSTRACT

The continued evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) requires persistent monitoring of its subvariants. Omicron subvariants are responsible for the vast majority of SARS-CoV-2 infections worldwide, with XBB and BA.2.86 sublineages representing more than 90% of circulating strains as of January 2024. To better understand parameters involved in viral transmission, we characterized the functional properties of Spike glycoproteins from BA.2.75, CH.1.1, DV.7.1, BA.4/5, BQ.1.1, XBB, XBB.1, XBB.1.16, XBB.1.5, FD.1.1, EG.5.1, HK.3, BA.2.86 and JN.1. We tested their capacity to evade plasma-mediated recognition and neutralization, their binding to angiotensin-converting enzyme 2 (ACE2), their susceptibility to cold inactivation, Spike processing, and the impact of temperature on Spike-ACE2 interaction. We found that, compared to the early wild-type (D614G) strain, most Omicron subvariants' Spike glycoproteins evolved to escape recognition and neutralization by plasma from individuals who received a fifth dose of bivalent (BA.1 or BA.4/5) mRNA vaccine and to improve ACE2 binding, particularly at low temperatures. Moreover, BA.2.86 had the best affinity for ACE2 at all temperatures tested. We found that Omicron subvariants' Spike processing is associated with their susceptibility to cold inactivation. Intriguingly, we found that Spike-ACE2 binding at low temperature was significantly associated with growth rates of Omicron subvariants in humans. Overall, we report that Spikes from newly emerged Omicron subvariants are relatively more stable and resistant to plasma-mediated neutralization, and present improved affinity for ACE2, which is associated, particularly at low temperatures, with their growth rates.
IMPORTANCE: The persistent evolution of SARS-CoV-2 gave rise to a wide range of variants harboring new mutations in their Spike glycoproteins.
Several factors have been associated with viral transmission and fitness, such as plasma-neutralization escape and ACE2 interaction. To better understand whether additional factors could be important in SARS-CoV-2 variants' transmission, we characterized the functional properties of Spike glycoproteins from several Omicron subvariants. We found that the Spike glycoprotein of Omicron subvariants presents improved escape from plasma-mediated recognition and neutralization, Spike processing, and ACE2 binding, which was further improved at low temperature. Intriguingly, Spike-ACE2 interaction at low temperature is strongly associated with viral growth rate; low temperatures could therefore represent another parameter affecting viral transmission.


Subjects
Angiotensin-Converting Enzyme 2; COVID-19; SARS-CoV-2; Spike Glycoprotein, Coronavirus; Temperature; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/metabolism; Angiotensin-Converting Enzyme 2/metabolism; Angiotensin-Converting Enzyme 2/genetics; Humans; SARS-CoV-2/genetics; SARS-CoV-2/physiology; SARS-CoV-2/metabolism; COVID-19/transmission; COVID-19/virology; Protein Binding; Antibodies, Neutralizing/immunology; Antibodies, Neutralizing/blood
3.
ArXiv ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38764594

ABSTRACT

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the Portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This Portal has been coupled with other resources like Viral AI and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this Portal, including its contextual data not available elsewhere, and the 'Duotang', a web platform that presents key genomic epidemiology and modeling analyses on circulating and emerging SARS-CoV-2 variants in Canada. Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. 
The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the Portal (COVID-MVP, CoVizu), are all open-source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.

4.
Viruses ; 16(3), 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-38543708

ABSTRACT

Throughout the SARS-CoV-2 pandemic, several variants of concern (VOCs) have been identified, many of which share recurrent mutations in the spike glycoprotein's receptor-binding domain (RBD). This region coincides with known epitopes and can therefore have an impact on immune escape. Protracted infections in immunosuppressed patients have been hypothesized to lead to an enrichment of such mutations and therefore drive evolution towards VOCs. Here, we present the case of an immunosuppressed patient who developed distinct populations with immune escape mutations over the course of their infection. Notably, by investigating the co-occurrence of substitutions on individual sequencing reads in the RBD, we found quasispecies harboring mutations that confer resistance to known monoclonal antibodies (mAbs), such as S:E484K and S:E484A. These mutations were acquired without the patient being treated with mAbs or convalescent sera and without the patient developing a detectable immune response to the virus. We also provide additional evidence, based on intra-host phylogenetics, for a viral reservoir that led to a viral substrain which evolved elsewhere in the patient's body and colonized their upper respiratory tract (URT). The presence of SARS-CoV-2 viral reservoirs can shed light on protracted infections interspersed with periods in which the virus is undetectable, and may offer potential explanations for long-COVID cases.


Subjects
COVID-19; SARS-CoV-2; Humans; SARS-CoV-2/genetics; Post-Acute COVID-19 Syndrome; COVID-19 Serotherapy; Immunocompromised Host; Antibodies, Monoclonal; Mutation; Spike Glycoprotein, Coronavirus/genetics; Antibodies, Viral; Antibodies, Neutralizing
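The read-level co-occurrence analysis in the abstract above can be illustrated with a short Python sketch. It assumes reads have already been aligned and reduced to a mapping from Spike amino-acid position to the residue observed on that read; the reads, the second position (490) and the `haplotype_frequencies` helper are hypothetical toy choices, not the authors' pipeline or data. The key point is that only reads spanning both positions can phase two substitutions, so everything else is excluded from the denominator.

```python
from collections import Counter

# Hypothetical toy reads: each read is summarised as {spike_position: residue},
# listing only the positions that the read actually covers.
READS = [
    {484: "K", 490: "F"},
    {484: "K", 490: "F"},
    {484: "A", 490: "S"},
    {484: "E", 490: "F"},
    {484: "K"},  # covers only one position, so it cannot phase the pair
]


def haplotype_frequencies(reads, pos_a, pos_b):
    """Fraction of each (residue_a, residue_b) pair among reads spanning both positions."""
    counts = Counter(
        (read[pos_a], read[pos_b]) for read in reads if pos_a in read and pos_b in read
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no read spans both positions
    return {hap: n / total for hap, n in counts.items()}


if __name__ == "__main__":
    for hap, freq in sorted(haplotype_frequencies(READS, 484, 490).items()):
        print(hap, round(freq, 2))
```

On real data one would also need to filter on base and mapping quality before counting; that step is deliberately omitted here.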
5.
Genome Biol Evol ; 16(1), 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38207129

ABSTRACT

Cytochromes P450 (CYP450) are hemoproteins generally involved in detoxifying the body of xenobiotic molecules. They participate in the metabolism of many drugs, and genetic polymorphisms in humans have been found to impact drug responses and metabolic functions. In this study, we investigate the genetic diversity of CYP450 genes. We found that two clusters, CYP3A and CYP4F, are notably differentiated across human populations, with evidence for selective pressures acting on both clusters: we found signals of recent positive selection in CYP3A and CYP4F genes and signals of balancing selection in CYP4F genes. Furthermore, an extensive amount of unusual linkage disequilibrium is detected in the latter cluster, indicating co-evolution signatures among CYP4F genes. Several of the selective signals uncovered co-localize with expression quantitative trait loci (eQTL), which could suggest epistasis acting on co-regulation in these gene families. In particular, we detected a potential co-regulation event between CYP3A5 and CYP3A43, a gene whose function remains poorly characterized. We further identified a causal relationship between CYP3A5 expression and reticulocyte count through Mendelian randomization analyses, potentially involving a regulatory region displaying a selective signal specific to African populations. Our findings linking natural selection and gene expression in the CYP3A and CYP4F subfamilies are important for understanding population differences in the metabolism of nutrients and drugs.


Subjects
Cytochrome P-450 CYP3A; Hominidae; Animals; Humans; Cytochrome P-450 CYP3A/genetics; Cytochrome P-450 CYP3A/metabolism; Hominidae/metabolism; Cytochrome P-450 Enzyme System/genetics; Polymorphism, Genetic; Selection, Genetic
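The abstract reports "unusual linkage disequilibrium" in the CYP4F cluster without stating which statistic was used or what threshold made it unusual. As a reference point only, here is a minimal Python sketch of one standard pairwise LD measure, r², computed from phased haplotypes coded 0/1 at two biallelic sites. The function name and toy haplotypes are hypothetical and this is not presented as the study's method.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation r^2 between two biallelic loci from phased 0/1 haplotypes."""
    if not hap_a or len(hap_a) != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n  # frequency of allele 1 at locus B
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([1, 1, 0, 0], [1, 1, 0, 0]))  # perfectly linked sites
    print(ld_r2([1, 1, 0, 0], [1, 0, 1, 0]))  # independent sites
```

r² is convenient because it is bounded in [0, 1] and comparable across site pairs; judging whether LD is "unusual" would additionally require a genome-wide or simulated null distribution, which is not shown.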
6.
iScience ; 26(12): 108473, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38077122

ABSTRACT

Metabolite genome-wide association studies (mGWAS) have advanced our understanding of the genetic control of metabolite levels. However, interpreting these associations remains challenging due to a lack of tools to annotate gene-metabolite pairs beyond the use of a conservative statistical significance threshold. Here, we introduce the shortest reactional distance (SRD) metric, drawing from the comprehensive KEGG database, to enhance the biological interpretation of mGWAS results. We applied this approach to three independent mGWAS, including a case study on sickle cell disease patients. Our analysis reveals an enrichment of small SRD values in reported mGWAS pairs, with SRD values significantly correlating with mGWAS p values, even beyond the standard conservative thresholds. We demonstrate the utility of SRD annotation in identifying potential false negatives and inaccuracies within current metabolic pathway databases. Our findings highlight the SRD metric as an objective, quantitative and easy-to-compute annotation for gene-metabolite pairs, suitable for integrating statistical evidence into biological networks.

7.
Bioinform Adv ; 3(1): vbad097, 2023.
Article in English | MEDLINE | ID: mdl-37720006

ABSTRACT

Summary: We describe the problem of computing local feature attributions for dimensionality reduction methods. We apply one such method that is well established in supervised classification (using the gradients of target outputs with respect to the inputs) to the popular dimensionality reduction technique t-SNE, widely used in analyses of biological data. We provide an efficient implementation of the gradient computation for this dimensionality reduction technique. Using a novel validation methodology on synthetic datasets and the popular MNIST benchmark dataset, we show that our explanations identify significant features. We then demonstrate the practical utility of our algorithm by showing that it can produce explanations that agree with domain knowledge on a SARS-CoV-2 sequence dataset. Throughout, we provide a road map so that similar explanation methods could be applied to other dimensionality reduction techniques to rigorously analyze biological datasets. Availability and implementation: We have created a Python package that can be installed using the following command: pip install interpretable_tsne. All code used can be found at github.com/MattScicluna/interpretable_tsne.

8.
iScience ; 26(8): 107394, 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37599818

ABSTRACT

Here, we exploit a deep serological profiling strategy coupled with an integrated computational framework for the analysis of SARS-CoV-2 humoral immune responses. Applying a high-density peptide array (HDPA) spanning the entire proteomes of SARS-CoV-2 and endemic human coronaviruses allowed us to identify B cell epitopes and relate them to their evolutionary and structural properties. We identify hotspots of pre-existing immunity and cross-reactive epitopes that contribute to increasing the overall humoral immune response to SARS-CoV-2. Using a public dataset of over 38,000 viral genomes from the early phase of the pandemic, capturing both inter- and within-host genetic viral diversity, we determined the evolutionary profile of epitopes and the differences across proteins, waves, and SARS-CoV-2 variants. Lastly, we show that mutations in spike and nucleocapsid epitopes are under stronger selection between than within patients, suggesting that most of the selective pressure for immune evasion occurs upon transmission between hosts.

9.
Elife ; 12, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37014792

ABSTRACT

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a generalist virus, infecting and evolving in numerous mammals, including captive and companion animals, free-ranging wildlife, and humans. Transmission among non-human species poses a risk for the establishment of SARS-CoV-2 reservoirs, makes eradication difficult, and provides the virus with opportunities for new evolutionary trajectories, including the selection of adaptive mutations and the emergence of new variant lineages. Here, we use publicly available viral genome sequences and phylogenetic analysis to systematically investigate the transmission of SARS-CoV-2 between human and non-human species and to identify mutations associated with each species. We found the highest frequency of animal-to-human transmission from mink, compared with lower transmission from other sampled species (cat, dog, and deer). Although inferred transmission events could be limited by sampling biases, our results provide a useful baseline for further studies. Using genome-wide association studies, no single nucleotide variants (SNVs) were significantly associated with cats and dogs, potentially due to small sample sizes. However, we identified three SNVs statistically associated with mink and 26 with deer. Of these SNVs, ~⅔ were plausibly introduced into these animal species from local human populations, while the remaining ~⅓ were more likely derived in animal populations and are thus top candidates for experimental studies of species-specific adaptation. Together, our results highlight the importance of studying animal-associated SARS-CoV-2 mutations to assess their potential impact on human and animal health.


Subjects
COVID-19; Deer; Animals; Cats; Dogs; SARS-CoV-2/genetics; COVID-19/genetics; Phylogeny; Mink/genetics; Genome-Wide Association Study; Deer/genetics; Zoonoses; Mutation; Genome, Viral
10.
bioRxiv ; 2023 Mar 24.
Article in English | MEDLINE | ID: mdl-36993181

ABSTRACT

Studies combining metabolomics and genetics, known as metabolite genome-wide association studies (mGWAS), have provided valuable insights into our understanding of the genetic control of metabolite levels. However, the biological interpretation of these associations remains challenging due to a lack of existing tools to annotate mGWAS gene-metabolite pairs beyond the use of a conservative statistical significance threshold. Here, we computed the shortest reactional distance (SRD) based on the curated knowledge of the KEGG database to explore its utility in enhancing the biological interpretation of results from three independent mGWAS, including a case study on sickle cell disease patients. Results show that, in reported mGWAS pairs, there is an excess of small SRD values and that SRD values and p-values significantly correlate, even beyond the standard conservative thresholds. The added value of SRD annotation is shown for the identification of potential false negative hits, exemplified by the finding of gene-metabolite associations with SRD ≤1 that did not reach the standard genome-wide significance cut-off. Wider use of this statistic as an mGWAS annotation would prevent the exclusion of biologically relevant associations and could also identify errors or gaps in current metabolic pathway databases. Our findings highlight the SRD metric as an objective, quantitative and easy-to-compute annotation for gene-metabolite pairs that can be used to integrate statistical evidence into biological networks.
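The abstracts do not spell out how SRD is computed from KEGG. A minimal Python sketch, assuming SRD counts the reaction steps separating a metabolite from the metabolites of the reactions catalysed by a gene's product (distance 0 meaning the metabolite takes part in one of those reactions), is a breadth-first search over a metabolite graph in which two metabolites are adjacent when they share a reaction. The toy network, the gene-to-reaction mapping and the function name are hypothetical stand-ins for KEGG data; no KEGG API is called.

```python
from collections import deque

# Hypothetical toy network: reaction -> metabolites it involves (substrates and products).
REACTIONS = {
    "R1": {"A", "B"},
    "R2": {"B", "C"},
    "R3": {"C", "D"},
}
# Hypothetical mapping: gene -> reactions catalysed by its enzyme product.
GENE_REACTIONS = {"GENE1": {"R1"}}


def shortest_reactional_distance(gene, metabolite,
                                 reactions=REACTIONS, gene_reactions=GENE_REACTIONS):
    """Reaction steps between the gene's own reactions and the metabolite (None if unreachable)."""
    # Two metabolites are neighbours when they appear in the same reaction.
    neighbours = {}
    for mets in reactions.values():
        for m in mets:
            neighbours.setdefault(m, set()).update(mets - {m})
    # Metabolites taking part in the gene's own reactions are at distance 0.
    start = set()
    for r in gene_reactions.get(gene, ()):
        start |= reactions[r]
    dist = {m: 0 for m in start}
    queue = deque(start)
    # Breadth-first search: the first time the target is dequeued, its distance is minimal.
    while queue:
        m = queue.popleft()
        if m == metabolite:
            return dist[m]
        for nxt in neighbours.get(m, ()):
            if nxt not in dist:
                dist[nxt] = dist[m] + 1
                queue.append(nxt)
    return None


if __name__ == "__main__":
    for met in ("A", "C", "D", "E"):
        print(met, shortest_reactional_distance("GENE1", met))
```

Under this assumed definition, the "SRD ≤1" pairs mentioned above are those where the metabolite is a substrate or product of the gene's reaction, or is one reaction away from it. Breadth-first search is used because every edge counts as one reaction step, so no weighted shortest-path algorithm is needed.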

11.
Genome Biol Evol ; 14(5), 2022 May 03.
Article in English | MEDLINE | ID: mdl-35482036

ABSTRACT

The molecular mechanisms of aging and life expectancy have been studied in model organisms with short lifespans. However, long-lived species may provide insights into successful strategies for healthy aging, potentially opening the door for novel therapeutic interventions in age-related diseases. Notably, naked mole-rats, the longest-lived rodent, present attenuated aging phenotypes compared with mice. Their resistance toward oxidative stress has been proposed as one hallmark of their healthy aging, suggesting an ability to maintain cell homeostasis, specifically protein homeostasis. To identify the general principles behind their robust protein homeostasis, we compared the aggregation propensity and mutation tolerance of naked mole-rat and mouse orthologous proteins. Our analysis showed no proteome-wide differences in aggregation propensity and mutation tolerance between these species, but several subsets of proteins with a significant difference in aggregation propensity. We found an enrichment of proteins with higher aggregation propensity in naked mole-rat, and these are functionally involved in the inflammasome complex and nucleic acid binding. On the other hand, proteins with lower aggregation propensity in naked mole-rat have a significantly higher mutation tolerance compared with the rest of the proteins. Among them, we identified proteins known to be associated with neurodegenerative and age-related diseases. These findings support the intriguing hypothesis that the naked mole-rat proteome delays aging through its intrinsic proteomic architecture.


Subjects
Protein Aggregates; Proteomics; Animals; Longevity/genetics; Mice; Mole Rats/genetics; Mutation; Proteome/genetics
12.
Front Med (Lausanne) ; 9: 826746, 2022.
Article in English | MEDLINE | ID: mdl-35265640

ABSTRACT

The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale, leading to a tremendous amount of viral genome sequencing data. To assist in tracing infection pathways and designing preventive strategies, a deep understanding of the viral genetic diversity landscape is needed. We present here a set of genomic surveillance tools from population genetics which can be used to better understand the evolution of this virus in humans. To illustrate the utility of this toolbox, we detail an in-depth analysis of the genetic diversity of SARS-CoV-2 in the first year of the COVID-19 pandemic. We analyzed 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets. This approach enables real-time lineage identification, a clear description of the relationship between variants of concern, and efficient detection of recurrent mutations. Furthermore, the change over time of Tajima's D by haplotype provides a powerful metric of lineage expansion. Finally, principal component analysis (PCA) highlights key steps in variant emergence and facilitates the visualization of genomic variation in the context of SARS-CoV-2 diversity. The computational framework presented here is simple to implement and insightful for real-time genomic surveillance of SARS-CoV-2, and could be applied to any pathogen that threatens the health of populations of humans and other organisms.
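Tajima's D, which the abstract uses as a metric of lineage expansion, compares the mean number of pairwise differences (π) with the number of segregating sites (S); an excess of rare variants, typical of a recently expanding lineage, pushes D below zero. Below is a minimal Python sketch of the textbook formula for one set of aligned sequences. It assumes equal-length sequences with no gaps or ambiguous bases, and the function name and toy sequences are illustrative only, not the authors' code.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (no gaps or ambiguous bases assumed)."""
    n = len(seqs)
    if n < 4:
        # With fewer than 4 sequences the variance terms below are zero.
        raise ValueError("need at least 4 sequences")
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None  # D is undefined without polymorphism
    # pi: mean number of pairwise differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard normalising constants of Tajima's statistic.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    print(tajimas_d(["AAAA", "AAAA", "AAAA", "TTTT"]))  # rare variants: negative D
    print(tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"]))  # intermediate frequencies: positive D
```

For the time-series use described in the abstract, one would compute D separately for the sequences of each haplotype within each time window; that grouping step is not shown here.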

13.
PLoS One ; 16(12): e0260714, 2021.
Article in English | MEDLINE | ID: mdl-34855869

ABSTRACT

The first confirmed case of COVID-19 in Quebec, Canada, occurred at Verdun Hospital on February 25, 2020. A month later, a localized outbreak was observed at this hospital. We performed tiled amplicon whole genome nanopore sequencing on nasopharyngeal swabs from all SARS-CoV-2 positive samples from 31 March to 17 April 2020 in 2 local hospitals to assess viral diversity (unknown at the time in Quebec) and potential associations with clinical outcomes. We report 264 viral genomes from 242 individuals (both staff and patients) with associated clinical features and outcomes, as well as longitudinal samples and technical replicates. Viral lineage assessment identified multiple subclades in both hospitals, with a predominant subclade in the Verdun outbreak, indicative of hospital-acquired transmission. Dimensionality reduction identified two subclades with mutations of clinical interest, namely in the Spike protein, that evaded supervised lineage assignment methods, including the Pangolin and NextClade lineage assignment tools. We also report that certain symptoms (headache, myalgia and sore throat) are significantly associated with favorable patient outcomes. Our findings demonstrate the strength of unsupervised, data-driven analyses, while suggesting that caution should be used when employing supervised genomic workflows, particularly during the early stages of a pandemic.


Subjects
COVID-19/virology; Cross Infection/virology; Disease Outbreaks; Genome, Viral/genetics; SARS-CoV-2/genetics; Adolescent; Adult; Aged; Aged, 80 and over; COVID-19/epidemiology; COVID-19/mortality; Child; Child, Preschool; Cross Infection/epidemiology; Disease Outbreaks/statistics & numerical data; Female; Haplotypes/genetics; Humans; Male; Middle Aged; Phylogeny; Quebec/epidemiology; SARS-CoV-2/pathogenicity; Sequence Analysis, RNA; Treatment Outcome; Young Adult
14.
medRxiv ; 2021 Jun 01.
Article in English | MEDLINE | ID: mdl-34100030

ABSTRACT

The first confirmed case of COVID-19 in Quebec, Canada, occurred at Verdun Hospital on February 25, 2020. A month later, a localized outbreak was observed at this hospital. We performed tiled amplicon whole genome nanopore sequencing on nasopharyngeal swabs from all SARS-CoV-2 positive samples from 31 March to 17 April 2020 in 2 local hospitals to assess the viral diversity of the outbreak. We report 264 viral genomes from 242 individuals (both staff and patients) with associated clinical features and outcomes, as well as longitudinal samples, technical replicates and the first publicly disseminated SARS-CoV-2 genomes in Quebec. Viral lineage assessment identified multiple subclades in both hospitals, with a predominant subclade in the Verdun outbreak, indicative of hospital-acquired transmission. Dimensionality reduction identified two subclades that evaded supervised lineage assignment methods, including Pangolin, and identified certain symptoms (headache, myalgia and sore throat) that are significantly associated with favorable patient outcomes. We also address certain limitations of standard SARS-CoV-2 bioinformatics procedures, notably when presented with multiple viral haplotypes.

15.
BMC Evol Biol ; 19(1): 21, 2019 Jan 11.
Article in English | MEDLINE | ID: mdl-30634908

ABSTRACT

BACKGROUND: Multiple Sequence Alignments (MSAs) are the starting point of molecular evolutionary analyses. Errors in MSAs generate a non-historical signal that can lead to incorrect inferences. Therefore, numerous efforts have been made to reduce the impact of alignment errors, by improving alignment algorithms and by developing methods to filter out poorly aligned regions. However, MSAs contain not only alignment errors but also primary sequence errors. Such errors may originate from sequencing errors, from assembly errors, or from erroneous structural annotations (such as incorrect intron/exon boundaries). Even though their existence is acknowledged, the impact of primary sequence errors on evolutionary inference is poorly characterized. RESULTS: As a first step to fill this gap, we have developed a program called HmmCleaner, which detects and eliminates these errors from MSAs. It uses profile hidden Markov models (pHMM) to identify sequence segments that poorly fit their MSA and selectively removes them. We assessed its performance using > 700 amino-acid MSAs from prokaryotes and eukaryotes, in which we introduced several types of simulated primary sequence errors. The sensitivity of HmmCleaner towards simulated primary sequence errors was > 95%. In a second step, we compared the impact of segment-filtering software (HmmCleaner and PREQUAL) relative to commonly used block-filtering software (BMGE and TrimAl) on evolutionary analyses. Using real data from vertebrates, we observed that segment-filtering methods improve the quality of evolutionary inference more than the currently used block-filtering methods. The former were especially effective at improving branch length inferences and at reducing the false positive rate during detection of positive selection. CONCLUSIONS: Segment-filtering methods such as HmmCleaner accurately detect simulated primary sequence errors. Our results suggest that these errors are more detrimental than alignment errors.
However, they also show that stochastic (sampling) error is predominant in single-gene evolutionary inferences. Therefore, we argue that MSA filtering should focus on segment instead of block removal and that more studies are required to find the optimal balance between accuracy improvement and stochastic error increase brought by data removal.


Subjects
Evolution, Molecular; Sequence Alignment; Algorithms; Amino Acid Sequence; Conserved Sequence; Phylogeny; Software
16.
Am J Hum Genet ; 95(5): 490-508, 2014 Nov 06.
Article in English | MEDLINE | ID: mdl-25307298

ABSTRACT

Neurodevelopmental disorders (NDDs) are caused by mutations in diverse genes involved in different cellular functions, although there can be crosstalk, or convergence, between molecular pathways affected by different NDDs. To assess molecular convergence, we generated human neural progenitor cell models of 9q34 deletion syndrome, caused by haploinsufficiency of EHMT1, and 18q21 deletion syndrome, caused by haploinsufficiency of TCF4. Using next-generation RNA sequencing, methylation sequencing, chromatin immunoprecipitation sequencing, and whole-genome miRNA analysis, we identified several levels of convergence. We found mRNA and miRNA expression patterns that were more characteristic of differentiating cells than of proliferating cells, and we identified CpG clusters that had similar methylation states in both models of reduced gene dosage. There was significant overlap of gene targets of TCF4 and EHMT1, whereby 8.3% of TCF4 gene targets and 4.2% of EHMT1 gene targets were identical. These data suggest that 18q21 and 9q34 deletion syndromes show significant molecular convergence but distinct expression and methylation profiles. Common intersection points might highlight the most salient features of disease and provide avenues for similar treatments for NDDs caused by different genetic mutations.


Subjects
Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/genetics; Chromosome Disorders/genetics; Craniofacial Abnormalities/genetics; Evolution, Molecular; Haploinsufficiency/genetics; Heart Defects, Congenital/genetics; Histone-Lysine N-Methyltransferase/genetics; Intellectual Disability/genetics; Neural Stem Cells; Transcription Factors/genetics; Cells, Cultured; Chromatin Immunoprecipitation; Chromosome Deletion; Chromosomes, Human, Pair 18/genetics; Chromosomes, Human, Pair 9/genetics; DNA Methylation; Gene Knockdown Techniques; Humans; Immunohistochemistry; MicroRNAs/genetics; Microscopy, Confocal; Real-Time Polymerase Chain Reaction; Sequence Analysis, RNA; Transcription Factor 4
17.
BMC Genomics ; 15: 290, 2014 Apr 16.
Article in English | MEDLINE | ID: mdl-24734894

ABSTRACT

BACKGROUND: Bisulfite sequencing is the most efficient single nucleotide resolution method for analysis of methylation status at whole genome scale, but improved quality control metrics are needed to better standardize experiments. RESULTS: We describe BisQC, a step-by-step method for multiplexed bisulfite-converted DNA library construction, pooling, spike-in content, and bioinformatics. We demonstrate technical improvements for library preparation and bioinformatic analyses that can be done in standard laboratories. We find that decoupling amplification of bisulfite converted (bis) DNA from the indexing reaction is an advantage, specifically in reducing total PCR cycle number and pre-selecting high quality bis-libraries. We also introduce a progressive PCR method for optimal library amplification and size-selection. At the sequencing stage, we thoroughly test the benefits of pooling non-bis DNA library with bis-libraries and find that BisSeq libraries can be pooled with a high proportion of non-bis DNA libraries with minimal impact on BisSeq output. For informatics analysis, we propose a series of optimization steps including the utilization of the mitochondrial genome as a QC standard, and we assess the validity of using duplicate reads for coverage statistics. CONCLUSION: We demonstrate several quality control checkpoints at the library preparation, pre-sequencing, post-sequencing, and post-alignment stages, which should prove useful in determining sample and processing quality. We also determine that including a significant portion of non-bisulfite converted DNA with bisulfite converted DNA has a minimal impact on usable bisulfite read output.


Subjects
Sequence Analysis, DNA/methods; Base Sequence; DNA Primers; Polymerase Chain Reaction; Sulfites
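The abstract proposes the mitochondrial genome as a QC standard but does not give the calculation. A minimal Python sketch, assuming the common practice of treating mitochondrial DNA as essentially unmethylated so that every reference cytosine should be read as T after complete bisulfite conversion: the fraction of reference C positions read as T then estimates conversion efficiency. Reads are simplified to gap-free, same-strand strings with a known alignment offset; reverse-strand (G to A) reads are not handled, and the function name and toy sequences are hypothetical rather than part of BisQC.

```python
def conversion_rate(reference, reads):
    """Estimate bisulfite conversion efficiency against an assumed-unmethylated reference.

    reads: iterable of (offset, read_sequence) pairs, each read aligned without gaps
    to the forward strand of `reference` starting at `offset`.
    """
    converted = unconverted = 0
    for offset, read in reads:
        for i, base in enumerate(read):
            if reference[offset + i] != "C":
                continue  # only reference cytosines are informative
            if base == "T":
                converted += 1  # C -> T: cytosine was converted
            elif base == "C":
                unconverted += 1  # C retained: unconverted (or methylated)
    total = converted + unconverted
    return converted / total if total else None


if __name__ == "__main__":
    toy_reference = "ACGTCCA"
    toy_reads = [(0, "ATGTTTA"), (2, "GTCC")]
    print(conversion_rate(toy_reference, toy_reads))
```

A production check would also apply base-quality filters and treat CpG and non-CpG contexts separately; those refinements are omitted to keep the idea visible.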
18.
Mol Biol Evol ; 28(1): 729-44, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20926596

ABSTRACT

The comparative approach is routinely used to test for possible correlations between phenotypic or life-history traits. To correct for phylogenetic inertia, the method of independent contrasts assumes that continuous characters evolve along the phylogeny according to a multivariate Brownian process. Brownian diffusion processes have also been used to describe time variations of the parameters of the substitution process, such as the rate of substitution or the ratio of synonymous to nonsynonymous substitutions. Here, we develop a probabilistic framework for testing the coupling between continuous characters and parameters of the molecular substitution process. Rates of substitution and continuous characters are jointly modeled as a multivariate Brownian diffusion process of unknown covariance matrix. The covariance matrix, the divergence times and the phylogenetic variations of substitution rates and continuous characters are all jointly estimated in a Bayesian Monte Carlo framework, imposing on the covariance matrix a prior conjugate to the Brownian process so as to achieve a greater computational efficiency. The coupling between rates and phenotypes is assessed by measuring the posterior probability of positive or negative covariances, whereas divergence dates and phenotypic variations are marginally reconstructed in the context of the joint analysis. As an illustration, we apply the model to a set of 410 mammalian cytochrome b sequences. We observe a negative correlation between the rate of substitution and mass and longevity, which was previously observed. We also find a positive correlation between ω = dN/dS and mass and longevity, which we interpret as an indirect effect of variations of effective population size, thus in partial agreement with the nearly neutral theory. The method can easily be extended to any parameter of the substitution process and to any continuous phenotypic or environmental character.


Subjects
Biological Evolution; Models, Genetic; Monte Carlo Method; Phenotype; Phylogeny; Animals; Codon; Computer Simulation; Evolution, Molecular; Female; Markov Chains; Mathematics
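The abstract above builds on the classical method of independent contrasts, which assumes that continuous characters evolve along the phylogeny by Brownian motion. As a point of reference (not the authors' Bayesian joint model of rates and traits), here is a minimal Python sketch of Felsenstein-style standardized contrasts on a toy tree. The nested-tuple tree encoding, the trait values and both function names are hypothetical choices for illustration.

```python
import math

# A node is (label, branch_length): label is a taxon name for a leaf, or a
# (left_child, right_child) tuple for an internal node. Every node below the
# root must have a positive branch length.


def independent_contrasts(node, traits):
    """Return (node_value, adjusted_branch_length, standardized_contrasts) for a subtree."""
    label, length = node
    if isinstance(label, str):
        return traits[label], length, []
    left, right = label
    x1, v1, c1 = independent_contrasts(left, traits)
    x2, v2, c2 = independent_contrasts(right, traits)
    # Difference between children, scaled by its Brownian-motion standard deviation.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Ancestral estimate: inverse-variance weighted mean of the two children.
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the parent branch to reflect uncertainty in that estimate.
    return x, length + v1 * v2 / (v1 + v2), c1 + c2 + [contrast]


def contrast_correlation(cx, cy):
    """Correlation between two traits' contrasts, computed through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


# Hypothetical three-taxon tree ((A:1, B:1):1, C:2).
AB = ((("A", 1.0), ("B", 1.0)), 1.0)
TREE = ((AB, ("C", 2.0)), 0.0)

if __name__ == "__main__":
    _, _, cx = independent_contrasts(TREE, {"A": 1.0, "B": 3.0, "C": 5.0})
    _, _, cy = independent_contrasts(TREE, {"A": 2.0, "B": 2.5, "C": 4.0})
    print(cx, contrast_correlation(cx, cy))
```

A tree with n tips yields n - 1 contrasts that are independent under the Brownian assumption, which is why they can be correlated directly; the model in the abstract instead estimates the full covariance matrix jointly with divergence times and substitution rates.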