Results 1 - 20 of 190
1.
mSystems ; : e0032124, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38742892

ABSTRACT

Ticks are increasingly important vectors of human and agricultural diseases. While many studies have focused on tick-borne bacteria, far less is known about tick-associated viruses and their roles in public health or tick physiology. To address this, we investigated patterns of bacterial and viral communities across two field populations of western black-legged ticks (Ixodes pacificus). Through metatranscriptomic analysis of 100 individual ticks, we quantified taxon prevalence, abundance, and co-occurrence with other members of the tick microbiome. In addition to commonly found tick-associated microbes, we assembled 11 novel RNA virus genomes from Rhabdoviridae, Chuviridae, Picornaviridae, Phenuiviridae, Reoviridae, Solemoviridae, Narnaviridae and two highly divergent RNA virus genomes lacking sequence similarity to any known viral families. We experimentally verified the presence of these viruses in I. pacificus ticks across several life stages. We also unexpectedly identified numerous virus-like transcripts that are likely encoded by tick genomic DNA and that are distinct from known endogenous viral element-mediated immunity pathways in invertebrates. Taken together, our work reveals that I. pacificus ticks carry a greater diversity of viruses than previously appreciated, in some cases resulting in evolutionarily acquired virus-like transcripts. Our findings highlight how pervasive and intimate tick-virus interactions are, with major implications for both the fundamental biology and vectorial capacity of I. pacificus ticks.
IMPORTANCE: Ticks are increasingly important vectors of disease, particularly in the United States, where expanding tick ranges and intrusion into previously wild areas have resulted in increasing human exposure to ticks. Emerging human pathogens have been identified in ticks at an increasing rate, and yet little is known about the full community of microbes circulating in various tick species, a crucial first step to understanding how they interact with each other and with their tick host, as well as their ability to cause disease in humans. We investigated the bacterial and viral communities of the western black-legged tick in California and found 11 previously uncharacterized viruses circulating in this population.

2.
bioRxiv ; 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38045231

ABSTRACT

The investigation of chromatin organization in single cells holds great promise for identifying causal relationships between genome structure and function. However, analysis of single-molecule data is hampered by extreme yet inherent heterogeneity, making it challenging to determine the contributions of individual chromatin fibers to bulk trends. To address this challenge, we propose ChromaFactor, a novel computational approach based on non-negative matrix factorization that deconvolves single-molecule chromatin organization datasets into their most salient primary components. ChromaFactor provides the ability to identify trends accounting for the maximum variance in the dataset while simultaneously describing the contribution of individual molecules to each component. Applying our approach to two single-molecule imaging datasets across different genomic scales, we find that these primary components demonstrate significant correlation with key functional phenotypes, including active transcription, enhancer-promoter distance, and genomic compartment. ChromaFactor offers a robust tool for understanding the complex interplay between chromatin structure and function on individual DNA molecules, pinpointing which subpopulations drive functional changes and fostering new insights into cellular heterogeneity and its implications for bulk genomic phenomena.
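ChromaFactor is described as being built on non-negative matrix factorization (NMF). The sketch below is a generic NMF, not ChromaFactor itself: it assumes a small non-negative matrix V whose rows stand for individual molecules and whose columns stand for per-molecule features (for example, pairwise distances between imaged loci), and it factors V into W (each molecule's loadings) and H (the components) using the classic Lee-Seung multiplicative updates for squared error. The toy matrix, rank, and iteration count are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m).

    Uses Lee-Seung multiplicative updates, which never make an entry
    negative and do not increase the squared reconstruction error.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Sum of squared differences between V and W @ H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


if __name__ == "__main__":
    # Toy data: molecules 0-1 follow one pattern, molecules 2-3 another.
    v = [[1.0, 1.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 3.0, 3.0]]
    w, h = nmf(v, rank=2)
    print("per-molecule loadings:", w)
```

Each row of W then says how strongly one molecule draws on each component, which is the kind of per-molecule attribution the abstract describes. Multiplicative updates are used here only because they fit in a few lines of standard-library Python; a real tool would likely use an optimized solver.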

3.
bioRxiv ; 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-38045412

ABSTRACT

The most prevalent microbial eukaryote in the human gut is Blastocystis, an obligate commensal protist also common in many other vertebrates. Blastocystis is descended from free-living stramenopile ancestors; how it has adapted to thrive within humans and a wide range of hosts is unclear. Here, we cultivated six Blastocystis strains spanning the diversity of the genus and generated highly contiguous, annotated genomes with long-read DNA-seq, Hi-C, and RNA-seq. Comparative genomics of these strains and two closely related stramenopiles with different lifestyles, the lizard gut symbiont Proteromonas lacertae and the free-living marine flagellate Cafeteria burkhardae, reveals the evolutionary history of the Blastocystis genus. We find substantial variability in gene content between Blastocystis strains. Blastocystis isolated from an herbivorous tortoise has many plant carbohydrate-metabolizing enzymes, some horizontally acquired from bacteria, likely reflecting fermentation within the host gut. In contrast, human-isolated Blastocystis have gained many heat shock proteins, and we find numerous subtype-specific expansions of host-interfacing genes, including cell adhesion and cell surface glycan genes. In addition, we observe that human-isolated Blastocystis have substantial changes in gene structure, including shortened introns and intergenic regions, as well as genes lacking canonical termination codons. Finally, our data indicate that the common ancestor of Blastocystis lost nearly all ancestral genes for heterokont flagellar morphology, including cilia proteins, microtubule motor proteins, and ion channel proteins. Together, these findings underscore the large functional variability within the Blastocystis genus and provide candidate genes for the adaptations these lineages have undergone to thrive in the gut microbiomes of diverse vertebrates.

4.
bioRxiv ; 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37961120

ABSTRACT

Phenotypic divergence between closely related species, including bonobos and chimpanzees (genus Pan), is largely driven by variation in gene regulation. The 3D structure of the genome mediates gene expression; however, genome folding differences in Pan are not well understood. Here, we apply machine learning to predict genome-wide 3D genome contact maps from DNA sequence for 56 bonobos and chimpanzees, encompassing all five extant lineages. We use a pairwise approach to estimate 3D divergence between individuals from the resulting contact maps in 4,420 1-Mb genomic windows. While most pairs were similar, ∼17% were predicted to be substantially divergent in genome folding. The most dissimilar maps were largely driven by single individuals with rare variants that produce unique 3D genome folding in a region. We also identified 89 genomic windows where bonobo and chimpanzee contact maps substantially diverged, including several windows harboring genes associated with traits implicated in Pan phenotypic divergence. We used in silico mutagenesis to identify 51 3D-modifying variants in these bonobo-chimpanzee divergent windows, finding that 34 of them (66.67%) induce genome folding changes via CTCF binding motif disruption. Our results reveal 3D genome variation at the population level and identify genomic regions where changes in 3D folding may contribute to phenotypic differences in our closest living relatives.

5.
bioRxiv ; 2023 Nov 05.
Article in English | MEDLINE | ID: mdl-37961123

ABSTRACT

Computationally editing genome sequences is a common bioinformatics task, but current approaches have limitations, such as incompatibility with structural variants, challenges in identifying responsible sequence perturbations, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing in silico mutagenesis. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences.
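To make the idea of "pairs of reference and perturbed sequences" concrete, here is a minimal sketch that applies one variant (a substitution, insertion, or deletion, each written as "replace this reference allele with that alternate allele") and then crops both sequences to the same fixed-length window centered on the variant, since sequence-to-function models typically take fixed-length input. The function names, the centering rule, and padding with "N" are assumptions for illustration; this is not SuPreMo's code or interface.

```python
def apply_variant(ref_seq, pos, ref_allele, alt_allele):
    """Return ref_seq with ref_allele at 0-based pos replaced by alt_allele.

    Substitutions, insertions and deletions are all handled the same way,
    because each is a replacement of one substring by another whose length
    may differ.
    """
    if ref_seq[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match sequence")
    return ref_seq[:pos] + alt_allele + ref_seq[pos + len(ref_allele):]


def centered_window(seq, center, length, pad="N"):
    """Crop seq to `length` bases centered on `center`, padding off-sequence
    positions with `pad` (hypothetical convention)."""
    start = center - length // 2
    return "".join(seq[i] if 0 <= i < len(seq) else pad
                   for i in range(start, start + length))


def ref_alt_pair(ref_seq, pos, ref_allele, alt_allele, length):
    """Build equal-length (reference, perturbed) model inputs for one variant."""
    alt_seq = apply_variant(ref_seq, pos, ref_allele, alt_allele)
    return (centered_window(ref_seq, pos, length),
            centered_window(alt_seq, pos, length))


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    # A 1-bp deletion written as REF="AC", ALT="A" at position 4.
    print(ref_alt_pair(ref, 4, "AC", "A", 6))
```

Centering both windows on the variant keeps the perturbation at the same model coordinate; bases shifted by an indel accumulate on one side of the window. SuPreMo's own handling of structural variants and window placement may differ.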

6.
medRxiv ; 2023 Oct 28.
Article in English | MEDLINE | ID: mdl-37961381

ABSTRACT

In frontotemporal lobar degeneration (FTLD), pathological protein aggregation is associated with a decline in human-specialized social-emotional and language functions. Most disease protein aggregates contain either TDP-43 (FTLD-TDP) or tau (FTLD-tau). Here, we explored whether FTLD targets brain regions that express genes containing human accelerated regions (HARs), conserved sequences that have undergone positive selection during recent human evolution. To this end, we used structural neuroimaging from patients with FTLD and normative human regional transcriptomic data to identify genes expressed in FTLD-targeted brain regions. We then integrated primate comparative genomic data to test our hypothesis that FTLD targets brain regions expressing recently evolved genes. In addition, we asked whether genes expressed in FTLD-targeted brain regions are enriched for genes that undergo cryptic splicing when TDP-43 function is impaired. We found that FTLD-TDP and FTLD-tau subtypes target brain regions that express overlapping and distinct genes, including many linked to neuromodulatory functions. Genes whose normative brain regional expression pattern correlated with FTLD cortical atrophy were strongly associated with HARs. Atrophy-correlated genes in FTLD-TDP showed greater overlap with TDP-43 cryptic splicing genes compared with atrophy-correlated genes in FTLD-tau. Cryptic splicing genes were enriched for HAR genes, and vice versa, but this effect was due to the confounding influence of gene length. Analyses performed at the individual-patient level revealed that the expression of HAR genes and cryptically spliced genes within putative regions of disease onset differed across FTLD-TDP subtypes. Overall, our findings suggest that FTLD targets brain regions that have undergone recent evolutionary specialization and provide intriguing potential leads regarding the transcriptomic basis for selective vulnerability in distinct FTLD molecular-anatomical subtypes.

7.
bioRxiv ; 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37961712

ABSTRACT

Recent studies have highlighted the impact of both transcription and transcripts on 3D genome organization, particularly its dynamics. Here, we propose a deep learning framework, called AkitaR, that leverages both genome sequences and genome-wide RNA-DNA interactions to investigate the roles of chromatin-associated RNAs (caRNAs) in genome folding in HFFc6 cells. To disentangle the cis- and trans-regulatory roles of caRNAs, we compared models with nascent transcripts, trans-located caRNAs, open chromatin data, or DNA sequence alone. Both nascent transcripts and trans-located caRNAs improved the models' predictions, especially at cell-type-specific genomic regions. Analyses of feature importance scores revealed the contribution of caRNAs at TAD boundaries, chromatin loops and nuclear sub-structures such as nuclear speckles and nucleoli to the models' predictions. Furthermore, we identified non-coding RNAs (ncRNAs) known to regulate chromatin structures, such as MALAT1 and NEAT1, as well as several novel RNAs, RNY5, RPPH1, POLG-DT and THBS1-IT, that might modulate chromatin architecture through trans-interactions in HFFc6. Our modeling also suggests that transcripts from Alus and other repetitive elements may facilitate chromatin interactions through trans R-loop formation. Our findings provide new insights and generate testable hypotheses about the roles of caRNAs in shaping chromatin organization.

8.
Cell Genom ; 3(10): 100410, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868032

ABSTRACT

Natural and experimental genetic variants can modify DNA loops and insulating boundaries to tune transcription, but it is unknown how sequence perturbations affect chromatin organization genome wide. We developed a deep-learning strategy to quantify the effect of any insertion, deletion, or substitution on chromatin contacts and systematically scored millions of synthetic variants. While most genetic manipulations have little impact, regions with CTCF motifs and active transcription are highly sensitive, as expected. Our unbiased screen and subsequent targeted experiments also point to noncoding RNA genes and several families of repetitive elements as CTCF-motif-free DNA sequences with particularly large effects on nearby chromatin interactions, sometimes exceeding the effects of CTCF sites and explaining interactions that lack CTCF. We anticipate that our disruption tracks may be of broad interest and utility as a measure of 3D genome sensitivity, and our computational strategies may serve as a template for biological inquiry with deep learning.

9.
Genome Biol ; 24(1): 186, 2023 08 10.
Article in English | MEDLINE | ID: mdl-37563669

ABSTRACT

Existing single nucleotide polymorphism (SNP) genotyping algorithms do not scale for species with thousands of sequenced strains, nor do they account for conspecific redundancy. Here we present a bioinformatics tool, Maast, which empowers population genetic meta-analysis of microbes at an unrivaled scale. Maast implements a novel algorithm to heuristically identify a minimal set of diverse conspecific genomes, then constructs a reliable SNP panel for each species, and enables rapid and accurate genotyping using a hybrid of whole-genome alignment and k-mer exact matching. We demonstrate Maast's utility by genotyping thousands of Helicobacter pylori strains and tracking SARS-CoV-2 diversification.
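The abstract does not spell out Maast's heuristic for choosing a "minimal set of diverse conspecific genomes". As an assumption-laden illustration of the general idea only, the sketch below greedily picks representative genomes so that every genome lies within a distance threshold of at least one chosen representative. Genomes are modeled as k-mer sets compared by Jaccard distance; the k value, the threshold, and the greedy set-cover rule are all hypothetical choices, not Maast's documented algorithm.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two k-mer sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def greedy_representatives(genomes, k=4, threshold=0.3):
    """Greedily choose representatives so each genome is within `threshold`
    Jaccard distance of at least one chosen genome.

    genomes: dict of name -> sequence. Returns names in selection order.
    """
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    uncovered = set(sets)
    chosen = []
    while uncovered:
        # Pick the genome covering the most still-uncovered genomes;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(sets), key=lambda c: sum(
            1 for g in uncovered
            if jaccard_distance(sets[c], sets[g]) <= threshold))
        chosen.append(best)
        uncovered = {g for g in uncovered
                     if jaccard_distance(sets[best], sets[g]) > threshold}
    return chosen
```

Any uncovered genome covers at least itself, so every pick shrinks the uncovered set and the loop always terminates; the threshold trades panel size against how much diversity the chosen set captures.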


Subjects
COVID-19 , SARS-CoV-2 , Humans , Genotype , SARS-CoV-2/genetics , Genome , Algorithms , Polymorphism, Single Nucleotide , Genotyping Techniques
10.
Mol Cell ; 83(15): 2624-2640, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37419111

ABSTRACT

The four-dimensional nucleome (4DN) consortium studies the architecture of the genome and the nucleus in space and time. We summarize progress by the consortium and highlight the development of technologies for (1) mapping genome folding and identifying roles of nuclear components and bodies, proteins, and RNA, (2) characterizing nuclear organization with time or single-cell resolution, and (3) imaging of nuclear organization. With these tools, the consortium has provided over 2,000 public datasets. Integrative computational models based on these data are starting to reveal connections between genome structure and function. We then present a forward-looking perspective and outline current aims to (1) delineate dynamics of nuclear architecture at different timescales, from minutes to weeks as cells differentiate, in populations and in single cells, (2) characterize cis-determinants and trans-modulators of genome organization, (3) test functional consequences of changes in cis- and trans-regulators, and (4) develop predictive models of genome structure and function.


Subjects
Cell Nucleus , Genome , Genome/genetics , Cell Nucleus/genetics , Cell Nucleus/metabolism , Chromatin/metabolism
11.
Res Sq ; 2023 May 23.
Article in English | MEDLINE | ID: mdl-37292728

ABSTRACT

Comparing chromatin contact maps is an essential step in quantifying how three-dimensional (3D) genome organization shapes development, evolution, and disease. However, no gold standard exists for comparing contact maps, and even simple methods often disagree. In this study, we propose novel comparison methods and evaluate them alongside existing approaches using genome-wide Hi-C data and 22,500 in silico predicted contact maps. We also quantify the robustness of methods to common sources of biological and technical variation, such as boundary size and noise. We find that simple difference-based methods such as mean squared error are suitable for initial screening, but biologically informed methods are necessary to identify why maps diverge and propose specific functional hypotheses. We provide a reference guide, codebase, and benchmark for rapidly comparing chromatin contact maps at scale to enable biological insights into the 3D organization of the genome.
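The abstract names mean squared error as a simple difference-based screen. Below is a minimal sketch of MSE, plus Pearson correlation as a second common baseline, both computed over the upper triangle of two symmetric contact maps. The diagonal offset and the toy maps are illustrative assumptions; this is not the authors' benchmark code.

```python
import math


def upper_triangle(m, offset=1):
    """Flatten entries m[i][j] with j >= i + offset of a symmetric map."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + offset, n)]


def mse(a, b):
    """Mean squared difference between two maps' upper triangles."""
    x, y = upper_triangle(a), upper_triangle(b)
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)


def pearson(a, b):
    """Pearson correlation of two maps' upper triangles.

    Raises ZeroDivisionError if either map is constant off the diagonal.
    """
    x, y = upper_triangle(a), upper_triangle(b)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    map1 = [[0, 3, 2, 1], [3, 0, 3, 2], [2, 3, 0, 3], [1, 2, 3, 0]]
    # Same pattern, shifted up by 1 everywhere off the diagonal.
    map2 = [[0, 4, 3, 2], [4, 0, 4, 3], [3, 4, 0, 4], [2, 3, 4, 0]]
    print(mse(map1, map2), pearson(map1, map2))
```

In this toy case the maps differ only by a constant offset, so MSE reports a difference while correlation reports a perfect match, a small instance of simple methods disagreeing.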

12.
Nat Commun ; 14(1): 3510, 2023 06 14.
Article in English | MEDLINE | ID: mdl-37316519

ABSTRACT

Microbial community function depends on both taxonomic composition and spatial organization. While composition of the human gut microbiome has been deeply characterized, less is known about the organization of microbes between regions such as lumen and mucosa and the microbial genes regulating this organization. Using a defined 117-strain community for which we generate high-quality genome assemblies, we model mucosa/lumen organization with in vitro cultures incorporating mucin hydrogel carriers as surfaces for bacterial attachment. Metagenomic tracking of carrier cultures reveals increased diversity and strain-specific spatial organization, with distinct strains enriched on carriers versus liquid supernatant, mirroring mucosa/lumen enrichment in vivo. A comprehensive search for microbial genes associated with this spatial organization identifies candidates with known adhesion-related functions, as well as novel links. These findings demonstrate that carrier cultures of defined communities effectively recapitulate fundamental aspects of gut spatial organization, enabling identification of key microbial strains and genes.


Subjects
Gastrointestinal Microbiome , Microbiota , Humans , Gastrointestinal Microbiome/genetics , Hydrogels , Metagenome , Microbiota/genetics , Mucins
13.
Elife ; 12, 2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37306300

ABSTRACT

Bacteria within the gut microbiota possess the ability to metabolize a wide array of human drugs, foods, and toxins, but the enzymes responsible for these chemical events remain largely uncharacterized due to the time-consuming nature of current experimental approaches. Attempts have been made in the past to computationally predict which bacterial species and enzymes are responsible for chemical transformations in the gut environment, but with low accuracy due to minimal chemical representation and sequence similarity search schemes. Here, we present an in silico approach that employs chemical and protein Similarity algorithms that Identify MicrobioMe Enzymatic Reactions (SIMMER). We show that SIMMER accurately predicts the responsible species and enzymes for a queried reaction, unlike previous methods. We demonstrate SIMMER use cases in the context of drug metabolism by predicting previously uncharacterized enzymes for 88 drug transformations known to occur in the human gut. We validate these predictions on external datasets and provide an in vitro validation of SIMMER's predictions for metabolism of methotrexate, an anti-arthritic drug. After demonstrating its utility and accuracy, we made SIMMER available as both a command-line and web tool, with flexible input and output options for determining chemical transformations within the human gut. We present SIMMER as a computational addition to the microbiome researcher's toolbox, enabling researchers to form informed hypotheses before embarking on the lengthy laboratory experiments required to characterize novel bacterial enzymes that can alter human ingested compounds.
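The abstract does not state which chemical representation SIMMER uses. As a generic illustration of the kind of chemical similarity score that reaction-matching tools often rank by, here is the Tanimoto (Jaccard) coefficient between two binary fingerprints, each represented as a set of "on" bit indices, used to rank a hypothetical table of known reactions against a query. The fingerprints, reaction names, and ranking function are made up for illustration and are not SIMMER's method.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for sets of on-bit indices.

    Two empty fingerprints are treated as identical (score 1.0).
    """
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 1.0


def rank_reactions(query_fp, reference_fps):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp))
              for name, fp in reference_fps.items()]
    # Sort by descending score; break ties by name for a stable ranking.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical fingerprints for three known reactions.
    refs = {"rxn_A": {1, 2, 3, 5}, "rxn_B": {7, 8}, "rxn_C": {1, 2, 3, 4}}
    for name, score in rank_reactions({1, 2, 3, 4}, refs):
        print(name, round(score, 2))
```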


Subjects
Gastrointestinal Microbiome , Microbiota , Humans , Bacteria/metabolism , Food , Algorithms
14.
bioRxiv ; 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37066196

ABSTRACT

Comparing chromatin contact maps is an essential step in quantifying how three-dimensional (3D) genome organization shapes development, evolution, and disease. However, no gold standard exists for comparing contact maps, and even simple methods often disagree. In this study, we propose novel comparison methods and evaluate them alongside existing approaches using genome-wide Hi-C data and 22,500 in silico predicted contact maps. We also quantify the robustness of methods to common sources of biological and technical variation, such as boundary size and noise. We find that simple difference-based methods such as mean squared error are suitable for initial screening, but biologically informed methods are necessary to identify why maps diverge and propose specific functional hypotheses. We provide a reference guide, codebase, and benchmark for rapidly comparing chromatin contact maps at scale to enable biological insights into the 3D organization of the genome.

15.
Science ; 380(6643): eabm1696, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104607

ABSTRACT

Human accelerated regions (HARs) are conserved genomic loci that evolved at an accelerated rate in the human lineage and may underlie human-specific traits. We generated HARs and chimpanzee accelerated regions with an automated pipeline and an alignment of 241 mammalian genomes. Combining deep learning with chromatin capture experiments in human and chimpanzee neural progenitor cells, we discovered a significant enrichment of HARs in topologically associating domains containing human-specific genomic variants that change three-dimensional (3D) genome organization. Differential gene expression between humans and chimpanzees at these loci suggests rewiring of regulatory interactions between HARs and neurodevelopmental genes. Thus, comparative genomics together with models of 3D genome folding revealed enhancer hijacking as an explanation for the rapid evolution of HARs.


Subjects
Genetic Loci , Neurogenesis , Animals , Humans , Chromatin/genetics , Genome, Human , Genomics , Pan troglodytes/genetics , Neurogenesis/genetics , Deep Learning
16.
Science ; 380(6643): eabn2937, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104612

ABSTRACT

Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.


Subjects
Disease , Genetic Variation , Animals , Humans , Biological Evolution , Genome, Human , Genome-Wide Association Study , Genomics , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Disease/genetics
17.
Circ Genom Precis Med ; 16(3): 207-215, 2023 06.
Article in English | MEDLINE | ID: mdl-37017090

ABSTRACT

BACKGROUND: A large proportion of genetic risk remains unexplained for structural heart disease involving the interventricular septum (IVS), including hypertrophic cardiomyopathy and ventricular septal defects. This study sought to develop a reproducible proxy of IVS structure from standard medical imaging, discover novel genetic determinants of IVS structure, and relate these loci to diseases of the IVS, hypertrophic cardiomyopathy, and ventricular septal defect. METHODS: We estimated the cross-sectional area of the IVS from the 4-chamber view of cardiac magnetic resonance imaging in 32 219 individuals from the UK Biobank and used these estimates as the basis of genome-wide association studies and Mendelian randomization. RESULTS: Measures of IVS cross-sectional area at diastole were a strong proxy for the 3-dimensional volume of the IVS (Pearson r=0.814, P=0.004) and correlated with anthropometric measures, blood pressure, and diagnostic codes related to cardiovascular physiology. Seven loci with clear genomic consequence and relevance to cardiovascular biology were uncovered by genome-wide association studies, most notably a single nucleotide polymorphism in an intron of CDKN1A (rs2376620; β, 7.7 mm² [95% CI, 5.8-11.0]; P=6.0×10⁻¹⁰) and a common inversion incorporating KANSL1 predicted to disrupt local chromatin structure (β, 8.4 mm² [95% CI, 6.3-10.9]; P=4.2×10⁻¹⁴). Mendelian randomization suggested that inheritance of larger IVS cross-sectional area at diastole was strongly associated with hypertrophic cardiomyopathy risk (P_IVW=4.6×10⁻¹⁰), while inheritance of smaller IVS cross-sectional area at diastole was associated with risk for ventricular septal defect (P_IVW=0.007). CONCLUSIONS: Automated estimates of the cross-sectional area of the IVS support discovery of novel loci related to cardiac development and Mendelian disease. Inheritance of genetic liability for either small or large IVS appears to confer risk for ventricular septal defect or hypertrophic cardiomyopathy, respectively. These data suggest that a proportion of risk for structural and congenital heart disease can be localized to the common genetic determinants of size and shape of cardiovascular anatomy.
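The P_IVW values above refer to an inverse-variance weighted (IVW) Mendelian randomization estimate. The standard fixed-effect IVW estimator pools the per-variant ratios β_Yj/β_Xj with weights β_Xj²/se_Yj², which gives β_IVW = Σ_j (β_Xj β_Yj / se_Yj²) / Σ_j (β_Xj² / se_Yj²) with standard error 1/sqrt(Σ_j β_Xj² / se_Yj²). The sketch below implements that textbook formula with a two-sided normal p-value; the summary statistics in the example are made up and are not data from this study, and the study's exact MR configuration is not described in the abstract.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_x: variant-exposure effects; beta_y: variant-outcome effects;
    se_y: standard errors of beta_y (one entry per genetic instrument).
    Returns (beta_ivw, standard_error, two_sided_p).
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    beta = num / den
    se = math.sqrt(1.0 / den)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


if __name__ == "__main__":
    # Hypothetical instruments whose outcome effects are twice their
    # exposure effects, so the causal estimate should be close to 2.
    print(ivw_estimate([0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.05, 0.05, 0.1]))
```

The fixed-effect form assumes the instruments affect the outcome only through the exposure; robust MR methods relax that assumption.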


Subjects
Cardiomyopathy, Hypertrophic , Heart Septal Defects, Ventricular , Humans , Genome-Wide Association Study , Cardiomyopathy, Hypertrophic/diagnostic imaging , Cardiomyopathy, Hypertrophic/genetics , Cardiomyopathy, Hypertrophic/complications , Heart Septal Defects, Ventricular/diagnostic imaging , Heart Septal Defects, Ventricular/genetics , Heart Septal Defects, Ventricular/complications , Heart , Magnetic Resonance Imaging
18.
Circ Genom Precis Med ; 16(3): 258-266, 2023 06.
Article in English | MEDLINE | ID: mdl-37026454

ABSTRACT

BACKGROUND: Congenital heart disease (CHD) is highly heritable, but the power to identify inherited risk has been limited to analyses of common variants in small cohorts. METHODS: We performed reimputation of 4 CHD cohorts (n=55 342) to the TOPMed reference panel (freeze 5), permitting meta-analysis of 14 784 017 variants, including 6 035 962 rare variants of high imputation quality as validated by whole genome sequencing. RESULTS: Meta-analysis identified 16 novel loci, including 12 rare variants, which displayed moderate or large effect sizes (median odds ratio, 3.02) for 4 separate CHD categories. Analyses of chromatin structure link 13 of the genome-wide significant loci to key genes in cardiac development; rs373447426 (minor allele frequency, 0.003 [odds ratio, 3.37 for conotruncal heart disease]; P=1.49×10⁻⁸) is predicted to disrupt chromatin structure for 2 nearby genes, BDH1 and DLG1, involved in conotruncal development. A lead variant, rs189203952 (minor allele frequency, 0.01 [odds ratio, 2.4 for left ventricular outflow tract obstruction]; P=1.46×10⁻⁸), is predicted to disrupt the binding sites of 4 transcription factors known to participate in cardiac development in the promoter of SPAG9. A tissue-specific model of chromatin conformation suggests that the common variant rs78256848 (minor allele frequency, 0.11 [odds ratio, 1.4 for conotruncal heart disease]; P=2.6×10⁻⁸) physically interacts with NCAM1 (P_FDR=1.86×10⁻²⁷), a neural adhesion molecule acting in cardiac development. Importantly, while each individual malformation displayed substantial heritability (observed h² ranging from 0.26 for complex malformations to 0.37 for left ventricular outflow tract obstructive disease), the risk for different CHD malformations appeared to be separate, without genetic correlation as measured by linkage disequilibrium score regression or regional colocalization.
CONCLUSIONS: We describe a set of rare noncoding variants conferring significant risk for individual heart malformations that are linked to genes governing cardiac development. These results suggest that the oligogenic basis and substantial heritability of CHD may be linked to rare variants outside protein-coding regions that confer substantial risk for individual categories of cardiac malformation.


Subjects
Heart Defects, Congenital , Humans , Heart Defects, Congenital/diagnosis , Heart Defects, Congenital/genetics , Phenotype , Gene Frequency , Whole Genome Sequencing , Chromatin , Adaptor Proteins, Signal Transducing/genetics
19.
STAR Protoc ; 4(1): 101964, 2023 03 17.
Article in English | MEDLINE | ID: mdl-36856771

ABSTRACT

Genotyping single-nucleotide polymorphisms (SNPs) in microbiomes enables strain-level quantification. In this protocol, we describe a computational pipeline that performs fast and accurate SNP genotyping using metagenomic data. We first demonstrate how to use Maast to catalog SNPs from microbial genomes. Then we use GT-Pro to extract unique SNP-covering k-mers, optimize a data structure for storing these k-mers, and finally perform metagenotyping. For proof of concept, the protocol leverages public whole-genome sequences to metagenotype a synthetic community. For complete details on the use and execution of this protocol, please refer to Shi et al. (2022a) and Shi et al. (2022b).
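To illustrate the "SNP-covering k-mers" step, the sketch below builds, for each catalogued SNP, every k-mer that overlaps the SNP position under each allele, stores them in a dictionary keyed by k-mer, and then tallies exact k-mer matches in reads to count support for each allele. This is a simplified, assumption-based illustration rather than GT-Pro's implementation: it uses a plain Python dict instead of GT-Pro's optimized data structure, ignores reverse complements, and does not filter out k-mers that also occur elsewhere in the genome. The `(pos, ref, alt)` SNP tuple format is hypothetical.

```python
from collections import Counter


def snp_kmers(genome, pos, ref, alt, k):
    """Map every k-mer overlapping 0-based SNP `pos` to the allele it carries."""
    table = {}
    for allele in (ref, alt):
        seq = genome[:pos] + allele + genome[pos + 1:]
        # Start positions whose k-mer window still contains `pos`.
        for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
            table[seq[start:start + k]] = allele
    return table


def build_index(genome, snps, k):
    """Combine per-SNP tables into one lookup: k-mer -> (snp_id, allele)."""
    index = {}
    for snp_id, (pos, ref, alt) in snps.items():
        for kmer, allele in snp_kmers(genome, pos, ref, alt, k).items():
            index[kmer] = (snp_id, allele)
    return index


def metagenotype(reads, index, k):
    """Count exact k-mer hits per (snp_id, allele) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            hit = index.get(read[i:i + k])
            if hit is not None:
                counts[hit] += 1
    return counts


if __name__ == "__main__":
    genome = "AACCGGTTAA"
    index = build_index(genome, {"snp1": (4, "G", "T")}, k=3)
    # One read carries the reference allele, the other the alternate allele.
    print(metagenotype(["ACCGGT", "CCTGTT"], index, k=3))
```

Allele frequencies per SNP then follow from the ratio of the two counts, which is what makes strain-level quantification from k-mer hits possible.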


Subjects
Genome , Microbiota , Microbiota/genetics , Polymorphism, Single Nucleotide/genetics
20.
bioRxiv ; 2023 Mar 10.
Article in English | MEDLINE | ID: mdl-36945512

ABSTRACT

Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single base phyloP scores from the whole genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained, and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability (more than any other functional annotation). Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
