Results 1 - 20 of 4,307
1.
Postepy Biochem ; 65(3): 217-223, 2019 10 01.
Article in Polish | MEDLINE | ID: mdl-31643169

ABSTRACT

Transposable elements (TEs) are sequences that are able to "jump" across the genome. They are found in virtually all organisms, including humans. Although in humans the majority of TEs have lost the ability to transpose autonomously, they make up almost half of our genome and have played important roles in genome evolution. Rapid progress in deep sequencing and functional analysis has revealed the importance of domesticated copies of transposable elements, including their regulatory sequences, transcripts and proteins, in normal cell function. However, a growing body of evidence suggests the involvement of TEs in the development and progression of autoimmune and neurodegenerative diseases as well as many types of cancer. In this review we summarize the current state of knowledge about the LTR retroelements: endogenous retroviruses (ERVs) and Ty3/Gypsy retrotransposons, and their role in the human organism.


Subjects
Genome, Human/genetics , Retroelements/genetics , Autoimmune Diseases/genetics , Endogenous Retroviruses/genetics , Evolution, Molecular , Humans , Neoplasms/genetics , Neurodegenerative Diseases/genetics , Terminal Repeat Sequences/genetics
3.
Biochim Biophys Acta Rev Cancer ; 1872(1): 122-137, 2019 08.
Article in English | MEDLINE | ID: mdl-31265877

ABSTRACT

The rapid evolution of next-generation sequencing (NGS)-based tumor genomic profile detection and the emergence of molecularly targeted therapies have enabled precision oncology. In NGS-based analysis, various types of databases have been developed to perform different functions. However, many problems still exist when using these public databases. Therefore, it is important to better understand the characteristics and limitations of each database and have them complement each other to provide useful clinical evidence for NGS testing. In this review, we elaborate on the important role of databases and their concrete applications in NGS-based somatic mutation detection. We introduce the typically used databases for sequence alignment, variant filtration, and variant interpretation, and compare the differences between the databases with similar functions. Subsequently, we determine the limitations of each database and provide the corresponding solutions. Furthermore, we present an overview diagram to clearly illustrate the database used in the entire NGS-based somatic mutation detection pipeline.


Subjects
DNA Mutational Analysis , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Genome, Human/genetics , Humans , Mutation , Precision Medicine , Sequence Analysis, DNA/methods
4.
Nat Commun ; 10(1): 3018, 2019 07 09.
Article in English | MEDLINE | ID: mdl-31289270

ABSTRACT

The analysis of whole-genome sequencing studies is challenging due to the large number of noncoding rare variants, our limited understanding of their functional effects, and the lack of natural units for testing. Here we propose a scan statistic framework, WGScan, to simultaneously detect the existence, and estimate the locations of association signals at genome-wide scale. WGScan can analytically estimate the significance threshold for a whole-genome scan; utilize summary statistics for a meta-analysis; incorporate functional annotations for enhanced discoveries in noncoding regions; and enable enrichment analyses using genome-wide summary statistics. Based on the analysis of whole genomes of 1,786 phenotypically discordant sibling pairs from the Simons Simplex Collection study for autism spectrum disorders, we derive genome-wide significance thresholds for whole genome sequencing studies and detect significant enrichments of regions showing associations with autism in promoter regions, functional categories related to autism, and enhancers predicted to regulate expression of autism associated genes.


Subjects
Autism Spectrum Disorder/genetics , Data Analysis , Genome, Human/genetics , Models, Genetic , Algorithms , Datasets as Topic , Female , Gene Expression Regulation , Genome-Wide Association Study/methods , Humans , Male , Phenotype , Polymorphism, Single Nucleotide , Siblings , Whole Genome Sequencing/methods
5.
Biochim Biophys Acta Rev Cancer ; 1872(1): 60-65, 2019 08.
Article in English | MEDLINE | ID: mdl-31152819

ABSTRACT

Hepatocellular carcinoma (HCC), the most common form of liver cancer, represents a health problem in the era of hepatic virus eradication because obesity, type 2 diabetes, and nonalcoholic steatohepatitis (NASH) are considered emerging pathogenic factors. Metabolic disorders underpin mitotic errors that lead to numerical and structural chromosome aberrations in a significant proportion of cell divisions. Here, we review evidence that genomically unstable HCCs show a paradoxical DNA damage response (DDR), which leads to ongoing chromosome segregation errors. Understanding the DDR induced by defective mitoses is crucial to our ability to develop or improve liver cancer therapeutic strategies.


Subjects
Carcinoma, Hepatocellular/genetics , Genome, Human/genetics , Genomic Instability/genetics , Liver Neoplasms/genetics , Carcinoma, Hepatocellular/pathology , Chromosomal Instability/genetics , Chromosome Segregation/genetics , DNA Damage/genetics , Humans , Liver Neoplasms/pathology , Mitosis/genetics
6.
Mol Biol (Mosk) ; 53(3): 355-366, 2019.
Article in Russian | MEDLINE | ID: mdl-31184600

ABSTRACT

A serious problem in the treatment of HIV infection is the emergence of drug-resistant forms of the virus. One promising approach to solving this problem is the development of inhibitors of the interaction between viral proteins and cellular co-factors. However, the development of this approach is hampered by the lack of knowledge about the involvement of cellular proteins in the pathogenesis of HIV infection. In particular, it is known that the integration of viral DNA into the host genome generates numerous lesions in the cellular DNA, the repair of which is absolutely necessary for successful replication of the virus. However, it is still unknown which cellular proteins are involved in repairing this damage. In this review, we summarize what is known to date about the role of cellular repair systems in the replication of HIV-1 in general, and in the repair of damage that occurs during the integration of viral DNA into a cell's genome in particular.


Subjects
DNA Repair , DNA, Viral , Genome, Human/genetics , HIV Infections/genetics , HIV Infections/virology , HIV-1/growth & development , HIV-1/genetics , Virus Replication , DNA Damage , Humans
7.
Cancer Sci ; 110(8): 2620-2628, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31152682

ABSTRACT

Tumor mutational burden (TMB) and mutational signatures reflect the process of mutation accumulation in cancer. However, the significance of these emerging characteristics remains unclear. In the present study, we used whole-exome sequencing to analyze the TMB and mutational signature in solid tumors of 4046 Japanese patients. Eight predominant signatures (microsatellite instability, smoking, POLE, APOBEC, UV, mismatch repair, double-strand break repair, and Signature 16) were observed in tumors with TMB higher than 1.0 mutation/Mb, whereas POLE and UV signatures only showed moderate correlation with TMB, suggesting the extensive accumulation of mutations due to defective POLE and UV exposure. The contribution ratio of Signature 16, which is associated with hepatocellular carcinoma in drinkers, was increased in hypopharynx cancer. Tumors with predominant microsatellite instability signature were potential candidates for treatment with immune checkpoint inhibitors such as pembrolizumab and were found in 2.8% of cases. Furthermore, based on microarray analysis, tumors with predominant signatures were classified into 2 subgroups depending on the expression of immune-related genes reflecting differences in the immune context of the tumor microenvironment. Tumor subpopulations differing in the content of infiltrating immune cells might respond differently to immunotherapeutics. An understanding of cancer characteristics based on TMB and mutational signatures could provide new insights into mutation-driven tumorigenesis.


Subjects
Carcinogenesis/genetics , Mutation/genetics , Neoplasms/genetics , Carcinogenesis/pathology , DNA Mismatch Repair/genetics , DNA Repair/genetics , Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Japan , Microsatellite Instability , Neoplasms/pathology , Tumor Burden/genetics , Tumor Microenvironment/genetics , Whole Exome Sequencing/methods
9.
Nat Commun ; 10(1): 2449, 2019 06 04.
Article in English | MEDLINE | ID: mdl-31164644

ABSTRACT

DNA base modifications, such as C5-methylcytosine (5mC) and N6-methyldeoxyadenosine (6mA), are important types of epigenetic regulation. Short-read bisulfite sequencing and long-read PacBio sequencing have inherent limitations in detecting DNA modifications. Here, using raw electric signals of Oxford Nanopore long-read sequencing data, we design DeepMod, a bidirectional recurrent neural network (RNN) with long short-term memory (LSTM) to detect DNA modifications. We sequence a human genome HX1 and a Chlamydomonas reinhardtii genome using Nanopore sequencing, and then evaluate DeepMod on three types of genomes (Escherichia coli, Chlamydomonas reinhardtii and human genomes). For 5mC detection, DeepMod achieves average precision up to 0.99 for both synthetically introduced and naturally occurring modifications. For 6mA detection, DeepMod achieves ~0.9 average precision on Escherichia coli data, and performs better than existing methods on Chlamydomonas reinhardtii data. In conclusion, DeepMod performs well for genome-scale detection of DNA modifications and will facilitate epigenetic analysis on diverse species.


Subjects
Chlamydomonas reinhardtii/genetics , DNA Methylation , Escherichia coli/genetics , Genome, Bacterial/genetics , Genome, Human/genetics , Genome, Plant/genetics , Neural Networks, Computer , Databases, Nucleic Acid , Epigenesis, Genetic , Humans , Nanopores
10.
Mol Cell ; 74(5): 866-876, 2019 06 06.
Article in English | MEDLINE | ID: mdl-31173722

ABSTRACT

The replisome quickly and accurately copies billions of DNA bases each cell division cycle. However, it can make errors, especially when the template DNA is damaged. In these cases, replication-coupled repair mechanisms remove the mistake or repair the template lesions to ensure high fidelity and complete copying of the genome. Failures in these genome maintenance activities generate mutations, rearrangements, and chromosome segregation problems that cause many human diseases. In this review, I provide a broad overview of replication-coupled repair pathways, explaining how they fix polymerase mistakes, respond to template damage that acts as obstacles to the replisome, deal with broken forks, and impact human health and disease.


Subjects
DNA Repair/genetics , DNA Replication/genetics , Genetic Diseases, Inborn/genetics , Genome, Human/genetics , Cell Cycle/genetics , Chromosome Segregation/genetics , DNA Damage/genetics , Genomic Instability/genetics , Humans , Mutation/genetics
11.
Nature ; 571(7765): 413-418, 2019 07.
Article in English | MEDLINE | ID: mdl-31243372

ABSTRACT

Forkhead box A1 (FOXA1) is a pioneer transcription factor that is essential for the normal development of several endoderm-derived organs, including the prostate gland [1,2]. FOXA1 is frequently mutated in hormone-receptor-driven prostate, breast, bladder and salivary-gland tumours [3-8]. However, it is unclear how FOXA1 alterations affect the development of cancer, and FOXA1 has previously been ascribed both tumour-suppressive [9-11] and oncogenic [12-14] roles. Here we assemble an aggregate cohort of 1,546 prostate cancers and show that FOXA1 alterations fall into three structural classes that diverge in clinical incidence and genetic co-alteration profiles, with a collective prevalence of 35%. Class-1 activating mutations originate in early prostate cancer without alterations in ETS or SPOP, selectively recur within the wing-2 region of the DNA-binding forkhead domain, enable enhanced chromatin mobility and binding frequency, and strongly transactivate a luminal androgen-receptor program of prostate oncogenesis. By contrast, class-2 activating mutations are acquired in metastatic prostate cancers, truncate the C-terminal domain of FOXA1, enable dominant chromatin binding by increasing DNA affinity and, through TLE3 inactivation, promote metastasis driven by the WNT pathway. Finally, class-3 genomic rearrangements are enriched in metastatic prostate cancers, consist of duplications and translocations within the FOXA1 locus, and structurally reposition a conserved regulatory element, herein denoted FOXA1 mastermind (FOXMIND), to drive overexpression of FOXA1 or other oncogenes. Our study reaffirms the central role of FOXA1 in mediating oncogenesis driven by the androgen receptor, and provides mechanistic insights into how the classes of FOXA1 alteration promote the initiation and/or metastatic progression of prostate cancer. These results have direct implications for understanding the pathobiology of other hormone-receptor-driven cancers and rationalize the co-targeting of FOXA1 activity in therapeutic strategies.


Subjects
Hepatocyte Nuclear Factor 3-alpha/genetics , Mutation/genetics , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Cell Line, Tumor , Chromatin/genetics , Chromatin/metabolism , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Hepatocyte Nuclear Factor 3-alpha/chemistry , Hepatocyte Nuclear Factor 3-alpha/metabolism , Humans , Male , Models, Molecular , Neoplasm Metastasis/genetics , Protein Domains , Receptors, Androgen/metabolism , Wnt Signaling Pathway
12.
Biomed Res Int ; 2019: 8420547, 2019.
Article in English | MEDLINE | ID: mdl-31080831

ABSTRACT

Next generation sequencing is an emerging technology that has been widely used in the detection of genomic variants. However, since its depth of coverage, a main signature used for variant calling, is affected greatly by biases such as GC content and mappability, some callings are false positives. In this study, we utilized paired-end read mapping, another signature that is not affected by the aforementioned biases, to detect false-positive deletions in the database of genomic variants. We first identified 1923 suspicious variants that may be false positives and then conducted validation studies on each suspicious variant, which detected 583 false-positive deletions. Finally we analysed the distribution of these false positives by chromosome, sample, and size. Hopefully, incorrect documentation and annotations in downstream studies can be avoided by correcting these false positives in public repositories.


Subjects
Computational Biology/methods , Databases, Genetic , Genome, Human/genetics , Genomics , Sequence Deletion , Base Composition , Chromosome Mapping , False Positive Reactions , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA/methods
13.
Hum Genet ; 138(6): 661-672, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31069507

ABSTRACT

Tandem repeats (TRs) are widespread in the genomes of all living organisms. In eukaryotes, they are found in both coding and noncoding regions and have potential roles in the regulation of cellular processes such as transcription and translation, and in the modification of protein structure. Recent studies have highlighted TRs as a key regulator of gene expression and a potential contributor to human evolution. Thus, TRs are emerging as an important source of variation that can result in differential gene expression at intra- and inter-species levels. In this study, we performed a genome-wide survey to identify TRs that have emerged in the human lineage. We further examined these loci to explore their potential functional significance for human evolution. We identified 152 human-specific TR (HSTR) loci containing a repeat unit of more than ten bases, with most of them showing a repeat count of two. Gene set enrichment analysis showed that HSTR-associated genes were associated with biological functions in brain development and synapse function. In addition, we compared gene expression of human HSTR loci with orthologues from non-human primates (NHP) in seven different tissues. Strikingly, the expression level of HSTR-associated genes in brain tissues was significantly higher in humans than in NHP. These results suggest the possibility that de novo emergence of TRs could have resulted in altered gene expression in humans within a short time frame and contributed to the rapid evolution of human brain function.


Subjects
Brain/metabolism , Gene Expression Regulation , Organ Specificity/genetics , Tandem Repeat Sequences/genetics , Animals , Base Sequence , Evolution, Molecular , Genome, Human/genetics , Humans , Mutation Rate , Primates/genetics , Sequence Homology, Nucleic Acid
14.
Nat Commun ; 10(1): 2049, 2019 05 03.
Article in English | MEDLINE | ID: mdl-31053705

ABSTRACT

The new advances in various experimental techniques that provide complementary information about the spatial conformations of chromosomes have inspired researchers to develop computational methods to fully exploit the merits of individual data sources and combine them to improve the modeling of chromosome structure. Here we propose GEM-FISH, a method for reconstructing the 3D models of chromosomes through systematically integrating both Hi-C and FISH data with the prior biophysical knowledge of a polymer model. Comprehensive tests on a set of chromosomes, for which both Hi-C and FISH data are available, demonstrate that GEM-FISH can outperform previous chromosome structure modeling methods and accurately capture the higher order spatial features of chromosome conformations. Moreover, our reconstructed 3D models of chromosomes revealed interesting patterns of spatial distributions of super-enhancers which can provide useful insights into understanding the functional roles of these super-enhancers in gene regulation.


Subjects
Chromosomes/chemistry , Imaging, Three-Dimensional/methods , Models, Molecular , Nucleic Acid Conformation , Cell Line , Chromatin/chemistry , Chromatin/genetics , Chromosomes/genetics , Computer Simulation , Datasets as Topic , Enhancer Elements, Genetic/genetics , Genome, Human/genetics , Humans , In Situ Hybridization, Fluorescence/methods
15.
Nat Cell Biol ; 21(5): 614-626, 2019 05.
Article in English | MEDLINE | ID: mdl-31036939

ABSTRACT

Cell growth is controlled by a lysosomal signalling complex containing Rag small GTPases and mammalian target of rapamycin complex 1 (mTORC1) kinase. Here, we carried out a microscopy-based genome-wide human short interfering RNA screen and discovered a lysosome-localized G protein-coupled receptor (GPCR)-like protein, GPR137B, that interacts with Rag GTPases, increases Rag localization and activity, and thereby regulates mTORC1 translocation and activity. High GPR137B expression can recruit and activate mTORC1 in the absence of amino acids. Furthermore, GPR137B also regulates the dissociation of activated Rag from lysosomes, suggesting that GPR137B controls a cycle of Rag activation and dissociation from lysosomes. GPR137B-knockout cells exhibited defective autophagy and an expanded lysosome compartment, similar to Rag-knockout cells. Like zebrafish RagA mutants, GPR137B-mutant zebrafish had upregulated TFEB target gene expression and an expanded lysosome compartment in microglia. Thus, GPR137B is a GPCR-like lysosomal regulatory protein that controls dynamic Rag and mTORC1 localization and activity as well as lysosome morphology.


Subjects
Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/genetics , Genome, Human/genetics , Monomeric GTP-Binding Proteins/genetics , Receptors, G-Protein-Coupled/genetics , Animals , Autophagy/genetics , Gene Expression Regulation/genetics , Humans , Lysosomes/genetics , Mechanistic Target of Rapamycin Complex 1/genetics , Microglia/metabolism , Multiprotein Complexes/chemistry , Multiprotein Complexes/genetics , RNA, Small Interfering/genetics , Receptors, G-Protein-Coupled/antagonists & inhibitors , Zebrafish/genetics , Zebrafish/growth & development
17.
Nat Commun ; 10(1): 2176, 2019 05 15.
Article in English | MEDLINE | ID: mdl-31092817

ABSTRACT

Streptococcus pneumoniae is a common nasopharyngeal colonizer, but can also cause life-threatening invasive diseases such as empyema, bacteremia and meningitis. Genetic variation of host and pathogen is known to play a role in invasive pneumococcal disease, though to what extent is unknown. In a genome-wide association study of human and pathogen we show that human variation explains almost half of variation in susceptibility to pneumococcal meningitis and one-third of variation in severity, identifying variants in CCDC33 associated with susceptibility. Pneumococcal genetic variation explains a large amount of invasive potential (70%), but has no effect on severity. Serotype alone is insufficient to explain invasiveness, suggesting other pneumococcal factors are involved in progression to invasive disease. We identify pneumococcal genes involved in invasiveness including pspC and zmpD, and perform a human-bacteria interaction analysis. These genes are potential candidates for the development of more broadly-acting pneumococcal vaccines.


Subjects
Genetic Predisposition to Disease , Meningitis, Pneumococcal/genetics , Streptococcus pneumoniae/genetics , Adult , Aged , Bacterial Proteins/genetics , Female , Genetic Variation , Genome, Bacterial/genetics , Genome, Human/genetics , Genome-Wide Association Study , Host-Pathogen Interactions/genetics , Humans , Male , Meningitis, Pneumococcal/microbiology , Middle Aged , Prospective Studies , Proteins/genetics , Streptococcus pneumoniae/isolation & purification
18.
Nat Commun ; 10(1): 2313, 2019 05 24.
Article in English | MEDLINE | ID: mdl-31127121

ABSTRACT

DNA double-strand breaks (DSBs) are among the most lethal types of DNA damage and frequently cause genome instability. Sequencing-based methods for mapping DSBs have been developed but they allow measurement only of relative frequencies of DSBs between loci, which limits our understanding of the physiological relevance of detected DSBs. Here we propose quantitative DSB sequencing (qDSB-Seq), a method providing both DSB frequencies per cell and their precise genomic coordinates. We induce spike-in DSBs by a site-specific endonuclease and use them to quantify detected DSBs (labeled, e.g., using i-BLESS). Utilizing qDSB-Seq, we determine numbers of DSBs induced by a radiomimetic drug and replication stress, and reveal two orders of magnitude differences in DSB frequencies. We also measure absolute frequencies of Top1-dependent DSBs at natural replication fork barriers. qDSB-Seq is compatible with various DSB labeling methods in different organisms and allows accurate comparisons of absolute DSB frequencies across samples.


Subjects
Computational Biology/methods , DNA Breaks, Double-Stranded , Whole Genome Sequencing/methods , Cell Line, Tumor , DNA Replication/genetics , DNA Topoisomerases, Type I/metabolism , Genome, Fungal/genetics , Genome, Human/genetics , Humans , Saccharomycetales/genetics
19.
Med Sci Monit ; 25: 2959-2965, 2019 Apr 22.
Article in English | MEDLINE | ID: mdl-31007253

ABSTRACT

BACKGROUND: The aim of this study was to investigate the genomic alterations of renal cell carcinoma (RCC) in Chinese patients and to evaluate the correlations between significantly mutated genes and tumor mutation burden (TMB) levels in RCC. MATERIAL AND METHODS: Two batches of specimens were collected from patients with RCC. Cohort 1 enrolled 17 RCC patients. Specimens and clinicopathological data were collected and the duration of disease-free survival was evaluated with a follow-up from 2 weeks to longer than 1 year. Cohort 2 collected 70 clear cell RCC (ccRCC) tissues and blood specimens. Next-generation sequencing was used to detect the genomic variations in those specimens in both cohorts and TMB in cohort 2. Clinicopathological features of the 2 cohorts were collected and the χ² test or Fisher's exact test was used for categorical variables stratified by TMB values. RESULTS: Our present study demonstrated that the top 3 most frequently altered genes in Chinese ccRCC patients were ABCB1, UGT1A1, and VHL, with percentages of 50.00%, 42.86%, and 34.52%, respectively. Only 1 gene, ABCB1, showed a statistically significant difference (P=0.047) when stratified by TMB levels. In addition, 6 oncogenic pathways were involved in ccRCC cases in the 2 cohorts. Only 5 of the 8 most commonly altered genes of RCC from the COSMIC or TCGA databases were detected in our study. CONCLUSIONS: The genomic alterations of Chinese RCC patients were different from those in TCGA and COSMIC. No significant genomic alterations were found correlating to TMB levels in ccRCC. Non-silent mutation of VHL may be a predictor for the outcome of ccRCC treated with axitinib.


Subjects
Carcinoma, Renal Cell/genetics , Gene Expression Regulation, Neoplastic/genetics , Genome, Human/genetics , ATP Binding Cassette Transporter, Subfamily B/genetics , ATP Binding Cassette Transporter, Subfamily B/metabolism , Adult , Aged , Aged, 80 and over , Asian Continental Ancestry Group/genetics , Carcinoma, Renal Cell/pathology , China , Cohort Studies , Disease-Free Survival , Female , Gene Expression Profiling/methods , Genomics/methods , Glucuronosyltransferase/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Kidney Neoplasms/pathology , Male , Middle Aged , Mutation/genetics , Von Hippel-Lindau Tumor Suppressor Protein/genetics
20.
Nat Biotechnol ; 37(5): 561-566, 2019 05.
Article in English | MEDLINE | ID: mdl-30936564

ABSTRACT

Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.


Subjects
Benchmarking , Computational Biology/trends , Genome, Human/genetics , Genomics/trends , Genetic Variation/genetics , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation/genetics , Polymorphism, Single Nucleotide , Software/trends