1.
IET Image Process ; 15(11): 2604-2613, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34226836

ABSTRACT

At the end of 2019, the novel coronavirus disease COVID-19 broke out. Due to its high contagiousness, more than 74 million people have been infected worldwide. Automatic segmentation of the COVID-19 lesion area in CT images is an effective auxiliary medical technology which can quantitatively diagnose and judge the severity of the disease. In this paper, a multi-class COVID-19 CT image segmentation network is proposed, which includes a pyramid attention module to extract multi-scale contextual attention information, and a residual convolution module to improve the discriminative ability of the network. A wavelet edge loss function is also proposed to extract edge features of the lesion area to improve the segmentation accuracy. For the experiment, a dataset of 4369 CT slices is constructed, including three symptoms: ground glass opacities, interstitial infiltrates, and lung consolidation. The model achieves dice similarity coefficients of 0.7704, 0.7900, and 0.8241 for the three symptoms, respectively. The performance of the proposed network on the public dataset COVID-SemiSeg is also evaluated. The results demonstrate that this model outperforms other state-of-the-art methods and can be a powerful tool to assist in the diagnosis of positive infection cases, and promote the development of intelligent technology in the medical field.

2.
Optik (Stuttg) ; 241: 167100, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33976457

ABSTRACT

Since its discovery in Hubei, China, in December 2019, coronavirus disease 2019 (COVID-19) has persisted for more than one year, and the number of new confirmed cases and confirmed deaths is still at a high level. COVID-19 is an infectious disease caused by SARS-CoV-2. Although RT-PCR is considered the gold standard for detection of COVID-19, CT plays an important role in the diagnosis and evaluation of the therapeutic effect of COVID-19. Diagnosis and localization of COVID-19 on CT images using deep learning can provide quantitative auxiliary information for doctors. This article proposes a novel network with a multi-receptive field attention module to diagnose COVID-19 on CT images. This attention module includes three parts: a pyramid convolution module (PCM), a multi-receptive field spatial attention block (SAB), and a multi-receptive field channel attention block (CAB). The PCM can improve the diagnostic ability of the network for lesions of different sizes and shapes. The role of the SAB and CAB is to focus the features extracted by the network on the lesion area to improve the ability of COVID-19 discrimination and localization. We verify the effectiveness of the proposed method on two datasets. An accuracy of 97.12%, specificity of 96.89%, and sensitivity of 97.21% are achieved by the proposed network on the DTDB dataset provided by the Beijing Ditan Hospital Capital Medical University. Compared with other state-of-the-art attention modules, the proposed method achieves better results. On the public COVID-19 SARS-CoV-2 dataset, 95.16% accuracy, 95.6% F1-score, and 99.01% AUC are obtained. The proposed network can effectively assist doctors in the diagnosis of COVID-19 CT images.

3.
Mol Genet Genomic Med ; 8(11): e1488, 2020 11.
Article in English | MEDLINE | ID: mdl-32961042

ABSTRACT

BACKGROUND: Current copy number variation (CNV) identification methods have rapidly become mature. However, the postdetection processes such as variant interpretation or reporting are inefficient. To overcome this situation, we developed REDBot as an automated software package for accurate and direct generation of clinical diagnostic reports for prenatal and products of conception (POC) samples. METHODS: We applied natural language processing (NLP) methods to analyze 30,235 in-house historical clinical reports through active learning, and then developed clinical knowledge bases, evidence-based interpretation methods, and reporting criteria to support the whole postdetection pipeline. RESULTS: From the 30,235 reports, we obtained 37,175 CNV-paragraph pairs. For these pairs, the active learning approaches achieved a 0.9466 average F1-score in sentence classification. The overall accuracy for variant classification was 95.7%, 95.2%, and 100.0% in retrospective, prospective, and clinical utility experiments, respectively. CONCLUSION: By integrating NLP methods into the CNV postdetection pipeline, REDBot is a robust and rapid tool with clinical utility for prenatal and POC diagnosis.


Subject(s)
DNA Copy Number Variations , Electronic Health Records , Genetic Testing/methods , Natural Language Processing , Prenatal Diagnosis/methods , Software , Humans
4.
PLoS One ; 15(8): e0236285, 2020.
Article in English | MEDLINE | ID: mdl-32841250

ABSTRACT

Characterizing meiotic recombination rates across the genomes of nonhuman primates is important for understanding the genetics of primate populations, performing genetic analyses of phenotypic variation and reconstructing the evolution of human recombination. Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primates in biomedical research. We constructed a high-resolution genetic map of the rhesus genome based on whole genome sequence data from Indian-origin rhesus macaques. The genetic markers used were approximately 18 million SNPs, with marker density 6.93 per kb across the autosomes. We report that the genome-wide recombination rate in rhesus macaques is significantly lower than rates observed in apes or humans, while the distribution of recombination across the macaque genome is more uniform. These observations provide new comparative information regarding the evolution of recombination in primates.


Subject(s)
Evolution, Molecular , Macaca mulatta/genetics , Meiosis/genetics , Recombination, Genetic , Animals , Chromosome Mapping , Genetic Markers , Genetic Variation , Genome , Humans , Polymorphism, Single Nucleotide , Species Specificity , Whole Genome Sequencing
5.
RSC Adv ; 10(51): 30944-30952, 2020 Aug 17.
Article in English | MEDLINE | ID: mdl-35516031

ABSTRACT

In the emerging field of laser-driven inertial confinement fusion, Joule heating generated via electromagnetic heating of the metal frame is a critical issue. However, there are few reported models explaining thermal damage to the aluminum alloy. The aim of this study was to build a coupled model for electromagnetic radiation and heat conversion of an ultrashort laser pulse on an aluminum alloy based on Ohm's law. Additionally, the application of SiO2 films on the aluminum alloy to improve the laser-induced damage threshold (LIDT) was simulated, and the effects of metal impurities in the aluminum alloy were analyzed. A model examining the relation between electromagnetic radiation and heat for a nanosecond laser irradiating an aluminum alloy was developed using a coupled model equation. The results obtained using the finite difference time domain (FDTD) algorithm can provide a theoretical basis for future improvement of the aluminum alloy LIDT.

6.
IEEE Access ; 8: 185786-185795, 2020.
Article in English | MEDLINE | ID: mdl-34812359

ABSTRACT

Since the first patient was reported in December 2019, 2019 novel coronavirus disease (COVID-19) has become a global pandemic with more than 10 million total confirmed cases and 500 thousand related deaths. Using deep learning methods to quickly identify COVID-19 and accurately segment the infected area can help control the outbreak and assist in treatment. Computed tomography (CT) is a fast and easy clinical method that is suitable for assisting in the diagnosis and treatment of COVID-19. According to clinical manifestations, COVID-19 lung infection areas can be divided into three categories: ground-glass opacities, interstitial infiltrates and consolidation. We proposed a multi-scale discriminative network (MSD-Net) for multi-class segmentation of COVID-19 lung infection on CT. In the MSD-Net, we proposed a pyramid convolution block (PCB), a channel attention block (CAB) and a residual refinement block (RRB). The PCB can increase the receptive field by using different numbers and different sizes of kernels, which strengthens the ability to segment infected areas of different sizes. The CAB was used to fuse the input of the two stages and focus features on the area to be segmented. The role of the RRB was to refine the feature maps. Experimental results showed that the dice similarity coefficients (DSC) of the three infection categories were 0.7422, 0.7384, and 0.8769, respectively. For sensitivity and specificity, the results of the three infection categories were (0.8593, 0.9742), (0.8268, 0.9869) and (0.8645, 0.9889), respectively. The experimental results demonstrated that the network proposed in this paper can effectively segment the COVID-19 infection on CT images. It can be adopted for assisting in the diagnosis and treatment of COVID-19.

7.
Mol Psychiatry ; 25(2): 476-490, 2020 02.
Article in English | MEDLINE | ID: mdl-31673123

ABSTRACT

Tourette syndrome (TS) is a childhood-onset neuropsychiatric disorder characterized by repetitive motor movements and vocal tics. The clinical manifestations of TS are complex and often overlap with other neuropsychiatric disorders. TS is highly heritable; however, the underlying genetic basis and molecular and neuronal mechanisms of TS remain largely unknown. We performed whole-exome sequencing of a hundred trios (probands and their parents) with detailed records of their clinical presentations and identified a risk gene, ASH1L, that was both de novo mutated and associated with TS based on a transmission disequilibrium test. As a replication, we performed follow-up targeted sequencing of ASH1L in an additional 524 unrelated TS samples and replicated the association (P value = 0.001). The point mutations in ASH1L cause defects in its enzymatic activity. Therefore, we established a transgenic mouse line and performed an array of anatomical, behavioral, and functional assays to investigate ASH1L function. The Ash1l+/- mice manifested tic-like behaviors and compulsive behaviors that could be rescued by the tic-relieving drug haloperidol. We also found that Ash1l disruption leads to hyper-activation and elevated dopamine-releasing events in the dorsal striatum, all of which could explain the neural mechanisms for the behavioral abnormalities in mice. Taken together, our results provide compelling evidence that ASH1L is a TS risk gene.


Subject(s)
DNA-Binding Proteins/genetics , Histone-Lysine N-Methyltransferase/genetics , Tourette Syndrome/genetics , Adolescent , Adult , Animals , Child , Child, Preschool , China , DNA-Binding Proteins/metabolism , Family , Female , Genetic Predisposition to Disease/genetics , Histone-Lysine N-Methyltransferase/metabolism , Humans , Male , Mice , Mice, Transgenic , Middle Aged , Mutation/genetics , Parents , Tic Disorders/genetics , Tourette Syndrome/complications , Transcription Factors/genetics , Exome Sequencing/methods
8.
Genet Med ; 21(9): 1998-2006, 2019 09.
Article in English | MEDLINE | ID: mdl-30828085

ABSTRACT

PURPOSE: To assess the clinical performance of an expanded noninvasive prenatal screening (NIPS) test ("NIPS-Plus") for detection of both aneuploidy and genome-wide microdeletion/microduplication syndromes (MMS). METHODS: A total of 94,085 women with a singleton pregnancy were prospectively enrolled in the study. The cell-free plasma DNA was directly sequenced without intermediate amplification and fetal abnormalities identified using an improved copy-number variation (CNV) calling algorithm. RESULTS: A total of 1128 pregnancies (1.2%) were scored positive for clinically significant fetal chromosome abnormalities. This comprised 965 aneuploidies (1.026%) and 163 (0.174%) MMS. From follow-up tests, the positive predictive values (PPVs) for T21, T18, T13, rare trisomies, and sex chromosome aneuploidies were calculated as 95%, 82%, 46%, 29%, and 47%, respectively. For known MMS (n = 32), PPVs were 93% (DiGeorge), 68% (22q11.22 microduplication), 75% (Prader-Willi/Angelman), and 50% (Cri du Chat). For the remaining genome-wide MMS (n = 88), combined PPVs were 32% (CNVs ≥10 Mb) and 19% (CNVs <10 Mb). CONCLUSION: NIPS-Plus yielded high PPVs for common aneuploidies and DiGeorge syndrome, and moderate PPVs for other MMS. Our results present compelling evidence that NIPS-Plus can be used as a first-tier pregnancy screening method to improve detection rates of clinically significant fetal chromosome abnormalities.


Subject(s)
Cell-Free Nucleic Acids/genetics , Chromosome Aberrations , Chromosome Disorders/diagnosis , Noninvasive Prenatal Testing/methods , Adolescent , Adult , Aneuploidy , Chromosome Disorders/genetics , Chromosome Disorders/pathology , DNA Copy Number Variations/genetics , Female , Humans , Karyotyping , Middle Aged , Pregnancy , Prenatal Diagnosis , Risk Factors , Sex Chromosome Aberrations , Trisomy/genetics , Young Adult
9.
Science ; 361(6409)2018 09 28.
Article in English | MEDLINE | ID: mdl-30139913

ABSTRACT

To assess the impact of genetic variation in regulatory loci on human health, we constructed a high-resolution map of allelic imbalances in DNA methylation, histone marks, and gene transcription in 71 epigenomes from 36 distinct cell and tissue types from 13 donors. Deep whole-genome bisulfite sequencing of 49 methylomes revealed sequence-dependent CpG methylation imbalances at thousands of heterozygous regulatory loci. Such loci are enriched for stochastic switching, which is defined as random transitions between fully methylated and unmethylated states of DNA. The methylation imbalances at thousands of loci are explainable by different relative frequencies of the methylated and unmethylated states for the two alleles. Further analyses provided a unifying model that links sequence-dependent allelic imbalances of the epigenome, stochastic switching at gene regulatory loci, and disease-associated genetic variation.


Subject(s)
Allelic Imbalance , DNA Methylation , Disease/genetics , Epigenesis, Genetic , Genome, Human , Polymorphism, Single Nucleotide , Alleles , Binding Sites , CpG Islands , Gene Regulatory Networks , Genetic Loci , Genome-Wide Association Study , Humans , Sequence Analysis, DNA , Sulfites/chemistry , Transcription Factors/metabolism
10.
Circ Genom Precis Med ; 11(7): e002099, 2018 07.
Article in English | MEDLINE | ID: mdl-29997225

ABSTRACT

BACKGROUND: Intracranial aneurysm (IA) is usually a late-onset disease, affecting 1% to 3% of the general population and leading to life-threatening subarachnoid hemorrhage. Genetic susceptibility has been implicated in IAs, but the causative genes remain elusive. METHODS: We performed next-generation sequencing in a discovery cohort of 20 Chinese IA patients. Bioinformatics filters were exploited to search for candidate deleterious variants with rare and low allele frequency. We further examined the candidate variants in a multiethnic sample collection of 86 whole exome sequenced unsolved familial IA cases from 3 previously published studies. RESULTS: We identified that the low-frequency variant c.4394C>A_p.Ala1465Asp (rs2298808) of ARHGEF17 was significantly associated with IA in our Chinese discovery cohort (P=7.3×10⁻⁴; odds ratio=7.34). It was subsequently replicated in Japanese familial IA patients (P=0.039; odds ratio=4.00; 95% confidence interval=0.832-14.8) and was associated with IA in the large Chinese sample collection comprising 832 sporadic IA-affected and 599 control individuals (P=0.041; odds ratio=1.51; 95% confidence interval=1.02-Inf). When combining the sequencing data of all familial IA patients from 4 different ethnicities (ie, Chinese, Japanese, European American, and French-Canadian), we identified a significantly increased mutation burden for ARHGEF17 (21/106 versus 11/306; P=8.1×10⁻⁷; odds ratio=6.6; 95% confidence interval=2.9-15.8) in cases as compared with controls. In zebrafish, arhgef17 was highly expressed in the brain blood vessel. arhgef17 knockdown caused blood extravasation in the brain region. Endothelial lesions were identified exclusively on cerebral blood vessels in the arhgef17-deficient zebrafish. CONCLUSIONS: Our results provide compelling evidence that ARHGEF17 is a risk gene for IA.


Subject(s)
Exome , Genetic Predisposition to Disease , Intracranial Aneurysm/genetics , Rho Guanine Nucleotide Exchange Factors/genetics , Subarachnoid Hemorrhage/genetics , Adult , Alleles , Canada , Cohort Studies , Female , Gene Frequency , Humans , Male , Middle Aged , Risk Factors
11.
Am J Obstet Gynecol ; 219(3): 287.e1-287.e18, 2018 09.
Article in English | MEDLINE | ID: mdl-29852155

ABSTRACT

BACKGROUND: Next-generation sequencing is emerging as a viable alternative to chromosome microarray analysis for the diagnosis of chromosome disease syndromes. One next-generation sequencing methodology, copy number variation sequencing, has been shown to deliver high reliability, accuracy, and reproducibility for detection of fetal copy number variations in prenatal samples. However, its clinical utility as a first-tier diagnostic method has yet to be demonstrated in a large cohort of pregnant women referred for fetal chromosome testing. OBJECTIVE: We sought to evaluate copy number variation sequencing as a first-tier diagnostic method for detection of fetal chromosome anomalies in a general population of pregnant women with high-risk prenatal indications. STUDY DESIGN: This was a prospective analysis of 3429 pregnant women referred for amniocentesis and fetal chromosome testing for different risk indications, including advanced maternal age, high-risk maternal serum screening, and positivity for an ultrasound soft marker. Amniocentesis was performed by standard procedures. Amniocyte DNA was analyzed by copy number variation sequencing with a chromosome resolution of 0.1 Mb. Fetal chromosome anomalies including whole chromosome aneuploidy and segmental imbalances were independently confirmed by gold standard cytogenetic and molecular methods and their pathogenicity determined following guidelines of the American College of Medical Genetics for sequence variants. RESULTS: Clear interpretable copy number variation sequencing results were obtained for all 3429 amniocentesis samples. Copy number variation sequencing identified 3293 samples (96%) with a normal molecular karyotype and 136 samples (4%) with an altered molecular karyotype. 
A total of 146 fetal chromosome anomalies were detected, comprising 46 whole chromosome aneuploidies (pathogenic), 29 submicroscopic microdeletions/microduplications with known or suspected associations with chromosome disease syndromes (pathogenic), 22 other microdeletions/microduplications (likely pathogenic), and 49 variants of uncertain significance. Overall, the cumulative frequency of pathogenic/likely pathogenic and variants of uncertain significance chromosome anomalies in the patient cohort was 2.83% and 1.43%, respectively. In the 3 high-risk advanced maternal age, high-risk maternal serum screening, and ultrasound soft marker groups, the most common whole chromosome aneuploidy detected was trisomy 21, followed by sex chromosome aneuploidies, trisomy 18, and trisomy 13. Across all clinical indications, there was a similar incidence of submicroscopic copy number variations, with approximately equal proportions of pathogenic/likely pathogenic and variants of uncertain significance copy number variations. If karyotyping had been used as an alternate cytogenetics detection method, copy number variation sequencing would have returned a 1% higher yield of pathogenic or likely pathogenic copy number variations. CONCLUSION: In a large prospective clinical study, copy number variation sequencing delivered high reliability and accuracy for identifying clinically significant fetal anomalies in prenatal samples. Based on key performance criteria, copy number variation sequencing appears to be a well-suited methodology for first-tier diagnosis of pregnant women in the general population at risk of having a suspected fetal chromosome abnormality.


Subject(s)
Chromosome Disorders/diagnosis , DNA Copy Number Variations/genetics , Adult , Amniocentesis , Aneuploidy , China , Chromosome Aberrations , Chromosome Disorders/genetics , Down Syndrome/diagnosis , Female , High-Throughput Nucleotide Sequencing , Humans , In Situ Hybridization, Fluorescence , Karyotyping , Microarray Analysis , Pregnancy , Prenatal Diagnosis , Prospective Studies , Sequence Analysis, DNA , Sex Chromosome Aberrations , Trisomy 13 Syndrome/diagnosis , Trisomy 18 Syndrome/diagnosis
12.
Am J Hum Genet ; 102(5): 731-743, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29706352

ABSTRACT

Large-scale, population-based genomic studies have provided a context for modern medical genetics. Among such studies, however, African populations have remained relatively underrepresented. The breadth of genetic diversity across the African continent argues for an exploration of local genomic context to facilitate burgeoning disease mapping studies in Africa. We sought to characterize genetic variation and to assess population substructure within a cohort of HIV-positive children from Botswana, a Southern African country that is regionally underrepresented in genomic databases. Using whole-exome sequencing data from 164 Batswana and comparisons with 150 similarly sequenced HIV-positive Ugandan children, we found that 13%-25% of variation observed among Batswana was not captured by public databases. Uncaptured variants were significantly enriched (p = 2.2 × 10⁻¹⁶) for coding variants with minor allele frequencies between 1% and 5% and included predicted-damaging non-synonymous variants. Among variants found in public databases, corresponding allele frequencies varied widely, with Botswana having significantly higher allele frequencies among rare (<1%) pathogenic and damaging variants. Batswana clustered with other Southern African populations, but distinctly from 1000 Genomes African populations, and had limited evidence for admixture with extra-continental ancestries. We also observed a surprising lack of genetic substructure in Botswana, despite multiple tribal ethnicities and language groups, alongside a higher degree of relatedness than purported founder populations from the 1000 Genomes project. Our observations reveal a complex, but distinct, ancestral history and genomic architecture among Batswana and suggest that disease mapping within similar Southern African populations will require a deeper repository of genetic variation and allelic dependencies than presently exists.


Subject(s)
Black People/genetics , Exome Sequencing , Genetic Variation , Botswana , Cohort Studies , Gene Pool , Genetics, Population , Genome, Human , Geography , Humans , Phylogeny , Principal Component Analysis
13.
BMC Genomics ; 18(1): 396, 2017 05 22.
Article in English | MEDLINE | ID: mdl-28532386

ABSTRACT

BACKGROUND: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. RESULTS: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high-coverage for the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. CONCLUSIONS: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.


Subject(s)
Asian People/genetics , Metagenomics , Whole Genome Sequencing , Genetic Variation , Genome, Mitochondrial/genetics , Humans
14.
Hum Mutat ; 38(6): 669-677, 2017 06.
Article in English | MEDLINE | ID: mdl-28247551

ABSTRACT

Detailed characterization of chromosomal abnormalities, a common cause for congenital abnormalities and pregnancy loss, is critical for elucidating genes for human fetal development. Here, 2,186 product-of-conception samples were tested for copy-number variations (CNVs) at two clinical diagnostic centers using whole-genome sequencing and high-resolution chromosomal microarray analysis. We developed a new gene discovery approach to predict potential developmental genes and identified 275 candidate genes from CNVs detected from both datasets. Based on Mouse Genome Informatics (MGI) and Zebrafish model organism database (ZFIN), 75% of identified genes could lead to developmental defects when mutated. Genes involved in embryonic development, gene transcription, and regulation of biological processes were significantly enriched. Especially, transcription factors and gene families sharing specific protein domains predominated, which included known developmental genes such as HOX, NKX homeodomain genes, and helix-loop-helix containing HAND2, NEUROG2, and NEUROD1 as well as potential novel developmental genes. We observed that developmental genes were denser in certain chromosomal regions, enabling identification of 31 potential genomic loci with clustered genes associated with development.


Subject(s)
Chromosome Aberrations , Chromosome Disorders/genetics , Embryonic Development/genetics , Transcription Factors/genetics , Animals , Chromosome Disorders/pathology , DNA Copy Number Variations/genetics , Female , Genome, Human , Humans , Mice , Microarray Analysis , Pregnancy , Zebrafish/genetics
15.
Appl Opt ; 56(4): 816-822, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28158081

ABSTRACT

The radiation force of a high-energy laser caused by reflection at the input surface of a mounted KH2PO4 (KDP) crystal is studied, along with its effects on the second-harmonic generation (SHG) efficiency of the laser beam. A comprehensive model incorporating principles of momentum transfer, mechanics, and optics is proposed. Using this model, the mechanical stress within the KDP crystal that is caused by the radiation force, and the SHG efficiency that is affected by the stress, are studied in turn. Moreover, the effects of the intensity of the laser beam on the radiation force, the stress, and the SHG efficiency are determined. The results demonstrate that a high-energy laser beam causes a macroscopic radiation force that in turn reduces SHG efficiency.

16.
Am J Hum Genet ; 100(2): 205-215, 2017 02 02.
Article in English | MEDLINE | ID: mdl-28089252

ABSTRACT

Whole-genome sequencing (WGS) allows for a comprehensive view of the sequence of the human genome. We present and apply integrated methodologic steps for interrogating WGS data to characterize the genetic architecture of 10 heart- and blood-related traits in a sample of 1,860 African Americans. In order to evaluate the contribution of regulatory and non-protein coding regions of the genome, we conducted aggregate tests of rare variation across the entire genomic landscape using a sliding window, complemented by an annotation-based assessment of the genome using predefined regulatory elements and within the first intron of all genes. These tests were performed treating all variants equally as well as with individual variants weighted by a measure of predicted functional consequence. Significant findings were assessed in 1,705 individuals of European ancestry. After these steps, we identified and replicated components of the genomic landscape significantly associated with heart- and blood-related traits. For two traits, lipoprotein(a) levels and neutrophil count, aggregate tests of low-frequency and rare variation were significantly associated across multiple motifs. For a third trait, cardiac troponin T, investigation of regulatory domains identified a locus on chromosome 9. These practical approaches for WGS analysis led to the identification of informative genomic regions and also showed that defined non-coding regions, such as first introns of genes and regulatory domains, are associated with important risk factor phenotypes. This study illustrates the tractable nature of WGS data and outlines an approach for characterizing the genetic architecture of complex traits.


Subject(s)
Black or African American/genetics , Genome-Wide Association Study , Lipoprotein(a)/genetics , Troponin T/genetics , C-Reactive Protein/metabolism , Cholesterol, HDL/blood , Cholesterol, LDL/blood , Chromosomes, Human, Pair 9/genetics , Gene Frequency , Genome, Human , Genomics , Hemoglobins/metabolism , Humans , Introns , Leukocyte Count , Lipoprotein(a)/blood , Magnesium/blood , Natriuretic Peptide, Brain/blood , Natriuretic Peptide, Brain/genetics , Neutrophils/cytology , Peptide Fragments/blood , Peptide Fragments/genetics , Phosphorus/blood , Platelet Count , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Troponin T/blood , White People/genetics
17.
Genome Res ; 26(12): 1651-1662, 2016 12.
Article in English | MEDLINE | ID: mdl-27934697

ABSTRACT

Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Macaca mulatta/genetics , Whole Genome Sequencing/methods , Animals , Evolution, Molecular , Female , Genetic Fitness , Macaca mulatta/classification , Models, Animal , Polymorphism, Single Nucleotide , Population Density
18.
BMC Bioinformatics ; 17(1): 361, 2016 Sep 10.
Article in English | MEDLINE | ID: mdl-27612449

ABSTRACT

BACKGROUND: The decreasing costs of sequencing are driving the need for cost-effective, real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of the computing resources typically available to most research labs. Other infrastructures, such as the AWS cloud environment and supercomputers, have limitations of their own that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling. RESULTS: We present a high-throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages a hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high-performance computing infrastructures. We present a novel binning approach for large-scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples were completed in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and the ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and transferred a total of only 6 TB of data across the platforms.
CONCLUSIONS: Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost-effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.


Subject(s)
Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Databases, Genetic , Humans
19.
PLoS One ; 11(9): e0160757, 2016.
Article in English | MEDLINE | ID: mdl-27584569

ABSTRACT

VWF is extensively glycosylated with biantennary core fucosylated glycans. Most N-linked and O-linked glycans on VWF are sialylated. FVIII is also glycosylated, with a glycan structure similar to that of VWF. ST3GAL sialyltransferases catalyze the transfer of sialic acids in the α2,3 linkage to termini of N- and O-glycans. This sialic acid modification is critical for VWF synthesis and activity. We analyzed genetic and phenotypic data from the Atherosclerosis Risk in Communities (ARIC) study for the association of single nucleotide polymorphisms (SNPs) in the ST3GAL4 gene with plasma VWF levels and FVIII activity in 12,117 subjects. We also analyzed ST3GAL4 SNPs found in 2,535 subjects of 26 ethnicities from the 1000 Genomes (1000G) project for ethnic diversity, SNP imputation, and ST3GAL4 haplotypes. We identified 14 and 1,714 ST3GAL4 variants in the ARIC GWAS and 1000G databases respectively, with 46% being ethnically diverse in their allele frequencies. Among the 14 ST3GAL4 SNPs found in ARIC GWAS, the intronic rs2186717, rs7928391, and rs11220465 were associated with VWF levels and with FVIII activity after adjustment for age, BMI, hypertension, diabetes, ever-smoking status, and ABO. This study illustrates the power of next-generation sequencing in the discovery of new genetic variants and a significant ethnic diversity in the ST3GAL4 gene. We discuss potential mechanisms through which these intronic SNPs regulate ST3GAL4 biosynthesis and the activity that affects VWF and FVIII.


Subject(s)
Factor VIII/metabolism , Polymorphism, Single Nucleotide , Sialyltransferases/genetics , von Willebrand Factor/metabolism , Haplotypes , Humans , beta-Galactoside alpha-2,3-Sialyltransferase
20.
Hum Mutat ; 37(11): 1209-1214, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27507420

ABSTRACT

Understanding the evolution of disease-associated mutations is fundamental to analyzing the pathogenetics of diseases. Mutation, recombination (by GC-biased gene conversion, gBGC), and selection are known to shape the evolution of disease-associated mutations, but how these evolutionary forces work together is still an open question. In this study, we analyzed several large-scale human datasets (1000 Genomes, ESP6500, ExAC and ClinVar), and found that base-biased mutagenesis generates more GC→AT than AT→GC mutations, while gBGC promotes the fixation of AT→GC mutations, balancing the impact of base-biased mutation on the genome. Due to this effect of gBGC, purifying selection removes more deleterious AT→GC mutations than GC→AT mutations from the population, but many high-frequency (fixed and nearly fixed) deleterious AT→GC mutations remain, possibly due to high genetic load. As a special subset, disease-associated mutations follow this evolutionary rule: disease-associated GC→AT mutations are more enriched among rare mutations than AT→GC mutations, while disease-associated AT→GC mutations are more enriched among high-frequency mutations. Thus, we present a base-biased evolutionary framework that explains the base-biased generation and accumulation of disease-associated mutations in human populations.


Subject(s)
Genetic Predisposition to Disease , Mutation , Base Composition , Databases, Genetic , Evolution, Molecular , Gene Conversion , Genome, Human , Humans , Models, Genetic , Recombination, Genetic , Selection, Genetic