ABSTRACT
Residue-level potentials of mean force have been widely used for protein backbone refinement to avoid simultaneous sampling of side-chain conformations. The interaction energy between the reduced side chains and backbone atoms was not considered explicitly. In this study, we developed novel methods to calculate the residue-atom interaction energy in combination with atomic and residue-level terms. The parameters were optimized step by step to remove the overcounting or overlap problem between different energy terms. The mixed energy functions were then used to evaluate the generated backbone conformations at the initial sampling stage of protein loop modeling (OSCAR-loop), including the interaction energy between the reduced loop residues and full atoms of the protein framework. The accuracies of top-ranked decoys were 1.18 and 2.81 Å for 8-residue and 12-residue loops, respectively. We then selected diverse decoys for side-chain modeling, backbone refinement, and energy minimization. The procedure was repeated multiple times to select the one prediction with the lowest energy. Consequently, we obtained an accuracy of 0.74 Å for a widely used test set of 12-residue loops, compared with >1.4 Å reported by other researchers. OSCAR-loop was also effective for modeling the H3 loops of antibody complementarity-determining regions (CDRs) in the crystal environment. The prediction accuracy of OSCAR-loop (1.74 Å) was better than the accuracy of the Rosetta NGK method (3.11 Å) or those achieved by deep learning methods (>2.2 Å) for the CDRH3 loops of 49 targets in the Rosetta antibody benchmark. The performance of OSCAR-loop in a model environment was also discussed.
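The accuracies above are given in Å. The abstract does not say exactly how they were computed; a common choice for loop modeling is the root-mean-square deviation (RMSD) between predicted and native loop coordinates once the framework has been superimposed. The sketch below assumes that convention. The function names and the (energy, coordinates) decoy layout are hypothetical and are not taken from OSCAR-loop.

```python
import math

def loop_rmsd(predicted, native):
    """RMSD between two equally long lists of (x, y, z) atom coordinates.

    Assumption: both loops are already in the same frame, for example
    because the protein framework was superimposed beforehand.
    """
    if len(predicted) != len(native) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_p, atom_n in zip(predicted, native)
        for p, q in zip(atom_p, atom_n)
    )
    return math.sqrt(squared / len(predicted))

def pick_lowest_energy(decoys):
    """Return the (energy, coords) decoy with the lowest energy."""
    return min(decoys, key=lambda decoy: decoy[0])

# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoys = [
    (-10.0, [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]),
    (-12.5, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
]
best_energy, best_coords = pick_lowest_energy(decoys)
best_rmsd = loop_rmsd(best_coords, native)
```

Selecting by energy and scoring by RMSD are kept separate on purpose: the native structure is unknown at prediction time, so RMSD can only be used afterwards to assess the selected decoy.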
Subject(s)
Antibodies , Proteins , Protein Conformation , Molecular Models , Proteins/chemistry , Antibodies/chemistry , Algorithms
ABSTRACT
Identifiability of statistical models is a fundamental regularity condition that is required for valid statistical inference. Investigation of model identifiability is mathematically challenging for complex models such as latent class models. Jones et al. used Goodman's technique to investigate the identifiability of latent class models with applications to diagnostic tests in the absence of a gold standard test. The tool they used was based on examining the singularity of the Jacobian or the Fisher information matrix, in order to obtain insights into local identifiability (i.e., there exists a neighborhood of a parameter such that no other parameter in the neighborhood leads to the same probability distribution as the parameter). In this paper, we investigate a stronger condition, global identifiability (i.e., no two parameters in the parameter space give rise to the same probability distribution), by introducing a powerful mathematical tool from computational algebra: the Gröbner basis. Using several well-known existing examples, we argue that the Gröbner basis method is easy to implement and powerful for studying global identifiability of latent class models, and is an attractive alternative to the information matrix analysis by Rothenberg and the Jacobian analysis by Goodman and Jones et al.
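To make the definition of global identifiability concrete, the sketch below builds a toy two-class latent class model for three binary diagnostic tests, assuming conditional independence given the latent class. It then checks that exchanging the two class labels yields exactly the same observed distribution. The parameter values are made up, the model is not one of the paper's examples, and no Gröbner basis is computed here; the sketch only illustrates what "two parameters giving the same distribution" means.

```python
from itertools import product

def cell_probs(prev, sens, spec):
    """Probability of every test-result pattern under a two-class model.

    prev: prevalence of the diseased class.
    sens, spec: per-test sensitivities and specificities.
    Assumption: tests are conditionally independent given the class.
    """
    probs = {}
    for pattern in product((0, 1), repeat=len(sens)):
        p_diseased, p_healthy = prev, 1.0 - prev
        for result, se, sp in zip(pattern, sens, spec):
            p_diseased *= se if result else 1.0 - se
            p_healthy *= (1.0 - sp) if result else sp
        probs[pattern] = p_diseased + p_healthy
    return probs

def label_swapped(prev, sens, spec):
    """Parameters obtained by exchanging the roles of the two classes."""
    return 1.0 - prev, [1.0 - sp for sp in spec], [1.0 - se for se in sens]

theta = (0.3, [0.90, 0.80, 0.85], [0.95, 0.90, 0.70])  # hypothetical values
original = cell_probs(*theta)
swapped = cell_probs(*label_swapped(*theta))
same_distribution = all(abs(original[k] - swapped[k]) < 1e-12 for k in original)
```

Because two distinct parameter points produce the same distribution, this toy model can be globally identifiable at best up to label switching, which is why such analyses are usually carried out on a restricted parameter space.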
Subject(s)
Biometry/methods , Routine Diagnostic Tests/statistics & numerical data , Latent Class Analysis , Statistical Models , Algorithms , Bias , Computer Simulation , Routine Diagnostic Tests/standards , Humans , Reproducibility of Results
ABSTRACT
Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.
Subject(s)
Confidentiality , DNA Fingerprinting , Genetic Models , Phenotype , Whole Genome Sequencing , Adult , Age Factors , Algorithms , Body Size , Cohort Studies , Data Anonymization , Female , Humans , Male , Middle Aged , Pigmentation/genetics , Young Adult
ABSTRACT
Although there are many methods available for inferring copy-number variants (CNVs) from next-generation sequence data, there remains a need for a system that is computationally efficient but that retains good sensitivity and specificity across all types of CNVs. Here, we introduce a new method, estimation by read depth with single-nucleotide variants (ERDS), and use various approaches to compare its performance to other methods. We found that for common CNVs and high-coverage genomes, ERDS performs as well as the best method currently available (Genome STRiP), whereas for rare CNVs and high-coverage genomes, ERDS performs better than any available method. Importantly, ERDS accommodates both unique and highly amplified regions of the genome and does so without requiring separate alignments for calling CNVs and other variants. These comparisons show that for genomes sequenced at high coverage, ERDS provides a computationally convenient method that calls CNVs as well as or better than any currently available method.
Subject(s)
DNA Copy Number Variations , Human Genome , DNA Sequence Analysis/methods , Algorithms , Gene Deletion , Genotyping Techniques , Humans , Validation Studies as Topic
ABSTRACT
To date, the widely used genome-wide association studies (GWASs) of the human genome have reported thousands of variants that are significantly associated with various human traits. However, in the vast majority of these cases, the causal variants responsible for the observed associations remain unknown. In order to facilitate the identification of causal variants, we designed a simple computational method called the "preferential linkage disequilibrium (LD)" approach, which follows the variants discovered by GWASs to pinpoint the causal variants, even if they are rare compared with the discovery variants. The approach is based on the hypothesis that the GWAS-discovered variant is better at tagging the causal variants than are most other variants evaluated in the original GWAS. Applying the preferential LD approach to the GWAS signals of five human traits for which the causal variants are already known, we successfully placed the known causal variants among the top ten candidates in the majority of these cases. Application of this method to additional GWASs, including those of hepatitis C virus treatment response, plasma levels of clotting factors, and late-onset Alzheimer disease, has led to the identification of a number of promising candidate causal variants. This method represents a useful tool for delineating causal variants by bringing together GWAS signals and the rapidly accumulating variant data from next-generation sequencing.
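The abstract states only the hypothesis behind the preferential LD approach, not its implementation. The sketch below is one possible reading: each sequenced candidate variant is scored by the fraction of other GWAS-genotyped variants that tag it less well than the discovery variant does. The scoring rule, the function names, and the use of the squared Pearson correlation of 0/1/2 genotype dosages as an approximation to LD r² are all assumptions made for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: treat as untagged
    return sxy * sxy / (sxx * syy)

def preferential_ld_rank(discovery, other_gwas, candidates):
    """Rank candidates by how preferentially the discovery variant tags them.

    discovery: genotype vector of the GWAS-discovered variant.
    other_gwas: dict of name -> genotypes for other variants in the GWAS.
    candidates: dict of name -> genotypes for sequenced candidate variants.
    Returns (name, score) pairs, best first; score is the fraction of other
    GWAS variants whose r^2 with the candidate is below the discovery r^2.
    """
    scores = []
    for name, genotypes in candidates.items():
        r2_discovery = r_squared(discovery, genotypes)
        worse = sum(
            r_squared(g, genotypes) < r2_discovery for g in other_gwas.values()
        )
        scores.append((name, worse / len(other_gwas) if other_gwas else 0.0))
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical genotypes for six individuals.
discovery = [0, 1, 2, 1, 0, 2]
other_gwas = {"g1": [0, 0, 1, 2, 2, 1], "g2": [1, 1, 1, 0, 2, 1]}
candidates = {"c1": [0, 1, 2, 1, 0, 2], "c2": [0, 0, 1, 2, 2, 1]}
ranking = preferential_ld_rank(discovery, other_gwas, candidates)
```

A rank-based score of this kind does not require the candidate to be common: a rare variant can still rank highly if the discovery variant tags it better than most other genotyped variants do.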
Subject(s)
Genome-Wide Association Study , Linkage Disequilibrium , Computational Biology/methods , Gene Frequency , Genetic Predisposition to Disease , Human Genome , Humans , Single Nucleotide Polymorphism
ABSTRACT
Copy number variation (CNV) has been found to play an important role in human disease. Next-generation sequencing technology, including whole-genome sequencing (WGS) and whole-exome sequencing (WES), has become a primary strategy for studying the genetic basis of human disease. Several CNV calling tools have recently been developed on the basis of WES data. However, the comparative performance of these tools using real data remains unclear. An objective evaluation study of these tools in practical research situations would be beneficial. Here, we evaluated four well-known WES-based CNV detection tools (XHMM, CoNIFER, ExomeDepth, and CONTRA) using real data generated in house. After evaluation using six metrics, we found that the sensitive and accurate detection of CNVs in WES data remains challenging despite the many algorithms available. Each algorithm has its own strengths and weaknesses. None of the exome-based CNV calling methods performed well in all situations; in particular, compared with CNVs identified from high coverage WGS data from the same samples, all tools suffered from limited power. Our evaluation provides a comprehensive and objective comparison of several well-known detection tools designed for WES data, which will assist researchers in choosing the most suitable tools for their research needs.
Subject(s)
DNA Copy Number Variations , Exome , High-Throughput Nucleotide Sequencing , Software , Algorithms , Computational Biology/methods , Datasets as Topic , Genomics/methods , Heterozygote , Humans , Single Nucleotide Polymorphism , Sensitivity and Specificity , Sequence Deletion
ABSTRACT
One of the longest running debates in evolutionary biology concerns the kind of genetic variation that is primarily responsible for phenotypic variation in species. Here, we address this question for humans specifically from the perspective of population allele frequency of variants across the complete genome, including both coding and noncoding regions. We establish simple criteria to assess the likelihood that variants are functional based on their genomic locations and then use whole-genome sequence data from 29 subjects of European origin to assess the relationship between the functional properties of variants and their population allele frequencies. We find that for all criteria used to assess the likelihood that a variant is functional, the rarer variants are significantly more likely to be functional than the more common variants. Strikingly, these patterns disappear when we focus on only those variants in which the major alleles are derived. These analyses indicate that the majority of the genetic variation in terms of phenotypic consequence may result from a mutation-selection balance, as opposed to balancing selection, and have direct relevance to the study of human disease.
Subject(s)
Genetic Variation , Alleles , Conserved Sequence , Molecular Evolution , Gene Frequency , Regulator Genes , Human Genome , Genome-Wide Association Study , Humans , Genetic Models , Mutation , Phenotype , Single Nucleotide Polymorphism , Genetic Selection , White People/genetics
ABSTRACT
High-performance flexible strain sensors have tremendous potential applications in wearable devices and health monitoring. However, developing a flexible strain sensor with high sensitivity over a wide strain range remains a significant challenge. In this study, a fibrous membrane with a porous and crimped structure was designed as the substrate material for TPU/GNPs flexible strain sensors. This structural design effectively balances sensitivity with the strain range. A TPU-PEO fibrous membrane was prepared by electrospinning and then washed with water, yielding a porous fibrous membrane with a TPU framework. Subsequently, the fibrous membrane was subjected to anhydrous ethanol stimulation to obtain a porous and crimped network structure. The TPU fibrous membrane was then modified with GNPs through ultrasonic treatment. The resulting flexible strain sensor exhibited high sensitivity (GF = 4047.5) within a large strain range (350%) and demonstrated excellent sensing performance, stability, and durability (>10,000 cycles). It not only captured basic movements but also efficiently recognized and measured bending angles, enabling a more sophisticated human-machine interaction experience. This advancement opens up possibilities for future intelligent wearable technology and human-machine interaction.
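The sensitivity above is reported as a gauge factor (GF). The abstract does not state how it was computed; the usual definition for resistive strain sensors is the relative resistance change divided by the applied strain, GF = (ΔR/R0)/ε. The sketch below assumes that definition, and the resistance values in it are hypothetical rather than measurements from the study.

```python
def gauge_factor(r0, r, strain):
    """Gauge factor GF = ((R - R0) / R0) / strain.

    r0: resistance of the unstrained sensor.
    r: resistance under load (same units as r0).
    strain: applied strain as a fraction, so 350% is written as 3.5.
    """
    if r0 <= 0:
        raise ValueError("r0 must be positive")
    if strain == 0:
        raise ValueError("strain must be non-zero")
    return ((r - r0) / r0) / strain

# Hypothetical reading: resistance triples at 50% strain.
example_gf = gauge_factor(100.0, 300.0, 0.5)
```

Since the resistance response of such sensors is typically nonlinear, a GF value is usually quoted for a particular strain interval rather than for the whole working range.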
ABSTRACT
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop-loss variants in genes representing a diverse set of pathways.
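The statement about novel SNVs per genome can be read as a running count: as genomes are added one by one, how many of each genome's variants have not been seen in any earlier genome? The sketch below computes that count under this assumption. The variant identifiers are hypothetical, and the study may additionally have excluded variants already present in public databases, which this sketch does not model.

```python
def novel_variants_per_genome(genomes):
    """Count, for each genome in order, the variants absent from all earlier ones.

    genomes: iterable of sets of variant identifiers.
    Returns a list with one count per genome.
    """
    seen = set()
    counts = []
    for variants in genomes:
        counts.append(len(variants - seen))  # variants not observed so far
        seen |= variants                     # remember them for later genomes
    return counts

# Hypothetical toy cohort of three genomes.
cohort = [
    {"chr1:100:A>G", "chr1:200:C>T", "chr2:50:G>A"},
    {"chr1:200:C>T", "chr2:50:G>A", "chr3:75:T>C"},
    {"chr1:100:A>G", "chr3:75:T>C", "chrX:10:G>T"},
]
novel_counts = novel_variants_per_genome(cohort)
```

The counts depend on the order in which genomes are processed, so a plateau is more convincingly shown by averaging over several random orderings.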
Subject(s)
Human Genome/genetics , DNA Sequence Analysis , Base Sequence , Case-Control Studies , DNA Copy Number Variations/genetics , Genetic Databases , Exons/genetics , Factor VIII/genetics , Gene Duplication/genetics , Gene Knockout Techniques , Population Genetics , Genotype , Hemophilia A/genetics , Humans , INDEL Mutation/genetics , Oligonucleotide Array Sequence Analysis , Open Reading Frames/genetics , Genetic Polymorphism , Single Nucleotide Polymorphism/genetics
ABSTRACT
BACKGROUND & AIMS: Interferon-alfa (IFN)-related cytopenias are common and may be dose-limiting. We performed a genome-wide association study on a well-characterized genotype 1 HCV cohort to identify genetic determinants of peginterferon-α (pegIFN)-related thrombocytopenia, neutropenia, and leukopenia. METHODS: Of the 3070 patients in the IDEAL study, 1604 consented to genetic testing. Trial inclusion criteria included a platelet (Pl) count ≥80×10⁹/L and an absolute neutrophil count (ANC) ≥1500/mm³. Samples were genotyped using the Illumina Human610-quad BeadChip. The primary analyses focused on the genetic determinants of quantitative change in cell counts (Pl, ANC, lymphocytes, monocytes, eosinophils, and basophils) at week 4 in patients >80% adherent to therapy (n=1294). RESULTS: Six SNPs on chromosome 20 were positively associated with Pl reduction (top SNP rs965469, p=10⁻¹⁰). These tag SNPs are in high linkage disequilibrium with 2 functional variants in the ITPA gene, rs1127354 and rs7270101, that cause ITPase deficiency and protect against ribavirin (RBV)-induced hemolytic anemia (HA). rs1127354 and rs7270101 showed strong independent associations with Pl reduction (p=10⁻¹², p=10⁻⁷) and entirely explained the genome-wide significant associations. We believe this is an example of an indirect genetic association due to a reactive thrombocytosis to RBV-induced anemia: Hb decline was inversely correlated with Pl reduction (r=-0.28, p=10⁻¹⁷) and Hb change largely attenuated the association between the ITPA variants and Pl reduction in regression models. No common genetic variants were associated with pegIFN-induced neutropenia or leukopenia. CONCLUSIONS: Two ITPA variants were associated with thrombocytopenia; this was largely explained by a thrombocytotic response to RBV-induced HA attenuating IFN-related thrombocytopenia. No genetic determinants of pegIFN-induced neutropenia were identified.
Subject(s)
Chronic Hepatitis C/drug therapy , Chronic Hepatitis C/genetics , Interferon-alpha/adverse effects , Leukopenia/chemically induced , Leukopenia/genetics , Neutropenia/chemically induced , Neutropenia/genetics , Polyethylene Glycols/adverse effects , Adult , Antiviral Agents/adverse effects , Female , Genome-Wide Association Study , Humans , Interferon alpha-2 , Linkage Disequilibrium , Male , Middle Aged , Single Nucleotide Polymorphism , Pyrophosphatases/genetics , Recombinant Proteins/adverse effects , Ribavirin/adverse effects , Thrombocytopenia/chemically induced , Thrombocytopenia/genetics
ABSTRACT
SUMMARY: Here we present Sequence Variant Analyzer (SVA), a software tool that assigns a predicted biological function to variants identified in next-generation sequencing studies and provides a browser to visualize the variants in their genomic contexts. SVA also provides for flexible interaction with software implementing variant association tests allowing users to consider both the bioinformatic annotation of identified variants and the strength of their associations with studied traits. We illustrate the annotation features of SVA using two simple examples of sequenced genomes that harbor Mendelian mutations. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://www.svaproject.org.
Subject(s)
Human Genome , Software , Audiovisual Aids , Base Sequence , Genomic Structural Variation , Humans , Internet , DNA Sequence Analysis/methods
ABSTRACT
BACKGROUND: Single-nucleotide polymorphisms (SNPs) in the IL28B and PNPLA3 gene regions have been associated with hepatic steatosis in genotype 1 (G1) chronic HCV infection but their clinical impacts remain to be determined. AIM: We sought to validate these associations and to explore their impact on treatment response to peginterferon and ribavirin therapy. METHODS: A total of 972 G1 HCV-infected Caucasian patients were genotyped for the SNPs rs12979860 (IL28B) and rs2896019 (PNPLA3). Multivariable analysis tested IL28B and PNPLA3 for association with the presence of any steatosis (>0%); clinically significant steatosis (>5%); steatosis severity (grade 0-3/4); and the interacting associations of the SNPs and hepatic steatosis to sustained viral response (SVR). RESULTS: IL28B and PNPLA3 polymorphisms were associated with the presence of any steatosis (rs12979860, p = 1.87 × 10⁻⁷; rs2896019, p = 7.56 × 10⁻⁴); clinically significant steatosis (rs12979860, p = 1.82 × 10⁻³; rs2896019, p = 1.27 × 10⁻⁴); and steatosis severity (rs12979860, p = 2.05 × 10⁻⁸; rs2896019, p = 2.62 × 10⁻⁶). Obesity, hypertriglyceridemia, hyperglycemia, liver fibrosis, and liver inflammation were all independently associated with worse steatosis. Hepatic steatosis was associated with lower SVR, and this effect was attenuated by IL28B. PNPLA3 had no independent association with SVR. CONCLUSIONS: IL28B and PNPLA3 are associated with hepatic steatosis prevalence and severity in Caucasians with G1 HCV, suggesting differing potential genetic risk pathways to steatosis. IL28B attenuates the association between steatosis and SVR. Remediable metabolic risk factors remain important, independently of these polymorphisms, and remain key therapeutic goals to achieve better outcomes for patients with HCV-associated hepatic steatosis.
Subject(s)
Fatty Liver/genetics , Chronic Hepatitis C/complications , Interleukins/genetics , Lipase/genetics , Membrane Proteins/genetics , Adult , Aged , Fatty Liver/virology , Female , Genotype , Chronic Hepatitis C/genetics , Chronic Hepatitis C/virology , Humans , Interferons , Male , Middle Aged , Prevalence , Prospective Studies , Regression Analysis , Risk Factors
Subject(s)
Human Genome , Immunologic Deficiency Syndromes/genetics , Mutation , NADPH Oxidases/genetics , Nuclear Proteins/genetics , Alternative Splicing , B-Lymphocytes/immunology , B-Lymphocytes/pathology , Base Sequence , Child , DNA-Binding Proteins , Endonucleases , Exons , Female , High-Throughput Nucleotide Sequencing , Humans , Immunoglobulin A/genetics , Immunoglobulin G/genetics , Immunologic Deficiency Syndromes/diagnosis , Immunologic Deficiency Syndromes/immunology , Immunologic Deficiency Syndromes/pathology , Natural Killer Cells/immunology , Natural Killer Cells/pathology , Lymphocyte Count , Male , Molecular Sequence Data , NADPH Oxidases/immunology , Nuclear Proteins/immunology , T-Lymphocytes/immunology , T-Lymphocytes/pathology
ABSTRACT
This paper deals with the global exponential stability of delayed recurrent neural networks (DRNNs). By constructing an augmented Lyapunov-Krasovskii functional and adopting the reciprocally convex combination approach and a Wirtinger-based integral inequality, delay-dependent global exponential stability criteria are derived in terms of linear matrix inequalities. In addition, a general and effective method for global exponential stability analysis of DRNNs is given through a lemma, with which the exponential convergence rate can be estimated. With this lemma, some global asymptotic stability criteria for DRNNs obtained in previous studies can be generalized to global exponential stability criteria. Finally, a frequently used numerical example is presented to illustrate the effectiveness and merits of the proposed theoretical results.
ABSTRACT
OBJECTIVE: The Genetic Absence Epilepsy Rats from Strasbourg (GAERS) are an inbred Wistar rat strain widely used as a model of genetic generalized epilepsy with absence seizures. As in humans, the genetic architecture that results in genetic generalized epilepsy in GAERS is poorly understood. Here we present the strain-specific variants found among the epileptic GAERS and their related Non-Epileptic Control (NEC) strain. The GAERS and NEC represent a powerful opportunity to identify neurobiological factors that are associated with the genetic generalized epilepsy phenotype. METHODS: We performed whole genome sequencing on adult epileptic GAERS and adult NEC rats, a strain derived from the same original Wistar colony. We also performed whole genome sequencing on four double-crossed (GAERS with NEC) F2 rats selected for high-seizing (n = 2) and non-seizing (n = 2) phenotypes. RESULTS: Specific to the GAERS genome, we identified 1.12 million single nucleotide variants, 296.5K short insertion-deletions, and 354 putative copy number variants that result in complete or partial loss/duplication of 41 genes. Of the GAERS-specific variants that met high quality criteria, 25 are annotated as stop codon gain/loss, 56 as putative essential splice sites, and 56 indels are predicted to result in a frameshift. Subsequent screening against the four sequenced F2 progeny (the two with the highest and the two with the lowest seizure burden) identified only the selected Cacna1h GAERS-private protein-coding variant as exclusively co-segregating with the two high-seizing F2 rats. SIGNIFICANCE: This study highlights an approach for using whole genome sequencing to narrow down to a manageable candidate list of genetic variants in a complex genetic epilepsy animal model, and suggests utility of this sequencing design to investigate other spontaneously occurring animal models of human disease.