Results 1 - 20 of 54
1.
Nature ; 536(7616): 285-91, 2016 Aug 18.
Article in English | MEDLINE | ID: mdl-27535533

ABSTRACT

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation, identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.


Subject(s)
Exome/genetics , Genetic Variation/genetics , DNA Mutational Analysis , Datasets as Topic , Humans , Phenotype , Proteome/genetics , Rare Diseases/genetics , Sample Size
2.
Nature ; 506(7487): 179-84, 2014 Feb 13.
Article in English | MEDLINE | ID: mdl-24463507

ABSTRACT

Inherited alleles account for most of the genetic risk for schizophrenia. However, new (de novo) mutations, in the form of large chromosomal copy number changes, occur in a small fraction of cases and disproportionally disrupt genes encoding postsynaptic proteins. Here we show that small de novo mutations, affecting one or a few nucleotides, are overrepresented among glutamatergic postsynaptic proteins comprising activity-regulated cytoskeleton-associated protein (ARC) and N-methyl-d-aspartate receptor (NMDAR) complexes. Mutations are additionally enriched in proteins that interact with these complexes to modulate synaptic strength, namely proteins regulating actin filament dynamics and those whose messenger RNAs are targets of fragile X mental retardation protein (FMRP). Genes affected by mutations in schizophrenia overlap those mutated in autism and intellectual disability, as do mutation-enriched synaptic pathways. Aligning our findings with a parallel case-control study, we demonstrate reproducible insights into aetiological mechanisms for schizophrenia and reveal pathophysiology shared with other neurodevelopmental disorders.


Subject(s)
Models, Neurological , Mutation/genetics , Nerve Net/metabolism , Neural Pathways/metabolism , Schizophrenia/genetics , Schizophrenia/physiopathology , Synapses/metabolism , Child Development Disorders, Pervasive/genetics , Cytoskeletal Proteins/metabolism , Exome/genetics , Fragile X Mental Retardation Protein/metabolism , Humans , Intellectual Disability/genetics , Mutation Rate , Nerve Net/physiopathology , Nerve Tissue Proteins/metabolism , Neural Pathways/physiopathology , Phenotype , RNA, Messenger/genetics , RNA, Messenger/metabolism , Receptors, N-Methyl-D-Aspartate/metabolism , Schizophrenia/metabolism , Substrate Specificity
3.
Nature ; 506(7487): 185-90, 2014 Feb 13.
Article in English | MEDLINE | ID: mdl-24463508

ABSTRACT

Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.


Subject(s)
Multifactorial Inheritance/genetics , Mutation/genetics , Schizophrenia/genetics , Autistic Disorder/genetics , Calcium Channels/genetics , Cytoskeletal Proteins/genetics , DNA Copy Number Variations/genetics , Disks Large Homolog 4 Protein , Female , Fragile X Mental Retardation Protein/metabolism , Genome-Wide Association Study , Humans , Intellectual Disability/genetics , Intracellular Signaling Peptides and Proteins/genetics , Male , Membrane Proteins/genetics , Nerve Tissue Proteins/genetics , Receptors, N-Methyl-D-Aspartate/genetics
4.
Nature ; 515(7526): 209-15, 2014 Nov 13.
Article in English | MEDLINE | ID: mdl-25363760

ABSTRACT

The genetic architecture of autism spectrum disorder involves the interplay of common and rare variants and their impact on hundreds of genes. Using exome sequencing, here we show that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, plus a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic formation, transcriptional regulation and chromatin-remodelling pathways. These include voltage-gated ion channels regulating the propagation of action potentials, pacemaking and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodellers, most prominently those that mediate post-translational lysine methylation/demethylation modifications of histones.


Subject(s)
Child Development Disorders, Pervasive/genetics , Chromatin/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Synapses/metabolism , Transcription, Genetic/genetics , Amino Acid Sequence , Child Development Disorders, Pervasive/pathology , Chromatin/metabolism , Chromatin Assembly and Disassembly , Exome/genetics , Female , Germ-Line Mutation/genetics , Humans , Male , Molecular Sequence Data , Mutation, Missense/genetics , Nerve Net/metabolism , Odds Ratio
5.
Nature ; 485(7397): 242-5, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495311

ABSTRACT

Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, is consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.


Subject(s)
Autistic Disorder/genetics , DNA-Binding Proteins/genetics , Exons/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Transcription Factors/genetics , Case-Control Studies , Exome/genetics , Family Health , Humans , Models, Genetic , Multifactorial Inheritance/genetics , Phenotype , Poisson Distribution , Protein Interaction Maps
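The abstract above compares the observed de novo mutation rate with an expected rate, and Poisson Distribution appears among its subject headings. A minimal sketch of such a comparison follows, assuming the total de novo count across trios is Poisson-distributed around a model-derived expectation. The function name and both counts are hypothetical, chosen for illustration, and are not the study's data or code.

```python
import math

def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected), expected > 0.

    Each term is computed in log space (via lgamma) so large counts do not
    overflow a float.
    """
    below = sum(
        math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)

# Hypothetical numbers for illustration only; not taken from the study.
expected_de_novo = 150.0  # count predicted from an assumed mutation-rate model
observed_de_novo = 165    # count observed across all sequenced trios

ratio = observed_de_novo / expected_de_novo
p = poisson_upper_tail(observed_de_novo, expected_de_novo)
print(f"observed/expected = {ratio:.2f}, one-sided P = {p:.3f}")
```

A ratio only modestly above 1 with an unremarkable P value is the kind of result the abstract describes as "only modestly higher than the expected rate".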
6.
N Engl J Med ; 371(26): 2477-87, 2014 Dec 25.
Article in English | MEDLINE | ID: mdl-25426838

ABSTRACT

BACKGROUND: Cancers arise from multiple acquired mutations, which presumably occur over many years. Early stages in cancer development might be present years before cancers become clinically apparent. METHODS: We analyzed data from whole-exome sequencing of DNA in peripheral-blood cells from 12,380 persons, unselected for cancer or hematologic phenotypes. We identified somatic mutations on the basis of unusual allelic fractions. We used data from Swedish national patient registers to follow health outcomes for 2 to 7 years after DNA sampling. RESULTS: Clonal hematopoiesis with somatic mutations was observed in 10% of persons older than 65 years of age but in only 1% of those younger than 50 years of age. Detectable clonal expansions most frequently involved somatic mutations in three genes (DNMT3A, ASXL1, and TET2) that have previously been implicated in hematologic cancers. Clonal hematopoiesis was a strong risk factor for subsequent hematologic cancer (hazard ratio, 12.9; 95% confidence interval, 5.8 to 28.7). Approximately 42% of hematologic cancers in this cohort arose in persons who had clonality at the time of DNA sampling, more than 6 months before a first diagnosis of cancer. Analysis of bone marrow-biopsy specimens obtained from two patients at the time of diagnosis of acute myeloid leukemia revealed that their cancers arose from the earlier clones. CONCLUSIONS: Clonal hematopoiesis with somatic mutations is readily detected by means of DNA sequencing, is increasingly common as people age, and is associated with increased risks of hematologic cancer and death. A subset of the genes that are mutated in patients with myeloid cancers is frequently mutated in apparently healthy persons; these mutations may represent characteristic early events in the development of hematologic cancers. (Funded by the National Human Genome Research Institute and others.)


Subject(s)
Blood , Cell Transformation, Neoplastic/genetics , Hematologic Neoplasms/genetics , Hematopoiesis/physiology , Hematopoietic Stem Cells/physiology , Mutation , Adult , Age Factors , Aged , Aged, 80 and over , Clone Cells , DNA Mutational Analysis , Exome , Hematologic Neoplasms/physiopathology , Humans , Middle Aged , Risk Factors , Young Adult
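The abstract above states that somatic mutations were identified "on the basis of unusual allelic fractions" but does not give the test. One common way to formalize that idea is sketched below, assuming a germline heterozygote should show about 50% variant reads and flagging variants whose read counts are implausibly low under a Binomial(depth, 0.5) model. The cut-off `alpha` and function names are hypothetical; a real caller must also account for sequencing error, copy-number change, and loss of heterozygosity, which this sketch ignores.

```python
import math

def binom_logpmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    return sum(math.exp(binom_logpmf(i, n, p)) for i in range(k + 1))

def unusual_allele_fraction(alt_reads: int, depth: int, alpha: float = 1e-3) -> bool:
    """True if alt_reads/depth is too low for a germline heterozygote
    (expected fraction 0.5). alpha is a hypothetical significance cut-off."""
    return binom_cdf(alt_reads, depth, 0.5) < alpha

print(unusual_allele_fraction(8, 100))   # 8% variant reads -> True
print(unusual_allele_fraction(47, 100))  # near 50%, germline-like -> False
```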
7.
Am J Hum Genet ; 93(4): 607-19, 2013 Oct 03.
Article in English | MEDLINE | ID: mdl-24094742

ABSTRACT

Copy number variation (CNV) is an important determinant of human diversity and plays important roles in susceptibility to disease. Most studies of CNV carried out to date have made use of chromosome microarray and have had a lower size limit for detection of about 30 kilobases (kb). With the emergence of whole-exome sequencing studies, we asked whether such data could be used to reliably call rare exonic CNV in the size range of 1-30 kb, making use of the eXome Hidden Markov Model (XHMM) program. By using both transmission information and validation by molecular methods, we confirmed that small CNV encompassing as few as three exons can be reliably called from whole-exome data. We applied this approach to an autism case-control sample (n = 811, mean per-target read depth = 161) and observed a significant increase in the burden of rare (MAF ≤1%) 1-30 kb CNV, 1-30 kb deletions, and 1-10 kb deletions in ASD. CNV in the 1-30 kb range frequently hit just a single gene, and we were therefore able to carry out enrichment and pathway analyses, where we observed enrichment for disruption of genes in cytoskeletal and autophagy pathways in ASD. In summary, our results showed that XHMM provided an effective means to assess small exonic CNV from whole-exome data, indicated that rare 1-30 kb exonic deletions could contribute to risk in up to 7% of individuals with ASD, and implicated a candidate pathway in developmental delay syndromes.


Subject(s)
Child Development Disorders, Pervasive/genetics , DNA Copy Number Variations , Exome , Autophagy/genetics , Base Sequence , Case-Control Studies , Child , Exons , Gene Deletion , Genetic Predisposition to Disease , Humans , Molecular Sequence Data , Sequence Analysis, DNA/methods
8.
Circ Res ; 115(10): 884-896, 2014 Oct 24.
Article in English | MEDLINE | ID: mdl-25205790

ABSTRACT

RATIONALE: Congenital heart disease (CHD) is among the most common birth defects. Most cases are of unknown pathogenesis. OBJECTIVE: To determine the contribution of de novo copy number variants (CNVs) in the pathogenesis of sporadic CHD. METHODS AND RESULTS: We studied 538 CHD trios using genome-wide dense single nucleotide polymorphism arrays and whole exome sequencing. Results were experimentally validated using digital droplet polymerase chain reaction. We compared validated CNVs in CHD cases with CNVs in 1301 healthy control trios. The 2 complementary high-resolution technologies identified 63 validated de novo CNVs in 51 CHD cases. A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either single nucleotide polymorphism array (P=7×10⁻⁵; odds ratio, 4.6) or whole exome sequencing data (P=6×10⁻⁴; odds ratio, 3.5) and remained after removing 16% of de novo CNV loci previously reported as pathogenic (P=0.02; odds ratio, 2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in whole exome sequencing and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q subtelomeric deletions. CONCLUSIONS: We demonstrate a significantly increased frequency of rare de novo CNVs in CHD patients compared with healthy controls and suggest several novel genetic loci for CHD.


Subject(s)
DNA Copy Number Variations/genetics , Exome/genetics , Gene Frequency/genetics , Heart Defects, Congenital/genetics , Polymorphism, Single Nucleotide/genetics , Case-Control Studies , Cohort Studies , Gene Regulatory Networks/genetics , Heart Defects, Congenital/diagnosis , Humans , Molecular Sequence Data
9.
Int J Cancer ; 137(4): 776-83, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-25626421

ABSTRACT

Although the rates of cervical squamous cell carcinoma have been declining, the rates of cervical adenocarcinoma are increasing in some countries. Outcomes for advanced cervical adenocarcinoma remain poor. Precision mapping of genetic alterations in cervical adenocarcinoma may enable better selection of therapies and deliver improved outcomes when combined with new sequencing diagnostics. We present whole-exome sequencing results from 15 cervical adenocarcinomas and paired normal samples from Hong Kong Chinese women. These data revealed a heterogeneous mutation spectrum and identified several frequently altered genes including FAT1, ARID1A, ERBB2 and PIK3CA. Exome sequencing identified human papillomavirus (HPV) sequences in 13 tumors in which the HPV genome might have integrated into and hence disrupted the functions of certain exons, raising the possibility that HPV integration can alter pathways other than p53 and pRb. Together, these preliminary data suggest the potential for individualized therapies for cervical adenocarcinoma based on genomic information.


Subject(s)
Adenocarcinoma/genetics , High-Throughput Nucleotide Sequencing , Uterine Cervical Neoplasms/genetics , Adenocarcinoma/pathology , Adenocarcinoma/virology , Adult , Aged , Exome , Female , Hong Kong , Humans , Middle Aged , Mutation , Neoplasm Staging , Papillomaviridae/genetics , Papillomaviridae/pathogenicity , Uterine Cervical Neoplasms/pathology , Uterine Cervical Neoplasms/virology
10.
Am J Hum Genet ; 91(4): 597-607, 2012 Oct 05.
Article in English | MEDLINE | ID: mdl-23040492

ABSTRACT

Sequencing of gene-coding regions (the exome) is increasingly used for studying human disease, for which copy-number variants (CNVs) are a critical genetic component. However, detecting copy number from exome sequencing is challenging because of the noncontiguous nature of the captured exons. This is compounded by the complex relationship between read depth and copy number; this results from biases in targeted genomic hybridization, sequence factors such as GC content, and batching of samples during collection and sequencing. We present a statistical tool (exome hidden Markov model [XHMM]) that uses principal-component analysis (PCA) to normalize exome read depth and a hidden Markov model (HMM) to discover exon-resolution CNV and genotype variation across samples. We evaluate performance on 90 schizophrenia trios and 1,017 case-control samples. XHMM detects a median of two rare (<1%) CNVs per individual (one deletion and one duplication) and has 79% sensitivity to similarly rare CNVs overlapping three or more exons discovered with microarrays. With sensitivity similar to state-of-the-art methods, XHMM achieves higher specificity by assigning quality metrics to the CNV calls to filter out bad ones, as well as to statistically genotype the discovered CNV in all individuals, yielding a trio call set with Mendelian-inheritance properties highly consistent with expectation. We also show that XHMM breakpoint quality scores enable researchers to explicitly search for novel classes of structural variation. For example, we apply XHMM to extract those CNVs that are highly likely to disrupt (delete or duplicate) only a portion of a gene.


Subject(s)
DNA Copy Number Variations , Exome , Exons , Genome-Wide Association Study/methods , High-Throughput Nucleotide Sequencing/methods , Case-Control Studies , Genotype , Genotyping Techniques/methods , Humans , Models, Genetic , Nucleic Acid Hybridization/methods , Oligonucleotide Array Sequence Analysis/methods
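The abstract above describes XHMM as two stages: PCA normalization of exome read depth, then a hidden Markov model that calls exon-resolution CNVs. The sketch below covers only the second stage, assuming the inputs are already PCA-normalized per-target z-scores. It runs Viterbi decoding over three states (deletion, diploid, duplication) with Gaussian emissions. The emission means of -3, 0 and +3, the unit variance, and the distance-independent transition matrix are simplifying assumptions for illustration and are not taken from XHMM's code.

```python
import math

STATES = ("DEL", "DIP", "DUP")
MEANS = {"DEL": -3.0, "DIP": 0.0, "DUP": 3.0}  # assumed emission means (z-score units)

def log_normal_pdf(x: float, mu: float, sd: float = 1.0) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def log_transitions(p_cnv: float = 1e-4, mean_cnv_len: float = 6.0) -> dict:
    """Simplified transition matrix that ignores distance between targets (an assumption)."""
    stay = 1.0 - 1.0 / mean_cnv_len  # a CNV persists for ~mean_cnv_len targets
    t = {
        "DIP": {"DEL": p_cnv, "DIP": 1 - 2 * p_cnv, "DUP": p_cnv},
        "DEL": {"DEL": stay, "DIP": 1 - stay - 1e-6, "DUP": 1e-6},
        "DUP": {"DEL": 1e-6, "DIP": 1 - stay - 1e-6, "DUP": stay},
    }
    return {a: {b: math.log(p) for b, p in row.items()} for a, row in t.items()}

def viterbi(zscores: list[float]) -> list[str]:
    """Most likely copy-number state for each exon target."""
    lt = log_transitions()
    start = {"DEL": math.log(1e-4), "DIP": math.log(1 - 2e-4), "DUP": math.log(1e-4)}
    score = {s: start[s] + log_normal_pdf(zscores[0], MEANS[s]) for s in STATES}
    back = []
    for z in zscores[1:]:
        ptr, new = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this target.
            prev = max(STATES, key=lambda r: score[r] + lt[r][s])
            ptr[s] = prev
            new[s] = score[prev] + lt[prev][s] + log_normal_pdf(z, MEANS[s])
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

z = [0.1, -0.3, 0.2, -3.2, -2.8, -3.5, 0.0, 0.4]
print(viterbi(z))
```

The three consecutive strongly negative z-scores are decoded as one deletion, while isolated noise stays diploid because entering a CNV state carries a large transition penalty.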
11.
Bioinformatics ; 28(19): 2543-5, 2012 Oct 01.
Article in English | MEDLINE | ID: mdl-22843986

ABSTRACT

SUMMARY: zCall is a variant caller specifically designed for calling rare single-nucleotide polymorphisms from array-based technology. This caller is implemented as a post-processing step after a default calling algorithm has been applied. The algorithm uses the intensity profile of the common allele homozygote cluster to define the location of the other two genotype clusters. We demonstrate improved detection of rare alleles when applying zCall to samples that have both Illumina Infinium HumanExome BeadChip and exome sequencing data available. AVAILABILITY: http://atguweb.mgh.harvard.edu/apps/zcall. CONTACT: bneale@broadinstitute.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Genotyping Techniques , Polymorphism, Single Nucleotide , Software , Alleles , Cluster Analysis , Exome , Homozygote , Humans
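The zCall abstract above says the caller runs after a default algorithm and uses the common-allele homozygote cluster to place the other genotype clusters. A deliberately simplified one-axis sketch of that idea follows: fit the mean and standard deviation of the minor-allele intensity among samples the default caller called common homozygous, set a threshold at mean + z × SD, and flag every sample above it as a carrier. The real tool works on both intensity axes and separates heterozygotes from rare homozygotes; the data layout, function names and `z = 7.0` here are illustrative assumptions, not zCall's published settings.

```python
import statistics

def zcall_threshold(common_hom_y: list[float], z: float) -> float:
    """Minor-allele intensity threshold: mean + z * SD of the common homozygote cluster."""
    return statistics.mean(common_hom_y) + z * statistics.stdev(common_hom_y)

def recall_rare_snp(samples: list[tuple[float, str]], z: float = 7.0) -> list[str]:
    """Re-call one SNP after a default caller has run (hypothetical interface).

    samples: (minor-allele-axis intensity, default call) per sample.
    Only samples called "AA" (common homozygote) are used to fit the cluster.
    """
    common_hom_y = [y for y, call in samples if call == "AA"]
    t = zcall_threshold(common_hom_y, z)
    return ["carrier" if y > t else "AA" for y, _ in samples]

# Two samples left uncalled ("NC") by the default caller sit far from the cluster.
samples = [(0.04, "AA"), (0.05, "AA"), (0.06, "AA"), (0.05, "AA"),
           (0.045, "AA"), (0.055, "AA"), (0.50, "NC"), (0.48, "NC")]
print(recall_rare_snp(samples))
```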
12.
Bioinformatics ; 27(5): 655-61, 2011 Mar 01.
Article in English | MEDLINE | ID: mdl-21258061

ABSTRACT

MOTIVATION: Large-scale RNA expression measurements are generating enormous quantities of data. During the last two decades, many methods were developed for extracting insights regarding the interrelationships between genes from such data. The mathematical and computational perspectives that underlie these methods are usually algebraic or probabilistic. RESULTS: Here, we introduce a previously unexplored geometric viewpoint where expression levels of genes in multiple experiments are interpreted as vectors in a high-dimensional space. Specifically, we find, for the expression profile of each particular gene, its approximation as a linear combination of profiles of a few other genes. This method is inspired by recent developments in the realm of compressed sensing in the machine learning domain. To demonstrate the power of our approach in extracting valuable information from the expression data, we independently applied it to large-scale experiments carried out on the yeast and malaria parasite whole transcriptomes. The parameters extracted from the sparse reconstruction of the expression profiles, when fed to a supervised learning platform, were used to successfully predict the relationships between genes throughout the Gene Ontology hierarchy and protein-protein interaction map. Extensive assessment of the biological results shows high accuracy both in recovering known predictions and in yielding accurate predictions missing from the current databases. We suggest that the geometrical approach presented here is suitable for a broad range of high-dimensional experimental data.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Artificial Intelligence , Plasmodium falciparum/genetics , RNA, Fungal/genetics , RNA, Protozoan/genetics , Saccharomyces cerevisiae/genetics
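The abstract above approximates each gene's expression profile as a linear combination of a few other genes' profiles but does not name the sparse solver. The sketch below uses plain matching pursuit, a standard greedy sparse-approximation method, as a stand-in; it is not necessarily the authors' algorithm. Gene names and profiles are hypothetical, and in practice the target gene's own profile would be excluded from `profiles`.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(target: list[float], profiles: dict, k: int = 2):
    """Greedy k-step sparse approximation of `target` (one gene's profile across
    experiments) as a linear combination of `profiles` (other genes' profiles,
    none of which may be all-zero). Returns (coefficients, residual)."""
    residual = list(target)
    coeffs: dict[str, float] = {}
    for _ in range(k):
        # Pick the profile whose direction best matches the current residual.
        name = max(profiles, key=lambda g: abs(dot(residual, profiles[g]))
                   / math.sqrt(dot(profiles[g], profiles[g])))
        atom = profiles[name]
        c = dot(residual, atom) / dot(atom, atom)  # least-squares step on one profile
        coeffs[name] = coeffs.get(name, 0.0) + c
        residual = [r - c * a for r, a in zip(residual, atom)]
    return coeffs, residual

profiles = {"geneB": [1.0, 0.0, 0.0, 0.0],
            "geneC": [0.0, 1.0, 0.0, 0.0],
            "geneD": [0.0, 0.0, 1.0, 1.0]}
target = [2.0, 0.0, 0.5, 0.5]  # constructed as 2 * geneB + 0.5 * geneD
coeffs, residual = matching_pursuit(target, profiles, k=2)
print(coeffs)
```

The nonzero coefficients (here geneB and geneD) are the kind of sparse-reconstruction parameters the abstract feeds into a supervised learner.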
13.
Nucleic Acids Res ; 38(Web Server issue): W84-9, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20444873

ABSTRACT

Derivation of biological meaning from large sets of proteins or genes is a frequent task in genomic and proteomic studies. Such sets often arise from experimental methods including large-scale gene expression experiments and mass spectrometry (MS) proteomics. Large sets of genes or proteins are also the outcome of computational methods such as BLAST search and homology-based classifications. We have developed the PANDORA web server, which functions as a platform for the advanced biological analysis of sets of genes, proteins, or proteolytic peptides. First, the input set is mapped to a set of corresponding proteins. Then, an analysis of the protein set produces a graph-based hierarchy which highlights intrinsic relations amongst biological subsets, in light of their different annotations from multiple annotation resources. PANDORA integrates a large collection of annotation sources (GO, UniProt Keywords, InterPro, Enzyme, SCOP, CATH, Gene-3D, NCBI taxonomy and more) that comprise approximately 200,000 different annotation terms associated with approximately 3.2 million sequences from UniProtKB. Statistical enrichment based on a binomial approximation of the hypergeometric distribution and corrected for multiple hypothesis tests is calculated using several background sets, including major gene-expression DNA-chip platforms. Users can also visualize either standard or user-defined binary and quantitative properties alongside the proteins. PANDORA 4.2 is available at http://www.pandora.cs.huji.ac.il.


Subject(s)
Peptides/chemistry , Peptides/metabolism , Proteins/chemistry , Proteins/metabolism , Software , Animals , Data Interpretation, Statistical , Databases, Protein , Humans , Internet , Mass Spectrometry , Mice , Peptides/physiology , Proteins/physiology , Proteomics , Rats , Systems Integration , User-Computer Interface
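The PANDORA abstract above computes enrichment with "a binomial approximation of the hypergeometric distribution" and corrects for multiple tests. A minimal sketch of that calculation follows: the probability that at least k of n query proteins carry an annotation held by K of N background proteins, approximated by a Binomial(n, K/N) upper tail. The abstract does not say which correction PANDORA applies, so Bonferroni here is an assumption, and the counts are hypothetical.

```python
import math

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, with terms summed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)

def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Binomial approximation to the hypergeometric upper tail (0 < K < N)."""
    return binomial_tail(k, n, K / N)

def bonferroni(pvalues: list[float]) -> list[float]:
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Hypothetical: 6 of 50 query proteins share a term held by 200 of 20,000 background proteins.
raw = enrichment_p(6, 50, 200, 20_000)
print(raw, bonferroni([raw, 0.2, 0.5]))
```

The approximation is reasonable when the query set is small relative to the background, so sampling without replacement barely changes the annotation frequency.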
14.
Bioinformatics ; 26(18): 2266-72, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20679332

ABSTRACT

MOTIVATION: In nature, protein-protein interactions are constantly evolving under various selective pressures. Nonetheless, it is expected that crucial interactions are maintained through compensatory mutations between interacting proteins. Thus, many studies have used evolutionary sequence data to extract such occurrences of correlated mutation. However, this research is confounded by other evolutionary pressures that contribute to sequence covariance, such as common ancestry. RESULTS: Here, we focus exclusively on the compensatory mutations deriving from physical protein interactions, by performing large-scale computational mutagenesis experiments for >260 protein-protein interfaces. We investigate the potential for co-adaptability present in protein pairs that are always found together in nature (obligate) and those that are occasionally in complex (transient). By modeling each complex both in bound and unbound forms, we find that naturally transient complexes possess greater relative capacity for correlated mutation than obligate complexes, even when differences in interface size are taken into account.


Subject(s)
Mutation , Proteins/chemistry , Adaptation, Biological , Base Sequence , Computational Biology , Evolution, Molecular , Protein Binding/genetics , Proteins/genetics
15.
Bioinformatics ; 26(19): 2466-7, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20685957

ABSTRACT

UNLABELLED: SPRINT is a software package that performs computational multistate protein design using state-of-the-art inference on probabilistic graphical models. The input to SPRINT is a list of protein structures, the rotamers modeled for each structure and the pre-calculated rotamer energies. Probabilistic inference is performed using the belief propagation or A* algorithms, and dead-end elimination can be applied as pre-processing. The output can either be a list of amino acid sequences simultaneously compatible with these structures, or probabilistic amino acid profiles compatible with the structures. In addition, higher order (e.g. pairwise) amino acid probabilities can also be predicted. Finally, SPRINT also has a module for protein side-chain prediction and single-state design. AVAILABILITY: The full C++ source code for SPRINT can be freely downloaded from http://www.protonet.cs.huji.ac.il/sprint.


Subject(s)
Algorithms , Proteins/chemistry , Software , Amino Acid Sequence , Amino Acids/chemistry , Models, Statistical , Protein Conformation
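The SPRINT abstract above notes that dead-end elimination (DEE) can be applied as pre-processing before probabilistic inference. The sketch below shows the classic DEE criterion: rotamer r at a position is removed if some competitor t at the same position has a worst-case energy lower than r's best-case energy, so r can never appear in the global minimum. It is a generic illustration, not SPRINT's C++ code, and the energy-table layout is a hypothetical choice.

```python
def pair(pair_e, i, r, j, s):
    """Pair energy between rotamer r at position i and rotamer s at position j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

def dee_prune(self_e, pair_e):
    """Iteratively prune rotamers with the classic dead-end elimination criterion.

    self_e[i][r]: self energy of rotamer r at position i.
    pair_e[(i, j)][r][s] with i < j: pair energy of rotamer r at i and s at j.
    Returns the surviving rotamer indices for each position.
    """
    alive = {i: set(range(len(self_e[i]))) for i in range(len(self_e))}
    changed = True
    while changed:
        changed = False
        for i in alive:
            others = [j for j in alive if j != i]
            for r in sorted(alive[i]):
                # Lowest energy rotamer r could reach with any surviving partners.
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j]) for j in others)
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Highest energy the competitor t could reach.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:  # r is dominated by t everywhere
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

self_e = [[0.0, 5.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.0, 1.0]]}
print(dee_prune(self_e, pair_e))
```

Because the inequality is strict and a rotamer is removed only in favour of a surviving competitor, at least one rotamer always remains at every position.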
16.
PLoS One ; 16(8): e0254798, 2021.
Article in English | MEDLINE | ID: mdl-34383766

ABSTRACT

As society has moved past the initial phase of the COVID-19 crisis that relied on broad-spectrum shutdowns as a stopgap method, industries and institutions have faced the daunting question of how to return to a stabilized state of activities and more fully reopen the economy. A core problem is how to return people to their workplaces and educational institutions in a manner that is safe, ethical, grounded in science, and takes into account the unique factors and needs of each organization and community. In this paper, we introduce an epidemiological model (the "Community-Workplace" model) that accounts for SARS-CoV-2 transmission within the workplace, within the surrounding community, and between them. We use this multi-group deterministic compartmental model to consider various testing strategies that, together with symptom screening, exposure tracking, and nonpharmaceutical interventions (NPI) such as mask wearing and physical distancing, aim to reduce disease spread in the workplace. Our framework is designed to be adaptable to a variety of specific workplace environments to support planning efforts as reopenings continue. Using this model, we consider a number of case studies, including an office workplace, a factory floor, and a university campus. Analysis of these cases illustrates that continuous testing can help a workplace avoid an outbreak by reducing undetected infectiousness even in high-contact environments. We find that a university setting, where individuals spend more time on campus and have a higher contact load, requires more testing to remain safe, compared to a factory or office setting. Under the modeling assumptions, we find that maintaining a prevalence below 3% can be achieved in an office setting by testing its workforce every two weeks, whereas achieving this same goal for a university could require as much as fourfold more testing (i.e., testing the entire campus population twice a week). 
Our model also simulates the dynamics of reduced spread that result from the introduction of mitigation measures when test results reveal the early stages of a workplace outbreak. We use this to show that a vigilant university that has the ability to quickly react to outbreaks can be justified in implementing testing at the same rate as a lower-risk office workplace. Finally, we quantify the devastating impact that an outbreak in a small-town college could have on the surrounding community, which supports the notion that communities can be better protected by supporting their local places of business in preventing onsite spread of disease.


Subject(s)
COVID-19/prevention & control , Contact Tracing/methods , Disease Outbreaks/prevention & control , Physical Distancing , Universities , Workplace , Humans
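The abstract above uses a multi-group deterministic compartmental model to compare testing frequencies. The sketch below is a much smaller, single-group SIR-type model with testing-based isolation, meant only to show the mechanism: testing every T days moves undetected infectious people (I) into isolation (Q), where they are assumed not to transmit. It omits the paper's community coupling, symptom screening and NPIs, and every parameter value is hypothetical.

```python
def peak_infected_fraction(test_interval_days=None, beta=0.3, gamma=0.1,
                           sensitivity=0.9, n=1000.0, i0=5.0, days=180, dt=0.1):
    """Single-group S-I-Q-R model integrated with forward Euler steps.

    S: susceptible, I: infectious and undetected, Q: detected and isolated,
    R: recovered. Testing every `test_interval_days` (None = no testing)
    moves I to Q at rate sensitivity / interval. Returns the peak of (I + Q) / n.
    """
    s, i, q, r = n - i0, i0, 0.0, 0.0
    test_rate = sensitivity / test_interval_days if test_interval_days else 0.0
    peak = (i + q) / n
    for _ in range(int(days / dt)):
        # Compute all flows from the current state before updating it.
        new_infections = beta * s * i / n  # only undetected cases transmit
        detected = test_rate * i
        recovered_i, recovered_q = gamma * i, gamma * q
        s -= dt * new_infections
        i += dt * (new_infections - detected - recovered_i)
        q += dt * (detected - recovered_q)
        r += dt * (recovered_i + recovered_q)
        peak = max(peak, (i + q) / n)
    return peak

for interval in (None, 14.0, 3.5):
    print(interval, round(peak_infected_fraction(interval), 3))
```

With these assumed rates, testing twice a week pushes the effective reproduction number below 1, while testing every two weeks only flattens the outbreak; the paper's office and university results come from its richer model, not from this sketch.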
17.
JMIR Ment Health ; 8(8): e27589, 2021 Aug 10.
Article in English | MEDLINE | ID: mdl-34383685

ABSTRACT

BACKGROUND: Although effective mental health treatments exist, the ability to match individuals to optimal treatments is poor, and timely assessment of response is difficult. One reason for these challenges is the lack of objective measurement of psychiatric symptoms. Sensors and active tasks recorded by smartphones provide a low-burden, low-cost, and scalable way to capture real-world data from patients that could augment clinical decision-making and move the field of mental health closer to measurement-based care. OBJECTIVE: This study tests the feasibility of a fully remote study on individuals with self-reported depression using an Android-based smartphone app to collect subjective and objective measures associated with depression severity. The goals of this pilot study are to develop an engaging user interface for high task adherence through user-centered design; test the quality of collected data from passive sensors; start building clinically relevant behavioral measures (features) from passive sensors and active inputs; and preliminarily explore connections between these features and depression severity. METHODS: A total of 600 participants were asked to download the study app to join this fully remote, observational 12-week study. The app passively collected 20 sensor data streams (eg, ambient audio level, location, and inertial measurement units), and participants were asked to complete daily survey tasks, weekly voice diaries, and the clinically validated Patient Health Questionnaire (PHQ-9) self-survey. Pairwise correlations between derived behavioral features (eg, weekly minutes spent at home) and PHQ-9 were computed. Using these behavioral features, we also constructed an elastic net penalized multivariate logistic regression model predicting depressed versus nondepressed PHQ-9 scores (ie, dichotomized PHQ-9). RESULTS: A total of 415 individuals logged into the app. 
Over the course of the 12-week study, these participants completed 83.35% (4151/4980) of the PHQ-9s. Applying data sufficiency rules for minimally necessary daily and weekly data resulted in 3779 participant-weeks of data across 384 participants. Using a subset of 34 behavioral features, we found that 11 features showed a significant (P<.001 Benjamini-Hochberg adjusted) Spearman correlation with weekly PHQ-9, including voice diary-derived word sentiment and ambient audio levels. Restricting the data to those cases in which all 34 behavioral features were present, we had available 1013 participant-weeks from 186 participants. The logistic regression model predicting depression status resulted in a 10-fold cross-validated mean area under the curve of 0.656 (SD 0.079). CONCLUSIONS: This study finds a strong proof of concept for the use of a smartphone-based assessment of depression outcomes. Behavioral features derived from passive sensors and active tasks show promising correlations with a validated clinical measure of depression (PHQ-9). Future work is needed to increase scale that may permit the construction of more complex (eg, nonlinear) predictive models and better handle data missingness.
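The results above report Spearman correlations that were significant after Benjamini-Hochberg adjustment. A minimal sketch of the Benjamini-Hochberg step-up adjustment follows: sort the P values, scale each by m/rank, then enforce monotonicity from the largest rank downward. The input P values are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-feature P values
print(benjamini_hochberg(raw))
```

A feature is then called significant at a chosen false discovery rate (for example .001, as in the study) when its adjusted value falls below that level.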

18.
Proteins ; 78(3): 530-47, 2010 Feb 15.
Article in English | MEDLINE | ID: mdl-19842166

ABSTRACT

In nature, proteins partake in numerous protein-protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to efficiently and accurately perform multispecific protein design. This framework utilizes recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. Furthermore, it is also geared to specifically yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high-throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and the synthesis of novel proteins with specific functionalities. We demonstrate the utility of such multispecific design techniques in better recovering amino acid sequence diversities similar to those resulting from millions of years of evolution. We then compare the two approaches, prediction of low-energy ensembles and prediction of amino acid profiles, and demonstrate their complementarity in providing more robust predictions for protein design.


Subject(s)
Amino Acid Sequence , Computational Biology/methods , Models, Chemical , Models, Statistical , Proteins/chemistry , Algorithms , Evolution, Molecular , Models, Biological , Models, Molecular , Molecular Sequence Data , Peroxisome Proliferator-Activated Receptors/chemistry , Peroxisome Proliferator-Activated Receptors/genetics , Proteins/genetics , Structure-Activity Relationship , Temperature , Thioredoxins/chemistry , Thioredoxins/genetics , Transducin/chemistry , Transducin/genetics
19.
Mol Syst Biol ; 5: 311, 2009.
Article in English | MEDLINE | ID: mdl-19888206

ABSTRACT

Viruses differ markedly in their specificity toward host organisms. Here, we test the level of general sequence adaptation that viruses display toward their hosts. We compiled a representative data set of viruses that infect hosts ranging from bacteria to humans. We consider their respective amino acid and codon usages and compare them among the viruses and their hosts. We show that bacteria-infecting viruses are strongly adapted to their specific hosts, but that they differ from other unrelated bacterial hosts. Viruses that infect humans, but not those that infect other mammals or birds, show a strong resemblance to most mammalian and avian hosts, in terms of both amino acid and codon preferences. In groups of viruses that infect humans or other mammals, the highest observed level of adaptation of viral proteins to host codon usages is for those proteins that appear abundantly in the virion. In contrast, proteins that are known to participate in host-specific recognition do not necessarily adapt to their respective hosts. The implications for viral infectivity are discussed.
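The abstract compares codon usage between viruses and hosts but does not state which similarity measure is used. The sketch below is a minimal illustration under that gap: it assumes codon usage is summarized as the frequency of each of the 64 codons in in-frame coding sequences and compared with a Euclidean distance. Both the measure and the toy sequences are hypothetical choices for illustration; published analyses may instead normalize within synonymous codon families or use other distances.

```python
import math
from collections import Counter
from itertools import product

# All 64 DNA codons, in a fixed order so usage vectors are comparable.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def codon_usage(seq):
    """Frequency of each codon in an in-frame coding sequence (incomplete last codon ignored)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    # Codons with ambiguous bases (e.g. 'N') are not in CODONS and are skipped.
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("sequence contains no complete, unambiguous codons")
    return [counts[c] / total for c in CODONS]


def usage_distance(u, v):
    """Euclidean distance between two codon-usage vectors (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    virus = codon_usage("ATGAAAAAAGGG")   # toy, hypothetical sequences
    host_a = codon_usage("ATGAAAAAGGGG")
    host_b = codon_usage("ATGCCCCCCTTT")
    # A smaller distance means the virus's codon preferences resemble that host's more.
    print(usage_distance(virus, host_a), usage_distance(virus, host_b))
```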


Subject(s)
Adaptation, Physiological , Amino Acids/metabolism , Codon/genetics , Host-Pathogen Interactions/physiology , Proteome/metabolism , Viral Proteins/metabolism , Virus Physiological Phenomena , Amino Acids/genetics , Animals , Base Composition/genetics , Bias , Humans , Proteome/genetics , Viral Proteins/genetics , Viral Structural Proteins/genetics , Viral Structural Proteins/metabolism
20.
PLoS Comput Biol ; 5(12): e1000627, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20041208

ABSTRACT

Natural proteins often partake in several highly specific protein-protein interactions. They are thus subject to multiple opposing forces during evolutionary selection. To be functional, such multispecific proteins need to be stable in complex with each interaction partner, and, at the same time, to maintain affinity toward all partners. How is this multispecificity acquired through natural evolution? To answer this compelling question, we study a prototypical multispecific protein, calmodulin (CaM), which has evolved to interact with hundreds of target proteins. Starting from high-resolution structures of sixteen CaM-target complexes, we employ state-of-the-art computational methods to predict a hundred CaM sequences best suited for interaction with each individual CaM target. Then, we design CaM sequences most compatible with each possible combination of two, three, and all sixteen targets simultaneously, producing almost 70,000 low energy CaM sequences. By comparing these sequences and their energies, we gain insight into how nature has managed to find the compromise between the need for favorable interaction energies and the need for multispecificity. We observe that designing for more partners simultaneously yields CaM sequences that better match natural sequence profiles, thus emphasizing the importance of such strategies in nature. Furthermore, we show that the CaM binding interface can be cleanly partitioned into positions that are critical for the affinity of all CaM-target complexes and those that are molded to provide interaction specificity. We reveal several basic categories of sequence-level tradeoffs that enable the compromise necessary for the promiscuity of this protein. We also thoroughly quantify the tradeoff between interaction energetics and multispecificity and find that facilitating seemingly competing interactions requires only a small deviation from optimal energies. We conclude that multispecific proteins have been subjected to a rigorous optimization process that has fine-tuned their sequences for interactions with a precise set of targets, thus conferring their multiple cellular functions.


Subject(s)
Calmodulin/chemistry , Drug Design , Models, Chemical , Sequence Analysis, Protein/methods , Amino Acid Sequence , Binding Sites , Computer Simulation , Molecular Sequence Data , Protein Binding