ABSTRACT
PURPOSE: Lung cancer is strongly associated with tobacco smoking. However, global statistics estimate that in females the proportion of lung cancer cases unrelated to tobacco smoking reaches fifty percent, which calls the etiology of the disease into question. MATERIALS AND METHODS: A never-smoker female with primary EGFR/KRAS/ALK-negative squamous cell carcinoma of the lung and her healthy sibs were subjected to a novel integrative “omic” approach using a pedigree-based model to discover genetic factors leading to cancer in the absence of well-known environmental triggers. A first-step whole-exome sequencing of tumor and normal tissue did not identify mutations in known driver genes. Building on the idea of a germline oligogenic origin of lung cancer, we performed whole-exome sequencing of DNA from the peripheral blood of the patient and of her unaffected sibs. Finally, RNA-sequencing analysis of tumoral and matched non-tumoral tissues was carried out to investigate the clonal profile and the pathogenic role of the identified variants. RESULTS: Filtering for rare variants with a Combined Annotation Dependent Depletion (CADD) score > 25 and a potentially damaging effect, we identified rare/private germline deleterious variants in 11 cancer-associated genes, none of which, except one, was shared with the healthy sib, pointing to a “private” oligogenic germline signature. Notably, two of the mutated genes, ACACA and DEPTOR, turned out to be potential targets for therapy because they are related to known drivers such as BRCA1 and EGFR. CONCLUSION: In the era of precision medicine, this report emphasizes the importance of an “omic” approach to uncover the oligogenic germline signature underlying cancer development and to identify suitable therapeutic targets.
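The variant filtering step described in the RESULTS can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Variant` fields, the 1% allele-frequency cut-off for "rare", and the placeholder identifiers and gene names are all assumptions; only the CADD > 25 threshold comes from the abstract.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    variant_id: str   # placeholder identifier (hypothetical)
    gene: str         # placeholder gene name (hypothetical)
    cadd: float       # CADD score of the variant
    pop_freq: float   # population allele frequency (assumed annotation)

def filter_candidates(patient_variants, sib_variant_ids,
                      cadd_min=25.0, max_freq=0.01):
    """Keep rare variants with CADD > cadd_min and flag sharing with the sib.

    max_freq=0.01 is an assumed definition of "rare"; the abstract gives
    only the CADD threshold.
    """
    kept = []
    for v in patient_variants:
        if v.cadd > cadd_min and v.pop_freq < max_freq:
            shared = v.variant_id in sib_variant_ids
            kept.append((v, shared))
    return kept

# Toy input, invented for illustration only.
patient = [
    Variant("v1", "GENE_A", 31.0, 0.0001),  # rare and damaging: kept
    Variant("v2", "GENE_B", 12.0, 0.0001),  # low CADD: dropped
    Variant("v3", "GENE_C", 27.5, 0.20),    # common: dropped
    Variant("v4", "GENE_D", 26.0, 0.0),     # kept, but shared with the sib
]
sib_ids = {"v4"}
candidates = filter_candidates(patient, sib_ids)
private = [v.gene for v, shared in candidates if not shared]
```

Here `private` holds the genes whose candidate variants are absent from the healthy sib, which is the sense in which the abstract speaks of a "private" signature.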
Subject(s)
Female, Humans, Carcinoma, Squamous Cell, Disease Susceptibility, DNA, Epithelial Cells, Exome, High-Throughput Nucleotide Sequencing, Lung Neoplasms, Lung, Multifactorial Inheritance, Precision Medicine, Smoking
ABSTRACT
Background: COVID-19 clinical presentation ranges from asymptomatic to fatal outcome. This variability is due in part to host genome-specific mutations. Recently, two families in which COVID-19 segregates like an X-linked recessive monogenic disorder environmentally conditioned by SARS-CoV-2 have been reported, leading to the identification of loss-of-function variants in TLR7. Objective: We sought to determine whether the two families represent the tip of the iceberg of a subset of COVID-19 male patients. Methods: We compared male subjects with extreme phenotypes selected from the Italian GEN-COVID cohort of 1178 SARS-CoV-2-infected subjects (<60 y, 79 severe cases versus 77 control cases). We applied LASSO logistic regression analysis, considering only rare variants in the young male subset, which picked up TLR7 as the most important susceptibility gene. Results: Rare TLR7 missense variants predicted to impact protein function were found in severely affected males and in none of the asymptomatic subjects. We then investigated a similar white European cohort in Spain, confirming the impact of TLR7 variants. A gene expression profile analysis in peripheral blood mononuclear cells after stimulation with a TLR7 agonist demonstrated a reduction in the mRNA levels of TLR7, IRF7, ISG15, IFN-α and IFN-γ in COVID-19 patients compared with unaffected controls, demonstrating an impairment in type I and II IFN responses. Conclusion: Young males with TLR7 loss-of-function mutations and severe COVID-19 in the two reported families represent only a fraction of a broader and complex host genome situation. Specifically, missense mutations in the X-linked recessive TLR7 disorder may significantly contribute to disease susceptibility in up to 4% of severe COVID-19. Clinical Implication: In this new yet complex scenario, our observations provide the basis for a personalized interferon-based therapy in patients with rare TLR7 variants.
CAPSULE SUMMARY: Our results in large cohorts from Italy and Spain showed that X-linked recessive TLR7 disorder may represent the cause of disease susceptibility to COVID-19 in up to 4% of severely affected young male cases.
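As an illustration of how LASSO logistic regression can single out one gene among several, here is a minimal sketch in plain Python. It fits an L1-penalized logistic model by proximal gradient descent (ISTA), which is a stand-in chosen for brevity and not necessarily the solver used in the study. The penalty `lam`, step size `lr`, iteration count, and the toy data are all assumptions invented for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink towards zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    X: list of feature rows (e.g. one Boolean per gene), y: 0/1 labels.
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def toy_data():
    # Cells of (feature 0, label, count). Feature 1 is split evenly inside
    # every cell, so it carries no information about the label.
    cells = [(1, 1, 4), (1, 0, 2), (0, 1, 2), (0, 0, 4)]
    X, y = [], []
    for x0, label, count in cells:
        for k in range(count):
            X.append([float(x0), float(k % 2)])
            y.append(label)
    return X, y

X, y = toy_data()
weights, intercept = lasso_logistic(X, y)
```

On this toy input the informative feature receives a positive weight while the uninformative one is shrunk to exactly zero, which is the selection behaviour that lets LASSO "pick up" a gene.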
ABSTRACT
Within the GEN-COVID Multicenter Study, biospecimens from more than 1,000 SARS-CoV-2-positive individuals have thus far been collected in the GEN-COVID Biobank (GCB). Sample types include whole blood, plasma, serum, leukocytes, and DNA. The GCB links samples to detailed clinical data available in the GEN-COVID Patient Registry (GCPR). It includes hospitalized patients (74.25%), broken down into intubated, treated by CPAP-biPAP, treated with O2 supplementation, and without respiratory support (9.5%, 18.4%, 31.55% and 14.8%, respectively); and non-hospitalized subjects (25.75%), either pauci- or asymptomatic. More than 150 clinical patient-level data fields have been collected and binarized for further statistics according to the organs/systems primarily affected by COVID-19: heart, liver, pancreas, kidney, chemosensors, innate or adaptive immunity, and clotting system. Hierarchical clustering analysis identified five main clinical categories: i) severe multisystemic failure with either thromboembolic or pancreatic variant; ii) cytokine storm type, either severe with liver involvement or moderate; iii) moderate heart type, either with or without liver damage; iv) moderate multisystemic involvement, either with or without liver damage; v) mild, either with or without hyposmia. GCB and GCPR are further linked to the GEN-COVID Genetic Data Repository (GCGDR), which includes data from whole-exome sequencing and high-density SNP genotyping. The data are available for sharing through the Network for Italian Genomes, within the dedicated COVID-19 section. The study objective is to systematize this comprehensive data collection and begin identifying multi-organ involvement in COVID-19, defining genetic parameters for infection susceptibility within the population and genetically mapping COVID-19 severity and clinical complexity among patients.
ABSTRACT
Thromboembolism is a frequent cause of severity and mortality in COVID-19. However, the etiology of this phenomenon is not well understood. A cohort of 1,186 subjects from the GEN-COVID consortium, infected by SARS-CoV-2 with different severity, was stratified by sex and adjusted for age. Then, common coding variants from whole-exome sequencing were mined by LASSO logistic regression. Homozygosity for the rs6127 variant (c.1807G>A; p.Asp603Asn) of the cell adhesion molecule P-selectin gene (SELP), which increases platelet activation, was found to be associated with severity in the male subcohort of 513 subjects (odds ratio [OR] 2.27, 95% confidence interval [CI] 1.54-3.36). As the SELP gene is downregulated by testosterone, the odds ratio is increased in males older than 50 (OR 2.42, 95% CI 1.53-3.82). Asn/Asn homozygotes have increased D-dimer values, especially when associated with polyQ ≥23 in the androgen receptor (AR) gene (OR 3.26, 95% CI 1.41-7.52). These results provide a rationale for the repurposing of antibodies against P-selectin as adjuvant therapy in rs6127 male homozygotes, especially if older than 50 or with an impaired AR gene. Key points: i) The functional polymorphism rs6127 (p.Asp603Asn) in the testosterone-regulated SELP gene associates with COVID-19 severity and thrombosis. ii) Conditions with decreased testosterone (old males) or decreased testosterone efficacy (AR gene polyQ ≥23) strengthen the association.
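For readers less familiar with the statistics quoted above, the following sketch shows how an odds ratio and its 95% confidence interval can be obtained from a 2x2 table using the standard log (Woolf) method. It is an unadjusted calculation under assumed counts; the study's own estimates came from sex-stratified, age-adjusted analyses, so this does not reproduce the reported values.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a: cases with the genotype      b: controls with the genotype
    c: cases without the genotype   d: controls without the genotype
    z=1.96 gives an approximate 95% interval. All counts must be > 0.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, invented purely to exercise the function.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1, as in the reported 1.54-3.36, indicates an association at the chosen confidence level.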
ABSTRACT
Clinical and molecular characterization by Whole Exome Sequencing (WES) is reported in 35 COVID-19 patients attending the University Hospital in Siena, Italy, from April 7 to May 7, 2020. Eighty percent of patients required respiratory assistance, half of them being on mechanical ventilation. Fifty-one percent had hepatic involvement, and hyposmia was ascertained in 3 patients. Searching for common genes by collapsing methods against 150 WES of controls from the Italian population failed to give straightforward statistically significant results, with the exception of two genes. This result is not unexpected, since we are facing the most challenging common disorder triggered by environmental factors with a strong underlying heritability (50%). The lesson learned from autism spectrum disorders prompted us to re-analyse the cohort treating each patient as an independent case, following a Mendelian-like model. We identified for each patient an average of 2.5 pathogenic mutations involved in virus infection susceptibility and pointing to one or more rare disorder(s). To our knowledge, this is the first report on WES and COVID-19. Our results suggest a combined model for COVID-19 susceptibility, with a number of common susceptibility genes representing the favorable background in which additional host private mutations may determine disease progression.
ABSTRACT
The polymorphism L412F in TLR3 has been associated with several infectious diseases. However, the mechanism underlying this association is still unexplored. Here, we show that the L412F polymorphism in TLR3 is a marker of severity in COVID-19. This association increases in the sub-cohort of males. Impaired autophagy and reduced TNF production were demonstrated in HEK293 cells transfected with a TLR3-L412F plasmid and stimulated with the specific agonist poly(I:C). A statistically significant reduction in survival at 28 days was shown in L412F COVID-19 patients treated with the autophagy inhibitor hydroxychloroquine (P=0.038). An increased frequency of autoimmune disorders as co-morbidity was found in L412F COVID-19 males with specific class II HLA haplotypes prone to autoantigen presentation. Our analyses indicate that the L412F polymorphism puts males at risk of severe COVID-19 and provides a rationale for reinterpreting clinical trials considering autophagy pathways.
ABSTRACT
Host genetics is an emerging theme in COVID-19, and a few common polymorphisms and some rare variants have been identified, by GWAS or by a candidate gene approach, respectively. However, a unified model is still missing. Here, we propose a new model that takes into account common and rare germline variants, applied to a cohort of 1,300 Italian SARS-CoV-2-positive individuals. Ordered logistic regression of clinical WHO grading on sex and age was used to obtain a binary phenotypic classification. Genetic variability from WES was synthesized in several Boolean representations differentiated according to allele frequencies and genotype effect. LASSO logistic regression was used for extracting relevant genes. We defined about 100 common driver polymorphisms corresponding to the classical "threshold model". Extracted genes were demonstrated to be gender-specific. Stochastic, rare and more penetrant events in about 100 additional extracted genes, when occurring in a medium or severe background (common within the family), simulate Mendelian inheritance in 14% of subjects (having only 1 mutation) or oligogenic inheritance (in 10% having 2 mutations, in 11% having 3 mutations, etc.). The combined effect of common and rare variants can be described as an integrated polygenic score computed as $(n_{\mathrm{severity}} - n_{\mathrm{mildness}}) + F\,(m_{\mathrm{severity}} - m_{\mathrm{mildness}})$, where $n$ is the number of common driver genes, $m$ is the number of driver rare variants and $F$ is a factor for appropriately weighing the more powerful rare variants. We called the model "post-Mendelian". The model describes the cohort well, and patients are clustered as severe or mild by the integrated polygenic score, the $F$ factor being calibrated around 2, with a prediction capacity of 65% in males and 70% in females. In conclusion, this is the first comprehensive model interpreting host genetics in a holistic post-Mendelian manner.
Further validations are needed in order to consolidate and refine the model which however holds true in thousands of SARS-CoV-2 Italian subjects.
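The integrated polygenic score defined in this abstract is simple enough to write out directly. The sketch below follows the stated formula, with `F` defaulting to 2 because the abstract reports it as calibrated around that value; the function name, argument names, and example counts are assumptions.

```python
def integrated_polygenic_score(n_severity, n_mildness,
                               m_severity, m_mildness, F=2.0):
    """Integrated polygenic score as described in the abstract.

    n_*: counts of common driver genes associated with severity/mildness.
    m_*: counts of rare driver variants associated with severity/mildness.
    F:   weight for the rarer, more penetrant variants (about 2 per the text).
    """
    common_term = n_severity - n_mildness
    rare_term = m_severity - m_mildness
    return common_term + F * rare_term

# Hypothetical subject: 5 vs 3 common drivers, 2 vs 0 rare drivers.
score = integrated_polygenic_score(5, 3, 2, 0)
```

No decision threshold is stated in the abstract, so the sketch returns the raw score and leaves the severe/mild cut-off open.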
ABSTRACT
Many host-pathogen interactions, such as those of human viruses (including non-SARS-coronaviruses), rely on attachment to host cell-surface glycans. There are conflicting reports about whether the Spike protein of SARS-CoV-2 binds to sialic acid commonly found on host cell-surface N-linked glycans. In the absence of a biochemical assay, the ability to analyze the binding of glycans to heavily-modified proteins and resolve this issue is limited. Classical Saturation Transfer Difference (STD) NMR can be confounded by overlapping sugar resonances that compound with known experimental constraints. Here we present universal saturation transfer analysis (uSTA), an NMR method that builds on existing approaches to provide a general and automated workflow for studying protein-ligand interactions. uSTA reveals that B-origin-lineage-SARS-CoV-2 spike trimer binds sialoside sugars in an end-on manner, and modelling guided by uSTA localises binding to the spike N-terminal domain (NTD). The sialylated-polylactosamine motif is found on tetraantennary human N-linked-glycoproteins in deeper lung and may have played a role in zoonosis. Provocatively, sialic acid binding is abolished by mutations in some subsequent SARS-CoV-2 variants-of-concern. A very high resolution cryo-EM structure confirms the NTD location and end-on mode; it rationalises the effect of NTD mutations and the structure-activity relationship of sialic acid analogues. uSTA is demonstrated to be a robust, rapid and quantitative tool for analysis of binding, even in the most demanding systems.
Extended Abstract
The surface proteins found on both pathogens and host cells mediate entry (and exit) and influence disease progression and transmission. Both types can bear host-generated post-translational modifications such as glycosylation that are essential for function but can confound biophysical methods used for dissecting key interactions.
Several human viruses (including non-SARS-coronaviruses) attach to host cell-surface N-linked glycans that include forms of sialic acid (sialosides). There remains, however, conflicting evidence as to if or how SARS-associated coronaviruses might use such a mechanism. Here, we demonstrate quantitative extension of saturation transfer protein NMR methods to a complete mathematical model of the magnetization transfer caused by interactions between protein and ligand. The method couples objective resonance-identification via a deconvolution algorithm with Bloch-McConnell analysis to enable a structural, kinetic and thermodynamic analysis of ligand binding beyond previously-perceived limits of exchange rates, concentration or system. Using an automated and openly available workflow, this universal saturation transfer analysis (uSTA) can be readily applied in a range of even heavily-modified systems in a general manner to now obtain quantitative binding interaction parameters (KD, kEx). uSTA proved critical in mapping direct interactions between natural sialoside sugar ligands and relevant virus-surface attachment glycoproteins - SARS-CoV-2-spike and influenza-H1N1-haemagglutinin variants - by quantitating ligand signal in spectral regions otherwise occluded by resonances from mobile protein glycans (that also include sialosides). In B-origin-lineage-SARS-CoV-2 spike trimer, end-on binding to sialoside sugars was revealed, contrasting with extended surface-binding for heparin sugar ligands; uSTA-derived constraints used in structural modelling suggested sialoside-glycan binding sites in a beta-sheet-rich region of spike N-terminal domain (NTD). Consistent with this, uSTA-glycan binding was minimally perturbed by antibodies that neutralize the ACE2-binding domain (RBD) but strongly disrupted in spike from the B1.1.7/alpha and B1.351/beta variants-of-concern, which possess hotspot mutations in the NTD.
Sialoside binding in B-origin-lineage-NTD was unequivocally pinpointed by cryo-EM to a site that is created from residues that are notably deleted in variants (e.g. H69,V70,Y145 in alpha). An analysis of beneficial genetic variances in cohorts of patients from early 2020 suggests a model in which this site in the NTD of B-origin-lineage-SARS-CoV-2 (but not in alpha/beta-variants) may have exploited a specific sialylated-polylactosamine motif found on tetraantennary human N-linked-glycoproteins in deeper lung. Together these confirm a novel binding mode mediated by the unusual NTD of SARS-CoV-2 and suggest how it may drive virulence and/or zoonosis via modulation of glycan attachment. Since cell-surface glycans are widely relevant to biology and pathology, uSTA can now provide ready, quantitative, widespread analysis of complex, host-derived and post-translationally modified proteins with putative ligands relevant to disease even in previously confounding complex systems.
ABSTRACT
Background: COVID-19 presentation ranges from asymptomatic to fatal. The variability in severity may be due in part to an impaired type I interferon response, caused by specific mutations in the host genome or by autoantibodies, which together explain about 15% of the cases. Exploring the host genome is thus warranted to further elucidate disease variability. Methods: We developed a synthetic approach to genetic data representation using machine learning methods to investigate complementary genetic variability in COVID-19 infected patients that may explain disease severity, due to poly-amino acid repeat polymorphisms. Using host whole-exome sequencing data, we compared extreme phenotypic presentations (338 severe versus 300 asymptomatic cases) of the entire (men and women) Italian GEN-COVID cohort of 1178 subjects infected with SARS-CoV-2. We then applied the LASSO logistic regression model to a Boolean gene-based representation of the poly-amino acid variability. Findings: Shorter polyQ alleles (≤22) in the androgen receptor (AR) conferred protection against a more severe outcome in COVID-19 infection. In the subgroup of males aged <60 years, testosterone was higher in subjects with AR long-polyQ (≥23), possibly indicating receptor resistance (p=0.004, Mann-Whitney U test). Inappropriately low testosterone levels for the long-polyQ alleles predicted the need for intensive care in COVID-19 infected men. In agreement with the known anti-inflammatory action of testosterone, patients with long-polyQ (≥23) and age >60 years had increased levels of C-reactive protein (p=0.018). Interpretation: Our results may contribute to the design of reliable clinical and public health measures and provide a rationale to test testosterone treatment as adjuvant therapy in symptomatic COVID-19 men expressing AR polyQ longer than 23 repeats. Funding: MIUR project "Dipartimenti di Eccellenza 2018-2020" to Department of Medical Biotechnologies University of Siena, Italy (Italian D.L. 
n.18 March 17, 2020). Private donors for COVID research and charity funds from Intesa San Paolo. Evidence before this study: We searched on Medline, EMBASE, and Pubmed for articles published from January 2020 to August 2020 using various combinations of the search terms "sex-difference", "gender" AND SARS-Cov-2, or COVID. Epidemiological studies indicate that men and women are similarly infected by COVID-19, but the outcome is less favorable in men, independently of age. Several studies also showed that patients with hypogonadism tend to be more severely affected. A prompt intervention directed toward the most fragile subjects with SARS-CoV-2 infection is currently the only strategy to reduce mortality. Glucocorticoid treatment has been found cost-effective in improving the outcome of severe cases. Clinical algorithms have been proposed, but little is known about the ability of genetic profiling to predict outcome and disclose novel therapeutic strategies. Added value of this study: In a cohort of 1178 men and women with COVID-19, we used a supervised machine learning approach on a synthetic representation of the uncovered variability of the human genome due to poly-amino acid repeats. Comparing the genotype of patients with extreme manifestations (severe vs. asymptomatic), we found that the poly-glutamine repeat of the androgen receptor (AR) gene is relevant for COVID-19 disease and that defective AR signaling identifies an association between male sex, testosterone exposure, and COVID-19 outcome. Failure of the endocrine feedback to overcome the AR signaling defect by increasing testosterone levels during the infection means that polyQ length becomes dominant over testosterone (T) levels in determining the clinical outcome. Implications of all the available evidence: We identify the first genetic polymorphism predisposing some men to develop a more severe disease irrespective of age. 
Based on this, we suggest that sizing the AR poly-glutamine repeat has important implications in the diagnostic pipeline of patients affected by life-threatening COVID-19 infection. Most importantly, our studies open up the potential of using testosterone as adjuvant therapy for severe COVID-19 patients having defective androgen signaling, defined by this study as ≥23 polyQ repeats and inappropriate levels of circulating androgens.
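The short/long polyQ split used in this abstract (≤22 versus ≥23 repeats) can be expressed as a Boolean feature. The sketch below is an assumption-laden simplification: it measures the longest uninterrupted run of glutamine (Q) in a protein string, whereas in practice the repeat is typically sized at the DNA level, and a real protein may contain other glutamine stretches. Function names and the toy sequences are hypothetical.

```python
def longest_polyq(protein_seq):
    """Length of the longest uninterrupted run of 'Q' in a protein string."""
    best = run = 0
    for residue in protein_seq.upper():
        run = run + 1 if residue == "Q" else 0
        best = max(best, run)
    return best

def is_long_polyq(protein_seq, threshold=23):
    """True when the longest polyQ run is >= threshold (23 per the abstract)."""
    return longest_polyq(protein_seq) >= threshold

# Toy sequences, invented for illustration; not real AR sequence.
short_allele = "MEV" + "Q" * 21 + "ETS"
long_allele = "MEV" + "Q" * 23 + "ETS"
flags = [is_long_polyq(short_allele), is_long_polyq(long_allele)]
```

A Boolean of this kind is the sort of gene-based feature that could then be passed to a LASSO logistic regression alongside other repeat polymorphisms.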
ABSTRACT
The combined impact of common and rare exonic variants in COVID-19 host genetics is currently insufficiently understood. Here, common and rare variants from whole-exome sequencing data of about 4,000 SARS-CoV-2-positive individuals were used to define an interpretable machine learning model for predicting COVID-19 severity. First, variants were converted into separate sets of Boolean features, depending on the absence or the presence of variants in each gene. An ensemble of LASSO logistic regression models was used to identify the most informative Boolean features with respect to the genetic bases of severity. The Boolean features selected by these logistic models were combined into an Integrated PolyGenic Score that offers a synthetic and interpretable index for describing the contribution of host genetics in COVID-19 severity, as demonstrated through testing in several independent cohorts. Selected features belong to ultra-rare, rare, low-frequency, and common variants, including those in linkage disequilibrium with known GWAS loci. Notably, around one quarter of the selected genes are sex-specific. Pathway analysis of the selected genes associated with COVID-19 severity reflected the multi-organ nature of the disease. The proposed model might provide useful information for developing diagnostics and therapeutics, while also being able to guide bedside disease management.
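The conversion of variants into separate sets of Boolean features can be sketched as below. The four frequency classes are named in the abstract, but the numeric cut-offs used here (0.1%, 1%, 5%) are assumptions, as are the function names, the input layout, and the placeholder gene names.

```python
# (class name, lower bound inclusive, upper bound exclusive); cut-offs assumed.
FREQ_CLASSES = (
    ("ultra_rare", 0.0, 0.001),
    ("rare", 0.001, 0.01),
    ("low_frequency", 0.01, 0.05),
    ("common", 0.05, float("inf")),
)

def frequency_class(allele_freq):
    """Map an allele frequency to one of the four classes."""
    for name, low, high in FREQ_CLASSES:
        if low <= allele_freq < high:
            return name
    raise ValueError(f"invalid allele frequency: {allele_freq}")

def boolean_features(sample_variants, genes):
    """Build one Boolean per (frequency class, gene) for a single sample.

    sample_variants: iterable of (gene, allele_frequency) pairs carried by
    the sample. A feature is True when the sample has at least one variant
    of that frequency class in that gene.
    """
    features = {name: {g: False for g in genes} for name, _, _ in FREQ_CLASSES}
    for gene, allele_freq in sample_variants:
        if gene in features["common"]:  # ignore genes outside the panel
            features[frequency_class(allele_freq)][gene] = True
    return features

# Toy sample, invented for illustration.
genes = ["GENE_A", "GENE_B", "GENE_C"]
sample = [("GENE_A", 0.0002), ("GENE_B", 0.2), ("GENE_X", 0.03)]
feats = boolean_features(sample, genes)
```

Keeping one feature set per frequency class lets a downstream model weigh rare and common contributions separately, in the spirit of the score described above.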