ABSTRACT
Genetic data is limited, and generating new datasets is often an expensive, time-consuming process involving many moving parts to genotype and phenotype individuals. While sharing data is beneficial for quality control and software development, privacy and security are of utmost importance. Generating synthetic data is a practical way to mitigate the cost, time and sensitivities that hamper developers and researchers in producing and validating novel biotechnological solutions to data-intensive problems. Existing methods focus on mutation frequencies at specific loci while ignoring epistatic interactions. Programs that do consider epistasis are either limited to two-way interactions or apply genomic constraints that make synthetic data generation arduous or computationally intensive. To solve this, we developed the Polygenic Epistatic Phenotype Simulator (PEPS). Our tool is a probabilistic model that can generate synthetic phenotypes with a controllable level of complexity.
Subject(s)
Biotechnology, Statistical Models, Humans, Computer Simulation, Phenotype, Genotype
ABSTRACT
Healthcare data is a scarce resource, and access is often cumbersome. While medical software development would benefit from real datasets, patient privacy takes priority. Realistic synthetic healthcare data can fill this gap by providing a dataset for quality control while preserving patient anonymity and privacy. Existing methods focus on American or European patient healthcare data, but none focuses exclusively on the Australian population. Australia is a highly diverse country with a unique healthcare system. To overcome this problem, we used a popular publicly available tool, Synthea, to generate disease progressions based on the Australian population. With this approach, we were able to generate 100,000 patients following Queensland (Australia) demographics.
Subject(s)
Health Facilities, Privacy, Humans, Australia, Queensland, Disease Progression
ABSTRACT
Inborn errors of metabolism are genetic conditions that can disrupt intermediary metabolic pathways and cause defective absorption and metabolism of dietary nutrients. In an Australian Kelpie breeding population, 17 puppies presented with intestinal lipid malabsorption. Juvenile dogs exhibited stunted postnatal growth, steatorrhea, abdominal distension and a wiry coat. Using genome-wide association analysis, an associated locus on CFA28 (Praw = 2.87E-06) was discovered and validated in a closely related population (Praw = 1.75E-45). A 103.3 kb deletion NC_006610.3CFA28:g.23380074_23483377del, containing the genes Acyl-CoA Synthetase Long Chain Family Member 5 (ACSL5) and Zinc Finger DHHC-Type Containing 6 (ZDHHC6), was characterised using whole transcriptomic data. Whole transcriptomic sequencing revealed no expression of ACSL5 and disrupted splicing of ZDHHC6 in jejunal tissue of affected Kelpies. The ACSL5 gene plays a key role in long-chain fatty acid absorption, and a phenotype similar to that of our affected Kelpies has been observed in a knockout mouse model. A PCR-based diagnostic test was developed and confirmed a fully penetrant autosomal recessive mode of inheritance. We conclude that the structural variant causing a deletion of the ACSL5 gene is the most likely cause of intestinal lipid malabsorption in the Australian Kelpie.