ABSTRACT
We performed a comprehensive assessment of rare inherited variation in autism spectrum disorder (ASD) by analyzing whole-genome sequences of 2,308 individuals from families with multiple affected children. We implicate 69 genes in ASD risk, including 24 passing genome-wide Bonferroni correction and 16 new ASD risk genes, most supported by rare inherited variants, a substantial extension of previous findings. Biological pathways enriched for genes harboring inherited variants represent cytoskeletal organization and ion transport, which are distinct from pathways implicated in previous studies. Nevertheless, the de novo and inherited genes contribute to a common protein-protein interaction network. We also identified structural variants (SVs) affecting non-coding regions, implicating recurrent deletions in the promoters of DLG2 and NR3C2. Loss of nr3c2 function in zebrafish disrupts sleep and social function, overlapping with human ASD-related phenotypes. These data support the utility of studying multiplex families in ASD and are available through the Hartwell Autism Research and Technology portal.
Subject(s)
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease/genetics , Pedigree , Protein Interaction Maps/genetics , Animals , Child , Genetic Databases , Animal Disease Models , Female , Gene Deletion , Guanylate Kinases/genetics , Humans , Inheritance Patterns/genetics , Machine Learning , Male , Nuclear Family , Genetic Promoter Regions/genetics , Mineralocorticoid Receptors/genetics , Risk Factors , Tumor Suppressor Proteins/genetics , Whole Genome Sequencing , Zebrafish/genetics
ABSTRACT
Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to given the distribution of the subsequence in the unmapped reads and phasings of families. Validating ASLAN on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ~1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.
Subject(s)
Human Genome , Genomics , Humans , Algorithms , Telomere/genetics , Genetic Variation , DNA Sequence Analysis
ABSTRACT
Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.
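The idea of calling crossovers with a hidden Markov model while tolerating genotyping errors can be illustrated with a deliberately small sketch. This is not the PhasingFamilies model: it assumes a single parent-child transmission, two hidden states (which of the parent's two haplotypes the child inherited), and fixed, hypothetical values for the per-site crossover probability (`p_switch`) and error rate (`p_error`). Viterbi decoding then treats isolated mismatches as errors and sustained changes as crossovers.

```python
import math


def viterbi_crossovers(obs, p_switch=1e-3, p_error=0.05):
    """Decode the most likely inherited parental haplotype (0 or 1) per site.

    obs      -- list of 0/1 labels: which parental haplotype the child's
                allele appears to match at each informative site
    p_switch -- assumed per-site crossover probability (hypothetical value)
    p_error  -- assumed per-site genotyping error rate (hypothetical value)
    Returns (path, crossover_indices).
    """
    if not obs:
        return [], []
    stay, switch = math.log(1 - p_switch), math.log(p_switch)
    match, mismatch = math.log(1 - p_error), math.log(p_error)

    def emit(state, o):
        # Log-probability of observing label o when the true state is `state`.
        return match if state == o else mismatch

    # Initialise with a uniform prior over the two haplotypes.
    score = [math.log(0.5) + emit(s, obs[0]) for s in (0, 1)]
    back = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (stay if p == s else switch) for p in (0, 1)]
            best_prev = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best_prev] + emit(s, o))
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    # Trace back the best path from the final site.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # A crossover is called wherever the decoded haplotype changes.
    crossovers = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, crossovers
```

With these assumed parameters, a single discordant site inside a long run is absorbed as an error, while a sustained change of haplotype yields one crossover call at the first site of the new run.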
Subject(s)
Genome , Inheritance Patterns , Humans , Whole Genome Sequencing , Haplotypes
ABSTRACT
Autism spectrum disorder (ASD) has a complex genetic architecture involving contributions from both de novo and inherited variation. Few studies have been designed to address the role of rare inherited variation or its interaction with common polygenic risk in ASD. Here, we performed whole-genome sequencing of the largest cohort of multiplex families to date, consisting of 4,551 individuals in 1,004 families having two or more autistic children. Using this study design, we identify seven previously unrecognized ASD risk genes supported by a majority of rare inherited variants, finding support for a total of 74 genes in our cohort and a total of 152 genes after combined analysis with other studies. Autistic children from multiplex families demonstrate an increased burden of rare inherited protein-truncating variants in known ASD risk genes. We also find that ASD polygenic score (PGS) is overtransmitted from nonautistic parents to autistic children who also harbor rare inherited variants, consistent with combinatorial effects in the offspring, which may explain the reduced penetrance of these rare variants in parents. We also observe that in addition to social dysfunction, language delay is associated with ASD PGS overtransmission. These results are consistent with an additive complex genetic risk architecture of ASD involving rare and common variation and further suggest that language delay is a core biological feature of ASD.
Subject(s)
Autism Spectrum Disorder , Language Development Disorders , Child , Humans , Autism Spectrum Disorder/genetics , Multifactorial Inheritance/genetics , Parents , Whole Genome Sequencing , Genetic Predisposition to Disease
ABSTRACT
While hundreds of thousands of human whole genome sequences (WGS) have been collected in the effort to better understand genetic determinants of disease, these whole genome sequences have less frequently been used to study another major determinant of human health: the human virome. Using the unmapped reads from WGS of over 1000 families, we present insights into the human blood DNA virome, focusing particularly on human herpesvirus (HHV) 6A, 6B, and 7. In addition to extensively cataloguing the viruses detected in WGS of human whole blood and lymphoblastoid cell lines, we use the family structure of our dataset to show that household drives transmission of several viruses, and identify the Mendelian inheritance patterns characteristic of inherited chromosomally integrated human herpesvirus 6 (iciHHV-6). Consistent with prior studies, we find that 0.6% of our dataset's population has iciHHV, and we locate candidate integration sequences for these cases. We document genetic diversity within exogenous and integrated HHV species and within integration sites of HHV-6. Finally, in the first observation of its kind, we present evidence that suggests widespread de novo HHV-6B integration and HHV-7 integration and reactivation in lymphoblastoid cell lines. These findings show that the unmapped read space of WGS is a promising source of data for virology research.
Subject(s)
Human Herpesvirus 6 , Roseolovirus Infections , Humans , Human Herpesvirus 6/genetics , Virus Integration , Sequence Analysis , Cell Line
ABSTRACT
BACKGROUND: Autism spectrum disorder (ASD) is a widespread neurodevelopmental condition with a range of potential causes and symptoms. Standard diagnostic mechanisms for ASD, which involve lengthy parent questionnaires and clinical observation, often result in long waiting times for results. Recent advances in computer vision and mobile technology hold potential for speeding up the diagnostic process by enabling computational analysis of behavioral and social impairments from home videos. Such techniques can improve objectivity and contribute quantitatively to the diagnostic process. OBJECTIVE: In this work, we evaluate whether home videos collected from a game-based mobile app can be used to provide diagnostic insights into ASD. To the best of our knowledge, this is the first study attempting to identify potential social indicators of ASD from mobile phone videos without the use of eye-tracking hardware, manual annotations, and structured scenarios or clinical environments. METHODS: Here, we used a mobile health app to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment. We used automated data set annotations to analyze two social indicators that have previously been shown to differ between children with ASD and their neurotypical (NT) peers: (1) gaze fixation patterns, which represent regions of an individual's visual focus and (2) visual scanning methods, which refer to the ways in which individuals scan their surrounding environment. We compared the gaze fixation and visual scanning methods used by children during a 90-second gameplay video to identify statistically significant differences between the 2 cohorts; we then trained a long short-term memory (LSTM) neural network to determine if gaze indicators could be predictive of ASD. RESULTS: Our results show that gaze fixation patterns differ between the 2 cohorts; specifically, we could identify 1 statistically significant region of fixation (P<.001). 
In addition, we also demonstrate that there are unique visual scanning patterns that exist for individuals with ASD when compared to NT children (P<.001). A deep learning model trained on coarse gaze fixation annotations demonstrates mild predictive power in identifying ASD. CONCLUSIONS: Ultimately, our study demonstrates that heterogeneous video data sets collected from mobile devices hold potential for quantifying visual patterns and providing insights into ASD. We show the importance of automated labeling techniques in generating large-scale data sets while simultaneously preserving the privacy of participants, and we demonstrate that specific social engagement indicators associated with ASD can be identified and characterized using such data.
Subject(s)
Autism Spectrum Disorder , Mobile Applications , Autism Spectrum Disorder/diagnosis , Child , Handheld Computers , Ocular Fixation , Humans , Social Participation
ABSTRACT
BACKGROUND: Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against other existing aggregation methods (OTU clustering and MicroPheno or DiTaxa features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than the state-of-the-art methods in biomarker discovery and phenotype prediction. RESULTS: On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods in both a permutation-test-based analysis and using linear discriminant analysis effect size. By grouping microbes by SBB, we can identify several differentially abundant microbial groups (FDR<.1) between children with autism and neurotypical controls in a set of 115 discordant siblings. Porphyromonadaceae, Ruminococcaceae, and an unnamed species of Blastocystis were significantly enriched in autism, while Veillonellaceae was significantly depleted. Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR<.1).
We observed Megasphaera and Sutterellaceae highly enriched in obesity, and Phocaeicola significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of .64 and the obesity phenotype with an ROC-AUC score of .84. CONCLUSIONS: SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from http://github.com/briannachrisman/16s_biomarkers .
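The permutation-test step mentioned in the benchmarking tasks can be sketched as follows. This is an illustrative sketch only, not the code from the linked repository: it assumes the test statistic is the absolute difference in mean relative abundance of one microbial group between two cohorts, and the function name, parameters, and abundance values are hypothetical.

```python
import random


def permutation_test(case, control, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    case, control -- abundances of one microbial group in each cohort
    n_perm        -- number of random label shuffles (assumed value)
    """
    rng = random.Random(seed)
    observed = abs(sum(case) / len(case) - sum(control) / len(control))
    pooled = list(case) + list(control)
    n_case = len(case)
    hits = 0
    for _ in range(n_perm):
        # Shuffle cohort labels by shuffling the pooled abundances.
        rng.shuffle(pooled)
        a, b = pooled[:n_case], pooled[n_case:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

When many groups are tested this way, the resulting p-values would still need a multiple-testing correction, such as the FDR threshold quoted in the abstract.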
Subject(s)
Gastrointestinal Microbiome , Biomarkers , Cluster Analysis , Gastrointestinal Microbiome/genetics , Humans , 16S Ribosomal RNA/genetics , Software
ABSTRACT
BACKGROUND: Complex human health conditions with etiological heterogeneity, like autism spectrum disorder (ASD), often pose a challenge for traditional genome-wide association study approaches in defining a clear genotype-to-phenotype model. Coalitional game theory (CGT) is an exciting method that can consider the combinatorial effect of groups of variants working in concert to produce a phenotype. CGT has been applied to associate likely-gene-disrupting variants encoded from whole genome sequence data to ASD; however, this previous approach cannot take prior biological knowledge into account. Here we extend CGT to incorporate a priori knowledge from biological networks through a game theoretic centrality measure based on Shapley value to rank genes by their relevance, that is, the individual gene's synergistic influence in a gene-to-gene interaction network. Game theoretic centrality extends the notion of Shapley value to the evaluation of a gene's contribution to the overall connectivity of its corresponding node in a biological network. RESULTS: We implemented and applied game theoretic centrality to rank genes on whole genomes from 756 multiplex autism families. Top ranking genes with the highest game theoretic centrality in both the weighted and unweighted approaches were enriched for pathways previously associated with autism, including pathways of the immune system. Four of the selected genes (HLA-A, HLA-B, HLA-G, and HLA-DRB1) have also been implicated in ASD and further support the link between ASD and the human leukocyte antigen complex. CONCLUSIONS: Game theoretic centrality can prioritize influential, disease-associated genes within biological networks, and assist in the decoding of polygenic associations to complex disorders like autism.
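To make the notion of a Shapley-value-based centrality concrete, the sketch below uses one well-known coalitional game on unweighted graphs, in which the worth of a coalition is the number of nodes it contains or touches. For that game the Shapley value has a closed form: each node receives 1/(1 + degree(u)) from itself and from every neighbour u. Whether the study used this particular game is an assumption, and the node names are placeholders, not genes from the paper.

```python
def shapley_degree_centrality(adjacency):
    """Shapley value of each node for the 'coalition plus neighbours' game.

    adjacency -- dict mapping node -> set of neighbouring nodes
                 (undirected graph, no self-loops)
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    return {
        # Sum 1 / (1 + degree) over the node itself and its neighbours.
        v: sum(1.0 / (1 + degree[u]) for u in {v} | set(nbrs))
        for v, nbrs in adjacency.items()
    }


# Hypothetical three-node path network: g1 - g2 - g3.
network = {"g1": {"g2"}, "g2": {"g1", "g3"}, "g3": {"g2"}}
ranking = sorted(shapley_degree_centrality(network).items(),
                 key=lambda kv: kv[1], reverse=True)
```

In this toy network the middle node ranks first, and the values sum to the number of nodes, as the efficiency property of the Shapley value requires.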
Subject(s)
Algorithms , Game Theory , Gene Regulatory Networks , Genetic Association Studies , Autism Spectrum Disorder/genetics , Genome-Wide Association Study , Humans , Protein Interaction Mapping , Reproducibility of Results
ABSTRACT
BACKGROUND: Several studies have shown that facial attention differs in children with autism. Measuring eye gaze and emotion recognition in children with autism is challenging, as standard clinical assessments must be delivered in clinical settings by a trained clinician. Wearable technologies may be able to bring eye gaze and emotion recognition into natural social interactions and settings. OBJECTIVE: This study aimed to test: (1) the feasibility of tracking gaze using wearable smart glasses during a facial expression recognition task and (2) the ability of these gaze-tracking data, together with facial expression recognition responses, to distinguish children with autism from neurotypical controls (NCs). METHODS: We compared the eye gaze and emotion recognition patterns of 16 children with autism spectrum disorder (ASD) and 17 children without ASD via wearable smart glasses fitted with a custom eye tracker. Children identified static facial expressions of images presented on a computer screen along with nonsocial distractors while wearing Google Glass and the eye tracker. Faces were presented in three trials, during one of which children received feedback in the form of the correct classification. We employed hybrid human-labeling and computer vision-enabled methods for pupil tracking and world-gaze translation calibration. We analyzed the impact of gaze and emotion recognition features in a prediction task aiming to distinguish children with ASD from NC participants. RESULTS: Gaze and emotion recognition patterns enabled the training of a classifier that distinguished ASD and NC groups. However, it was unable to significantly outperform other classifiers that used only age and gender features, suggesting that further work is necessary to disentangle these effects. 
CONCLUSIONS: Although wearable smart glasses show promise in identifying subtle differences in gaze tracking and emotion recognition patterns in children with and without ASD, the present form factor and data do not allow for these differences to be reliably exploited by machine learning systems. Resolving these challenges will be an important step toward continuous tracking of the ASD phenotype.
Subject(s)
Autism Spectrum Disorder/therapy , Emotions/physiology , Smart Glasses/standards , Wearable Electronic Devices/standards , Adolescent , Child , Female , Humans , Male , Phenotype
ABSTRACT
BACKGROUND: Autism affects 1 in every 59 children in the United States, according to estimates from the Centers for Disease Control and Prevention's Autism and Developmental Disabilities Monitoring Network in 2018. Although similar rates of autism are reported in rural and urban areas, rural families report greater difficulty in accessing resources. An overwhelming number of families experience long waitlists for diagnostic and therapeutic services. OBJECTIVE: The objective of this study was to accurately identify gaps in access to autism care using GapMap, a mobile platform that connects families with local resources while continuously collecting up-to-date autism resource epidemiological information. METHODS: After being extracted from various databases, resources were deduplicated, validated, and allocated into 7 categories based on the keywords identified on the resource website. The average distance between the individuals from a simulated autism population and the nearest autism resource in our database was calculated for each US county. Resource load, an approximation of demand over supply for diagnostic resources, was calculated for each US county. RESULTS: There are approximately 28,000 US resources validated on the GapMap database, each allocated into 1 or more of the 7 categories. States with the greatest distances to autism resources included Alaska, Nevada, Wyoming, Montana, and Arizona. Of the 7 resource categories, diagnostic resources were the most underrepresented, comprising only 8.83% (2472/28,003) of all resources. Alarmingly, 83.86% (2635/3142) of all US counties lacked any diagnostic resources. States with the highest diagnostic resource load included West Virginia, Kentucky, Maine, Mississippi, and New Mexico. 
CONCLUSIONS: Results from this study demonstrate the sparsity and uneven distribution of diagnostic resources in the United States, which may contribute to the lengthy waitlists and travel distances that families in specific regions must overcome to receive a diagnosis. More data are needed on autism diagnosis demand to better quantify resource needs across the United States.
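The per-county distance metric described in the methods can be sketched as the mean great-circle distance from each simulated individual to the nearest resource. This is a minimal illustration under stated assumptions: the abstract does not say which distance measure GapMap uses, so the haversine formula is a stand-in, and the function names and coordinate format (latitude, longitude in degrees) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def mean_nearest_distance_km(people, resources):
    """Average distance from each person to their closest resource."""
    if not people or not resources:
        return None
    return sum(min(haversine_km(p, r) for r in resources)
               for p in people) / len(people)
```

Calling `mean_nearest_distance_km` once per county, with that county's simulated individuals and the full resource list, would give one value per county that can then be compared across states.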
Subject(s)
Autistic Disorder/therapy , Crowdsourcing/methods , Autistic Disorder/epidemiology , Child , Female , Humans , Male , United States
ABSTRACT
BACKGROUND: Next-generation genome sequencing techniques have become affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the analysis tools used, in order to reduce the overall computational processing time and, concomitantly, the processing cost. Furthermore, it is important to capitalize on recent developments in the cloud computing market, which has witnessed more providers competing in terms of products and prices. RESULTS: In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. CONCLUSIONS: MC-GenomeKey provides an efficient multicloud-based solution to detect and annotate mutations. The package can run on different commercial cloud platforms, which enables the user to seize the best offers.
The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web interface and is available for free for academic use.
Subject(s)
Cloud Computing , Genomics/methods , High-Throughput Nucleotide Sequencing , Genetic Databases , Human Genome , Humans , Internet , Software , Workflow
ABSTRACT
BACKGROUND: Numerous studies have highlighted the elevated degree of comorbidity associated with autism spectrum disorder (ASD). These comorbid conditions may add further impairments to individuals with autism and are substantially more prevalent compared to neurotypical populations. These high rates of comorbidity are not surprising taking into account the overlap of symptoms that ASD shares with other pathologies. From a research perspective, this suggests common molecular mechanisms involved in these conditions. Therefore, identifying crucial genes in the overlap between ASD and these comorbid disorders may help unravel the common biological processes involved and, ultimately, shed some light on autism etiology. RESULTS: In this work, we used a two-fold systems biology approach focused specifically on biological processes and gene networks to conduct a comparative analysis of autism with 31 frequently comorbid disorders in order to define a multi-disorder subcomponent of ASD and predict new genes of potential relevance to ASD etiology. We validated our predictions by determining the significance of our candidate genes in high-throughput transcriptome expression profiling studies. Using prior knowledge of disease-related biological processes and the interaction networks of the disorders related to autism, we identified a set of 19 genes not previously linked to ASD that were significantly differentially regulated in individuals with autism. In addition, these genes were of potential etiologic relevance to autism, given their enriched roles in neurological processes crucial for optimal brain development and function, learning and memory, cognition and social behavior.
CONCLUSIONS: Taken together, our approach represents a novel perspective of autism from the point of view of related comorbid disorders and proposes a model by which prior knowledge of interaction networks may enlighten and focus the genome-wide search for autism candidate genes to better define the genetic heterogeneity of ASD.
Subject(s)
Autism Spectrum Disorder/epidemiology , Autism Spectrum Disorder/genetics , Comorbidity , Systems Biology , Autism Spectrum Disorder/etiology , Gene Expression Profiling , Humans
ABSTRACT
Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action.
Subject(s)
Human Genome , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis/methods , Genetic Predisposition to Disease , Genomics/methods , Humans
ABSTRACT
SUMMARY: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. AVAILABILITY AND IMPLEMENTATION: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. CONTACT: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Programming Languages
ABSTRACT
BACKGROUND: A considerable number of minors in the United States are diagnosed with developmental or psychiatric conditions, potentially influenced by underdiagnosis factors such as cost, distance, and clinician availability. Despite the potential of digital phenotyping tools with machine learning (ML) approaches to expedite diagnoses and enhance diagnostic services for pediatric psychiatric conditions, existing methods face limitations because they use a limited set of social features for prediction tasks and focus on a single binary prediction, resulting in uncertain accuracies. OBJECTIVE: This study aims to propose the development of a gamified web system for data collection, followed by a fusion of novel crowdsourcing algorithms with ML behavioral feature extraction approaches to simultaneously predict diagnoses of autism spectrum disorder and attention-deficit/hyperactivity disorder in a precise and specific manner. METHODS: The proposed pipeline will consist of (1) gamified web applications to curate videos of social interactions adaptively based on the needs of the diagnostic system, (2) behavioral feature extraction techniques consisting of automated ML methods and novel crowdsourcing algorithms, and (3) the development of ML models that classify several conditions simultaneously and that adaptively request additional information based on uncertainties about the data. RESULTS: A preliminary version of the web interface has been implemented, and a prior feature selection method has highlighted a core set of behavioral features that can be targeted through the proposed gamified approach. CONCLUSIONS: The prospect for high reward stems from the possibility of creating the first artificial intelligence-powered tool that can identify complex social behaviors well enough to distinguish conditions with nuanced differentiators such as autism spectrum disorder and attention-deficit/hyperactivity disorder. 
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/52205.
ABSTRACT
UNLABELLED: Roundup is an online database of gene orthologs for over 1800 genomes, including 226 Eukaryota, 1447 Bacteria, 113 Archaea and 21 Viruses. Orthologs are inferred using the Reciprocal Smallest Distance algorithm. Users may query Roundup for single-linkage clusters of orthologous genes based on any group of genomes. Annotated query results may be viewed in a variety of ways including as clusters of orthologs and as phylogenetic profiles. Genomic results may be downloaded in formats suitable for functional as well as phylogenetic analysis, including the recent OrthoXML standard. In addition, gene IDs can be retrieved using FASTA sequence search. All source code and orthologs are freely available. AVAILABILITY: http://roundup.hms.harvard.edu.
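Single-linkage clustering of orthologs, as mentioned above, places two genes in the same cluster whenever a chain of pairwise ortholog relationships connects them. The sketch below shows that step with a union-find structure. It is not Roundup's source code, and the gene identifiers and function name are hypothetical; the input is assumed to be a list of ortholog pairs already inferred by some method such as Reciprocal Smallest Distance.

```python
def single_linkage_clusters(pairs):
    """Group genes into clusters connected by any chain of ortholog pairs.

    pairs -- iterable of (gene_a, gene_b) ortholog pairs
    Returns a sorted list of sorted gene-ID lists.
    """
    parent = {}

    def find(x):
        # Locate the cluster representative, halving the path on the way up.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for gene in list(parent):
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in clusters.values())
```

Because linkage is transitive, one spurious pair can merge two otherwise separate clusters, which is the usual trade-off of the single-linkage criterion.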
Subject(s)
Algorithms , Genomics/methods , Phylogeny , Animals , Archaea/genetics , Bacteria/genetics , Cluster Analysis , Molecular Evolution , Genome , Humans , Viruses/genetics
ABSTRACT
Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly.
Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Humans , Autistic Disorder/diagnosis , Autism Spectrum Disorder/diagnosis , Data Science , Machine Learning , Phenotype
ABSTRACT
While healthy gut microbiomes are critical to human health, pertinent microbial processes remain largely undefined, partially due to differential bias among profiling techniques. By simultaneously integrating multiple profiling methods, multi-omic analysis can define generalizable microbial processes, and is especially useful in understanding complex conditions such as Autism. Challenges with integrating heterogeneous data produced by multiple profiling methods can be overcome using Latent Dirichlet Allocation (LDA), a promising natural language processing technique that identifies topics in heterogeneous documents. In this study, we apply LDA to multi-omic microbial data (16S rRNA amplicon, shotgun metagenomic, shotgun metatranscriptomic, and untargeted metabolomic profiling) from the stool of 81 children with and without Autism. We identify topics, or microbial processes, that summarize complex phenomena occurring within gut microbial communities. We then subset stool samples by topic distribution, and identify metabolites, specifically neurotransmitter precursors and fatty acid derivatives, that differ significantly between children with and without Autism. We identify clusters of topics, deemed "cross-omic topics", which we hypothesize are representative of generalizable microbial processes observable regardless of profiling method. Interpreting topics, we find each represents a particular diet, and we heuristically label each cross-omic topic as: healthy/general function, age-associated function, transcriptional regulation, and opportunistic pathogenesis.
Subject(s)
Autistic Disorder , Gastrointestinal Microbiome , Microbiota , Child , Humans , Gastrointestinal Microbiome/genetics , Multiomics , 16S Ribosomal RNA/genetics , Microbiota/genetics
ABSTRACT
Importance: While research has identified racial and ethnic disparities in access to autism services, the size, extent, and specific locations of these access gaps have not yet been characterized on a national scale. Mapping comprehensive national listings of autism health care services together with the prevalence of autistic children of various races and ethnicities and evaluating geographic regions defined by localized commuting patterns may help to identify areas within the US where families who belong to minoritized racial and ethnic groups have disproportionally lower access to services. Objective: To evaluate differences in access to autism health care services among autistic children of various races and ethnicities within precisely defined geographic regions encompassing all serviceable areas within the US. Design, Setting, and Participants: This population-based cross-sectional study was conducted from October 5, 2021, to June 3, 2022, and involved 530 965 autistic children in kindergarten through grade 12. Core-based statistical areas (CBSAs; defined as areas containing a city and its surrounding commuter region), the Civil Rights Data Collection (CRDC) data set, and 51 071 autism resources (collected from October 1, 2015, to December 18, 2022) geographically distributed into 912 CBSAs were combined and analyzed to understand variation in access to autism health care services among autistic children of different races and ethnicities. Six racial and ethnic categories (American Indian or Alaska Native, Asian, Black or African American, Hispanic or Latino, Native Hawaiian or other Pacific Islander, and White) assigned by the US Department of Education were included in the analysis. Main Outcomes and Measures: A regularized least-squares regression analysis was used to measure differences in nationwide resource allocation between racial and ethnic groups.
The number of autism resources allocated per autistic child was estimated based on the child's racial and ethnic group. To evaluate how the CBSA population size may have altered the results, the least-squares regression analysis was run on CBSAs divided into metropolitan (>50 000 inhabitants) and micropolitan (10 000-50 000 inhabitants) groups. A Mann-Whitney U test was used to compare the model-estimated ratio of autism resources to autistic children among specific racial and ethnic groups comprising the proportions of autistic children in each CBSA. Results: Among 530 965 autistic children aged 5 to 18 years, 83.9% were male and 16.1% were female; 0.7% of children were American Indian or Alaska Native, 5.9% were Asian, 14.3% were Black or African American, 22.9% were Hispanic or Latino, 0.2% were Native Hawaiian or other Pacific Islander, 51.7% were White, and 4.2% were of 2 or more races and/or ethnicities. At a national scale, American Indian or Alaska Native autistic children (β = 0; 95% CI, 0-0; P = .01) and Hispanic autistic children (β = 0.02; 95% CI, 0-0.06; P = .02) had significant disparities in access to autism resources in comparison with White autistic children. When evaluating the proportion of autistic children in each racial and ethnic group, areas in which Black autistic children (>50% of the population: β = 0.05; <50% of the population: β = 0.07; P = .002) or Hispanic autistic children (>50% of the population: β = 0.04; <50% of the population: β = 0.07; P < .001) comprised greater than 50% of the total population of autistic children had significantly fewer resources than areas in which Black or Hispanic autistic children comprised less than 50% of the total population.
Comparing metropolitan vs micropolitan CBSAs revealed that in micropolitan CBSAs, Black autistic children (β = 0; 95% CI, 0-0; P < .001) and Hispanic autistic children (β = 0; 95% CI, 0-0.02; P < .001) had the greatest disparities in access to autism resources compared with White autistic children. In metropolitan CBSAs, American Indian or Alaska Native autistic children (β = 0; 95% CI, 0-0; P = .005) and Hispanic autistic children (β = 0.01; 95% CI, 0-0.06; P = .02) had the greatest disparities compared with White autistic children. Conclusions and Relevance: In this study, autistic children from several minoritized racial and ethnic groups, including Black and Hispanic autistic children, had access to significantly fewer autism resources than White autistic children in the US. This study pinpointed the specific geographic regions with the greatest disparities, where increases in the number and types of treatment options are warranted. These findings suggest that a prioritized response strategy to address these racial and ethnic disparities is needed.
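The Mann-Whitney U comparison named under Main Outcomes and Measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's analysis code: the function name is hypothetical, the per-CBSA ratios are invented values, and the P value uses a plain normal approximation with no tie or continuity correction, which a statistics library would handle more carefully.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, uncorrected).
    Returns (U, p) where U is the smaller of the two U statistics."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Give tied values the average of the ranks they span.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # two-sided tail probability of a standard normal
    return u, p


# Hypothetical resources-per-child ratios for two groups of CBSAs:
# those where a given group is the majority of autistic children, and the rest.
majority_cbsas = [0.02, 0.03, 0.04, 0.05]
other_cbsas = [0.06, 0.07, 0.08, 0.09, 0.10]
u_stat, p_value = mann_whitney_u(majority_cbsas, other_cbsas)
```

Because the test works on ranks rather than raw values, it makes no normality assumption about the ratios, which suits skewed per-region resource counts.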
Subject(s)
Autistic Disorder , Child , Humans , Male , Female , Cross-Sectional Studies , Health Services Accessibility , Healthcare Disparities , Racial Groups
ABSTRACT
Innovations in human-centered biomedical informatics are often developed with the eventual goal of real-world translation. While biomedical research questions are usually answered in terms of how a method performs in a particular context, we argue that it is equally important to consider and formally evaluate the ethical implications of informatics solutions. Several new research paradigms have arisen as a result of the consideration of ethical issues, including but not limited to privacy-preserving computation and fair machine learning. In the spirit of the Pacific Symposium on Biocomputing, we discuss broad and fundamental principles of ethical biomedical informatics in terms of Olelo Noeau, or Hawaiian proverbs and poetical sayings that capture Hawaiian values. While we emphasize issues related to privacy and fairness in particular, there are a multitude of facets to ethical biomedical informatics that can benefit from a critical analysis grounded in ethics.