Results 1 - 20 of 122
1.
Cell ; 178(4): 850-866.e26, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31398340

ABSTRACT

We performed a comprehensive assessment of rare inherited variation in autism spectrum disorder (ASD) by analyzing whole-genome sequences of 2,308 individuals from families with multiple affected children. We implicate 69 genes in ASD risk, including 24 passing genome-wide Bonferroni correction and 16 new ASD risk genes, most supported by rare inherited variants, a substantial extension of previous findings. Biological pathways enriched for genes harboring inherited variants represent cytoskeletal organization and ion transport, which are distinct from pathways implicated in previous studies. Nevertheless, the de novo and inherited genes contribute to a common protein-protein interaction network. We also identified structural variants (SVs) affecting non-coding regions, implicating recurrent deletions in the promoters of DLG2 and NR3C2. Loss of nr3c2 function in zebrafish disrupts sleep and social function, overlapping with human ASD-related phenotypes. These data support the utility of studying multiplex families in ASD and are available through the Hartwell Autism Research and Technology portal.


Subjects
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease/genetics , Pedigree , Protein Interaction Maps/genetics , Animals , Child , Genetic Databases , Animal Disease Models , Female , Gene Deletion , Guanylate Kinases/genetics , Humans , Inheritance Patterns/genetics , Machine Learning , Male , Nuclear Family , Genetic Promoter Regions/genetics , Mineralocorticoid Receptors/genetics , Risk Factors , Tumor Suppressor Proteins/genetics , Whole Genome Sequencing , Zebrafish/genetics
2.
Genome Res ; 33(10): 1734-1746, 2023 10.
Article in English | MEDLINE | ID: mdl-37879860

ABSTRACT

Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to given the distribution of the subsequence in the unmapped reads and phasings of families. Validating ASLAN on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ∼1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.
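
The maximum likelihood idea in the abstract above can be illustrated with a deliberately small sketch. It assumes a single family, a hypothetical table `inheritance` mapping each candidate (region, parental haplotype) pair to the set of children who inherited it according to the phasing, and a fixed per-child error rate `eps`. None of these names or simplifications come from ASLAN itself; the published model is richer.

```python
import math

def log_likelihood(carriers, observed, n_children, eps=0.05):
    """Log-likelihood that a subsequence lies on a haplotype carried by `carriers`.

    observed = children whose unmapped reads contain the subsequence.
    eps      = assumed probability that presence/absence is misobserved.
    """
    ll = 0.0
    for child in range(n_children):
        agrees = (child in carriers) == (child in observed)
        ll += math.log(1 - eps if agrees else eps)
    return ll

def localize(inheritance, observed, n_children, eps=0.05):
    """Return the (region, haplotype) key with the highest likelihood."""
    return max(
        inheritance,
        key=lambda k: log_likelihood(inheritance[k], observed, n_children, eps),
    )

# Hypothetical phasing for a family with three children (indices 0, 1, 2).
inheritance = {
    ("chr1:0-1Mb", "mat1"): {0, 1},
    ("chr1:1-2Mb", "mat1"): {0, 2},
    ("chr2:0-1Mb", "pat2"): {1},
}
best = localize(inheritance, observed={0, 2}, n_children=3)
```

With several families, the per-family log-likelihoods for the same region would simply be summed before taking the maximum.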


Subjects
Human Genome , Genomics , Humans , Algorithms , Telomere/genetics , Genetic Variation , DNA Sequence Analysis
3.
Genome Res ; 33(10): 1747-1756, 2023 10.
Article in English | MEDLINE | ID: mdl-37879861

ABSTRACT

Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.
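
As a rough illustration of how a hidden Markov model can turn noisy per-site inheritance evidence into crossover calls, the sketch below runs Viterbi decoding over a two-state chain (which of a parent's two haplotypes a child carries at each site). The two-state space, the error rate `err`, and the switch probability `switch` are illustrative assumptions. This is not the PhasingFamilies model, which additionally handles error-prone regions and inherited deletions.

```python
import math

def viterbi_haplotype(obs, err=0.1, switch=0.01):
    """Most likely haplotype path (0/1) given noisy per-site observations.

    obs[i] is the haplotype (0 or 1) that site i appears to come from.
    err    = assumed probability that an observation is wrong.
    switch = assumed per-site probability of a crossover.
    """
    if not obs:
        return []
    log_emit = {True: math.log(1 - err), False: math.log(err)}
    log_stay, log_switch = math.log(1 - switch), math.log(switch)

    # score[s] = best log-probability of any path ending in state s
    score = [math.log(0.5) + log_emit[obs[0] == s] for s in (0, 1)]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            move = score[1 - s] + log_switch
            pointers.append(s if stay >= move else 1 - s)
            new_score.append(max(stay, move) + log_emit[o == s])
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def crossovers(path):
    """Indices i where the decoded haplotype changes between site i-1 and i."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

Because a state switch is far less likely than a single observation error under these assumed parameters, an isolated discordant site is absorbed as noise rather than reported as a pair of spurious crossovers.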


Subjects
Genome , Inheritance Patterns , Humans , Whole Genome Sequencing , Haplotypes
4.
Proc Natl Acad Sci U S A ; 120(31): e2215632120, 2023 08.
Article in English | MEDLINE | ID: mdl-37506195

ABSTRACT

Autism spectrum disorder (ASD) has a complex genetic architecture involving contributions from both de novo and inherited variation. Few studies have been designed to address the role of rare inherited variation or its interaction with common polygenic risk in ASD. Here, we performed whole-genome sequencing of the largest cohort of multiplex families to date, consisting of 4,551 individuals in 1,004 families having two or more autistic children. Using this study design, we identify seven previously unrecognized ASD risk genes supported by a majority of rare inherited variants, finding support for a total of 74 genes in our cohort and a total of 152 genes after combined analysis with other studies. Autistic children from multiplex families demonstrate an increased burden of rare inherited protein-truncating variants in known ASD risk genes. We also find that ASD polygenic score (PGS) is overtransmitted from nonautistic parents to autistic children who also harbor rare inherited variants, consistent with combinatorial effects in the offspring, which may explain the reduced penetrance of these rare variants in parents. We also observe that in addition to social dysfunction, language delay is associated with ASD PGS overtransmission. These results are consistent with an additive complex genetic risk architecture of ASD involving rare and common variation and further suggest that language delay is a core biological feature of ASD.


Subjects
Autism Spectrum Disorder , Language Development Disorders , Child , Humans , Autism Spectrum Disorder/genetics , Multifactorial Inheritance/genetics , Parents , Whole Genome Sequencing , Genetic Predisposition to Disease
5.
Virol J ; 19(1): 225, 2022 12 24.
Article in English | MEDLINE | ID: mdl-36566197

ABSTRACT

While hundreds of thousands of human whole genome sequences (WGS) have been collected in the effort to better understand genetic determinants of disease, these whole genome sequences have less frequently been used to study another major determinant of human health: the human virome. Using the unmapped reads from WGS of over 1000 families, we present insights into the human blood DNA virome, focusing particularly on human herpesvirus (HHV) 6A, 6B, and 7. In addition to extensively cataloguing the viruses detected in WGS of human whole blood and lymphoblastoid cell lines, we use the family structure of our dataset to show that household drives transmission of several viruses, and identify the Mendelian inheritance patterns characteristic of inherited chromosomally integrated human herpesvirus 6 (iciHHV-6). Consistent with prior studies, we find that 0.6% of our dataset's population has iciHHV, and we locate candidate integration sequences for these cases. We document genetic diversity within exogenous and integrated HHV species and within integration sites of HHV-6. Finally, in the first observation of its kind, we present evidence that suggests widespread de novo HHV-6B integration and HHV-7 integration and reactivation in lymphoblastoid cell lines. These findings show that the unmapped read space of WGS is a promising source of data for virology research.


Subjects
Human Herpesvirus 6 , Roseolovirus Infections , Humans , Human Herpesvirus 6/genetics , Virus Integration , Sequence Analysis , Cell Line
6.
J Med Internet Res ; 24(2): e31830, 2022 02 15.
Article in English | MEDLINE | ID: mdl-35166683

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is a widespread neurodevelopmental condition with a range of potential causes and symptoms. Standard diagnostic mechanisms for ASD, which involve lengthy parent questionnaires and clinical observation, often result in long waiting times for results. Recent advances in computer vision and mobile technology hold potential for speeding up the diagnostic process by enabling computational analysis of behavioral and social impairments from home videos. Such techniques can improve objectivity and contribute quantitatively to the diagnostic process. OBJECTIVE: In this work, we evaluate whether home videos collected from a game-based mobile app can be used to provide diagnostic insights into ASD. To the best of our knowledge, this is the first study attempting to identify potential social indicators of ASD from mobile phone videos without the use of eye-tracking hardware, manual annotations, and structured scenarios or clinical environments. METHODS: Here, we used a mobile health app to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment. We used automated data set annotations to analyze two social indicators that have previously been shown to differ between children with ASD and their neurotypical (NT) peers: (1) gaze fixation patterns, which represent regions of an individual's visual focus and (2) visual scanning methods, which refer to the ways in which individuals scan their surrounding environment. We compared the gaze fixation and visual scanning methods used by children during a 90-second gameplay video to identify statistically significant differences between the 2 cohorts; we then trained a long short-term memory (LSTM) neural network to determine if gaze indicators could be predictive of ASD. RESULTS: Our results show that gaze fixation patterns differ between the 2 cohorts; specifically, we could identify 1 statistically significant region of fixation (P<.001). 
In addition, we also demonstrate that there are unique visual scanning patterns that exist for individuals with ASD when compared to NT children (P<.001). A deep learning model trained on coarse gaze fixation annotations demonstrates mild predictive power in identifying ASD. CONCLUSIONS: Ultimately, our study demonstrates that heterogeneous video data sets collected from mobile devices hold potential for quantifying visual patterns and providing insights into ASD. We show the importance of automated labeling techniques in generating large-scale data sets while simultaneously preserving the privacy of participants, and we demonstrate that specific social engagement indicators associated with ASD can be identified and characterized using such data.


Subjects
Autism Spectrum Disorder , Mobile Applications , Autism Spectrum Disorder/diagnosis , Child , Handheld Computers , Ocular Fixation , Humans , Social Participation
7.
BMC Bioinformatics ; 22(1): 509, 2021 Oct 19.
Article in English | MEDLINE | ID: mdl-34666677

ABSTRACT

BACKGROUND: Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against other existing aggregation methods (OTU clustering and MicroPheno or DiTaxa features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than the state-of-the-art methods in biomarker discovery and phenotype prediction. RESULTS: On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods in both a permutation-test-based analysis and using linear discriminant analysis effect size. By grouping microbes by SBB, we can identify several differentially abundant microbial groups (FDR<.1) between children with autism and neurotypical controls in a set of 115 discordant siblings. Porphyromonadaceae, Ruminococcaceae, and an unnamed species of Blastocystis were significantly enriched in autism, while Veillonellaceae was significantly depleted. Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR<.1).
We observed Megasphaera and Sutterellaceae highly enriched in obesity, and Phocaeicola significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of .64 and the obesity phenotype with an ROC-AUC score of .84. CONCLUSIONS: SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from http://github.com/briannachrisman/16s_biomarkers .
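
One of the benchmarking tasks named in the abstract above is biomarker discovery via permutation test. The sketch below shows a generic two-sided permutation test on the difference in mean abundance of one microbial group between two cohorts. The function name, the choice of test statistic, and the add-one correction are assumptions made for illustration; the paper's exact procedure may differ, and the authors' code is at the URL given in the abstract.

```python
import random

def permutation_pvalue(cases, controls, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean abundance."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        # Shuffle cohort labels and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

When many groups are tested at once, the resulting p-values would still need a multiple-testing adjustment such as the FDR threshold the abstract reports.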


Subjects
Gastrointestinal Microbiome , Biomarkers , Cluster Analysis , Gastrointestinal Microbiome/genetics , Humans , 16S Ribosomal RNA/genetics , Software
8.
BMC Bioinformatics ; 21(1): 356, 2020 Aug 12.
Article in English | MEDLINE | ID: mdl-32787845

ABSTRACT

BACKGROUND: Complex human health conditions with etiological heterogeneity like Autism Spectrum Disorder (ASD) often pose a challenge for traditional genome-wide association study approaches in defining a clear genotype to phenotype model. Coalitional game theory (CGT) is an exciting method that can consider the combinatorial effect of groups of variants working in concert to produce a phenotype. CGT has been applied to associate likely-gene-disrupting variants encoded from whole genome sequence data to ASD; however, this previous approach cannot take into account prior biological knowledge. Here we extend CGT to incorporate a priori knowledge from biological networks through a game theoretic centrality measure based on Shapley value to rank genes by their relevance, that is, the individual gene's synergistic influence in a gene-to-gene interaction network. Game theoretic centrality extends the notion of Shapley value to the evaluation of a gene's contribution to the overall connectivity of its corresponding node in a biological network. RESULTS: We implemented and applied game theoretic centrality to rank genes on whole genomes from 756 multiplex autism families. Top ranking genes with the highest game theoretic centrality in both the weighted and unweighted approaches were enriched for pathways previously associated with autism, including pathways of the immune system. Four of the selected genes (HLA-A, HLA-B, HLA-G, and HLA-DRB1) have also been implicated in ASD and further support the link between ASD and the human leukocyte antigen complex. CONCLUSIONS: Game theoretic centrality can prioritize influential, disease-associated genes within biological networks, and assist in the decoding of polygenic associations to complex disorders like autism.
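
To make the Shapley-value idea above concrete, the following sketch computes exact Shapley values for nodes of a tiny network by averaging each node's marginal contribution over all orderings. The characteristic function `coverage` (how many nodes a coalition contains or touches) is a common textbook choice for game theoretic centrality and is an assumption here, as is the four-node toy network; the paper's weighted and unweighted games are not reproduced.

```python
from itertools import permutations

def coverage(coalition, graph):
    """Toy characteristic function: nodes in the coalition or adjacent to it."""
    covered = set(coalition)
    for node in coalition:
        covered.update(graph[node])
    return len(covered)

def shapley_values(graph, value=coverage):
    """Exact Shapley value per node, averaging marginal gains over all orderings."""
    nodes = list(graph)
    totals = {n: 0.0 for n in nodes}
    n_orderings = 0
    for order in permutations(nodes):
        n_orderings += 1
        coalition = []
        before = value(coalition, graph)
        for node in order:
            coalition.append(node)
            after = value(coalition, graph)
            totals[node] += after - before
            before = after
    return {n: totals[n] / n_orderings for n in nodes}

# Hypothetical four-gene interaction network (undirected adjacency lists).
toy_network = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
}
centrality = shapley_values(toy_network)
ranking = sorted(centrality.items(), key=lambda kv: -kv[1])
```

Enumerating every ordering is only feasible for a handful of nodes. Genome-scale networks would need a closed-form expression for the chosen game or a sampling approximation.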


Subjects
Algorithms , Game Theory , Gene Regulatory Networks , Genetic Association Studies , Autism Spectrum Disorder/genetics , Genome-Wide Association Study , Humans , Protein Interaction Mapping , Reproducibility of Results
9.
J Med Internet Res ; 22(4): e13810, 2020 04 22.
Article in English | MEDLINE | ID: mdl-32319961

ABSTRACT

BACKGROUND: Several studies have shown that facial attention differs in children with autism. Measuring eye gaze and emotion recognition in children with autism is challenging, as standard clinical assessments must be delivered in clinical settings by a trained clinician. Wearable technologies may be able to bring eye gaze and emotion recognition into natural social interactions and settings. OBJECTIVE: This study aimed to test: (1) the feasibility of tracking gaze using wearable smart glasses during a facial expression recognition task and (2) the ability of these gaze-tracking data, together with facial expression recognition responses, to distinguish children with autism from neurotypical controls (NCs). METHODS: We compared the eye gaze and emotion recognition patterns of 16 children with autism spectrum disorder (ASD) and 17 children without ASD via wearable smart glasses fitted with a custom eye tracker. Children identified static facial expressions of images presented on a computer screen along with nonsocial distractors while wearing Google Glass and the eye tracker. Faces were presented in three trials, during one of which children received feedback in the form of the correct classification. We employed hybrid human-labeling and computer vision-enabled methods for pupil tracking and world-gaze translation calibration. We analyzed the impact of gaze and emotion recognition features in a prediction task aiming to distinguish children with ASD from NC participants. RESULTS: Gaze and emotion recognition patterns enabled the training of a classifier that distinguished ASD and NC groups. However, it was unable to significantly outperform other classifiers that used only age and gender features, suggesting that further work is necessary to disentangle these effects. 
CONCLUSIONS: Although wearable smart glasses show promise in identifying subtle differences in gaze tracking and emotion recognition patterns in children with and without ASD, the present form factor and data do not allow for these differences to be reliably exploited by machine learning systems. Resolving these challenges will be an important step toward continuous tracking of the ASD phenotype.


Subjects
Autism Spectrum Disorder/therapy , Emotions/physiology , Smart Glasses/standards , Wearable Electronic Devices/standards , Adolescent , Child , Female , Humans , Male , Phenotype
10.
J Med Internet Res ; 21(7): e13094, 2019 07 10.
Article in English | MEDLINE | ID: mdl-31293243

ABSTRACT

BACKGROUND: Autism affects 1 in every 59 children in the United States, according to estimates from the Centers for Disease Control and Prevention's Autism and Developmental Disabilities Monitoring Network in 2018. Although similar rates of autism are reported in rural and urban areas, rural families report greater difficulty in accessing resources. An overwhelming number of families experience long waitlists for diagnostic and therapeutic services. OBJECTIVE: The objective of this study was to accurately identify gaps in access to autism care using GapMap, a mobile platform that connects families with local resources while continuously collecting up-to-date autism resource epidemiological information. METHODS: After being extracted from various databases, resources were deduplicated, validated, and allocated into 7 categories based on the keywords identified on the resource website. The average distance between the individuals from a simulated autism population and the nearest autism resource in our database was calculated for each US county. Resource load, an approximation of demand over supply for diagnostic resources, was calculated for each US county. RESULTS: There are approximately 28,000 US resources validated on the GapMap database, each allocated into 1 or more of the 7 categories. States with the greatest distances to autism resources included Alaska, Nevada, Wyoming, Montana, and Arizona. Of the 7 resource categories, diagnostic resources were the most underrepresented, comprising only 8.83% (2472/28,003) of all resources. Alarmingly, 83.86% (2635/3142) of all US counties lacked any diagnostic resources. States with the highest diagnostic resource load included West Virginia, Kentucky, Maine, Mississippi, and New Mexico. 
CONCLUSIONS: Results from this study demonstrate the sparsity and uneven distribution of diagnostic resources in the United States, which may contribute to the lengthy waitlists and travel distances, barriers that must be overcome to receive a diagnosis in specific regions. More data are needed on autism diagnosis demand to better quantify resource needs across the United States.
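
The two county-level quantities described in the abstract above, average distance to the nearest resource and resource load (demand over supply), can be sketched as follows. The haversine great-circle distance, the function names, and the convention of returning infinity when a county has no diagnostic resources are assumptions for illustration; the abstract does not state how GapMap computes distances or handles zero supply.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def mean_nearest_distance(people, resources):
    """Average distance from each simulated individual to the closest resource."""
    nearest = [min(haversine_km(*p, *r) for r in resources) for p in people]
    return sum(nearest) / len(nearest)

def resource_load(demand, supply):
    """Demand over supply; infinite when there is no supply (assumed convention)."""
    return math.inf if supply == 0 else demand / supply
```

For example, `resource_load(10, 4)` gives 2.5 individuals per diagnostic resource, while a county with demand but no diagnostic resources is flagged with an infinite load.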


Subjects
Autistic Disorder/therapy , Crowdsourcing/methods , Autistic Disorder/epidemiology , Child , Female , Humans , Male , United States
12.
J Med Internet Res ; 21(4): e13822, 2019 04 24.
Article in English | MEDLINE | ID: mdl-31017583

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is currently diagnosed using qualitative methods that measure between 20 and 100 behaviors, can span multiple appointments with trained clinicians, and take several hours to complete. In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children's "risk scores" for autism. We achieved an accuracy of 92% (95% CI 88%-97%) on US videos using a classifier built on five features. OBJECTIVE: Using videos of Bangladeshi children collected from Dhaka Shishu Children's Hospital, we aim to scale our pipeline to another culture and other developmental delays, including speech and language conditions. METHODS: Although our previously published and validated pipeline and set of classifiers perform reasonably well on Bangladeshi videos (75% accuracy, 95% CI 71%-78%), this work improves on that accuracy through the development and application of a powerful new technique for adaptive aggregation of crowdsourced labels. We enhance both the utility and performance of our model by building two classification layers: The first layer distinguishes between typical and atypical behavior, and the second layer distinguishes between ASD and non-ASD. In each of the layers, we use a unique rater weighting scheme to aggregate classification scores from different raters based on their expertise. We also determine Shapley values for the most important features in the classifier to understand how the classifiers' process aligns with clinical intuition.
RESULTS: Using these techniques, we achieved an accuracy (area under the curve [AUC]) of 76% (SD 3%) and sensitivity of 76% (SD 4%) for identifying atypical children from among developmentally delayed children, and an accuracy (AUC) of 85% (SD 5%) and sensitivity of 76% (SD 6%) for identifying children with ASD from those predicted to have other developmental delays. CONCLUSIONS: These results show promise for using a mobile video-based and machine learning-directed approach for early and remote detection of autism in Bangladeshi children. This strategy could provide important resources for developmental health in developing countries with few clinical resources for diagnosis, helping children get access to care at an early age. Future research aimed at extending the application of this approach to identify a range of other conditions and determine the population-level burden of developmental disabilities and impairments will be of high value.
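
The two-layer design with rater weighting described above can be pictured with a minimal sketch. It assumes each rater yields a classifier score between 0 and 1 per layer and has a hypothetical expertise weight; scores are combined by a weighted average and thresholded at 0.5. The weighting scheme, threshold, and label names are illustrative assumptions, not the authors' published method.

```python
def aggregate(scores, weights):
    """Weighted average of per-rater scores; both dicts are keyed by rater id."""
    total_weight = sum(weights[r] for r in scores)
    return sum(scores[r] * weights[r] for r in scores) / total_weight

def two_layer_label(atypical_scores, asd_scores, weights, threshold=0.5):
    """Layer 1: typical vs atypical. Layer 2 (atypical only): ASD vs non-ASD."""
    if aggregate(atypical_scores, weights) < threshold:
        return "typical"
    if aggregate(asd_scores, weights) >= threshold:
        return "ASD"
    return "non-ASD atypical"

# Hypothetical expertise weights for three raters.
weights = {"r1": 0.9, "r2": 0.6, "r3": 0.3}
label = two_layer_label(
    atypical_scores={"r1": 0.8, "r2": 0.7, "r3": 0.2},
    asd_scores={"r1": 0.9, "r2": 0.4, "r3": 0.1},
    weights=weights,
)
```

Weighting by expertise means a confident score from a reliable rater can outweigh disagreement from less reliable ones, which is the behavior the abstract attributes to its aggregation step.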


Subjects
Autism Spectrum Disorder/diagnosis , Developmental Disabilities/diagnosis , Machine Learning/standards , Video Recording/methods , Bangladesh , Child , Preschool Child , Female , Humans , Male , Validation Studies as Topic
13.
J Med Internet Res ; 21(5): e13668, 2019 05 23.
Article in English | MEDLINE | ID: mdl-31124463

ABSTRACT

BACKGROUND: Obtaining a diagnosis of neuropsychiatric disorders such as autism requires long waiting times that can exceed a year and can be prohibitively expensive. Crowdsourcing approaches may provide a scalable alternative that can accelerate general access to care and permit underserved populations to obtain an accurate diagnosis. OBJECTIVE: We aimed to perform a series of studies to explore whether paid crowd workers on Amazon Mechanical Turk (AMT) and citizen crowd workers on a public website shared on social media can provide accurate online detection of autism, conducted via crowdsourced ratings of short home video clips. METHODS: Three online studies were performed: (1) a paid crowdsourcing task on AMT (N=54) where crowd workers were asked to classify 10 short video clips of children as "Autism" or "Not autism," (2) a more complex paid crowdsourcing task (N=27) with only those raters who correctly rated ≥8 of the 10 videos during the first study, and (3) a public unpaid study (N=115) identical to the first study. RESULTS: For Study 1, the mean score of the participants who completed all questions was 7.50/10 (SD 1.46). When only analyzing the workers who scored ≥8/10 (n=27/54), there was a weak negative correlation between the time spent rating the videos and the sensitivity (ρ=-0.44, P=.02). For Study 2, the mean score of the participants rating new videos was 6.76/10 (SD 0.59). The average deviation between the crowdsourced answers and gold standard ratings provided by two expert clinical research coordinators was 0.56, with an SD of 0.51 (maximum possible SD is 3). All paid crowd workers who scored 8/10 in Study 1 either expressed enjoyment in performing the task in Study 2 or provided no negative comments. For Study 3, the mean score of the participants who completed all questions was 6.67/10 (SD 1.61). 
There were weak correlations between age and score (r=0.22, P=.014), age and sensitivity (r=-0.19, P=.04), number of family members with autism and sensitivity (r=-0.195, P=.04), and number of family members with autism and precision (r=-0.203, P=.03). A two-tailed t test between the scores of the paid workers in Study 1 and the unpaid workers in Study 3 showed a significant difference (P<.001). CONCLUSIONS: Many paid crowd workers on AMT enjoyed answering screening questions from videos, suggesting higher intrinsic motivation to make quality assessments. Paid crowdsourcing provides promising screening assessments of pediatric autism with an average deviation <20% from professional gold standard raters, which is potentially a clinically informative estimate for parents. Parents of children with autism likely overfit their intuition to their own affected child. This work provides preliminary demographic data on raters who may have higher ability to recognize and measure features of autism across its wide range of phenotypic manifestations.


Subjects
Autism Spectrum Disorder/diagnosis , Crowdsourcing/methods , Data Collection/methods , Routine Diagnostic Tests/methods , Mass Screening/methods , Adult , Preschool Child , Humans , Internet , Social Media
14.
PLoS Med ; 15(11): e1002705, 2018 11.
Article in English | MEDLINE | ID: mdl-30481180

ABSTRACT

BACKGROUND: The standard approaches to diagnosing autism spectrum disorder (ASD) evaluate between 20 and 100 behaviors and take several hours to complete. This has in part contributed to long wait times for a diagnosis and subsequent delays in access to therapy. We hypothesize that the use of machine learning analysis on home video can speed the diagnosis without compromising accuracy. We have analyzed item-level records from 2 standard diagnostic instruments to construct machine learning classifiers optimized for sparsity, interpretability, and accuracy. In the present study, we prospectively test whether the features from these optimized models can be extracted by blinded nonexpert raters from 3-minute home videos of children with and without ASD to arrive at a rapid and accurate machine learning autism classification. METHODS AND FINDINGS: We created a mobile web portal for video raters to assess 30 behavioral features (e.g., eye contact, social smile) that are used by 8 independent machine learning models for identifying ASD, each with >94% accuracy in cross-validation testing and subsequent independent validation from previous work. We then collected 116 short home videos of children with autism (mean age = 4 years 10 months, SD = 2 years 3 months) and 46 videos of typically developing children (mean age = 2 years 11 months, SD = 1 year 2 months). Three raters blind to the diagnosis independently measured each of the 30 features from the 8 models, with a median time to completion of 4 minutes. Although several models (consisting of alternating decision trees, support vector machine [SVM], logistic regression (LR), radial kernel, and linear SVM) performed well, a sparse 5-feature LR classifier (LR5) yielded the highest accuracy (area under the curve [AUC]: 92% [95% CI 88%-97%]) across all ages tested. 
We used a prospectively collected independent validation set of 66 videos (33 ASD and 33 non-ASD) and 3 independent rater measurements to validate the outcome, achieving lower but comparable accuracy (AUC: 89% [95% CI 81%-95%]). Finally, we applied LR to the 162-video-feature matrix to construct an 8-feature model, which achieved 0.93 AUC (95% CI 0.90-0.97) on the held-out test set and 0.86 on the validation set of 66 videos. Validation on children with an existing diagnosis limited the ability to generalize the performance to undiagnosed populations. CONCLUSIONS: These results support the hypothesis that feature tagging of home videos for machine learning classification of autism can yield accurate outcomes in short time frames, using mobile devices. Further work will be needed to confirm that this approach can accelerate autism diagnosis at scale.
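
The abstract above reports classifier performance as area under the ROC curve (AUC) for logistic regression models built on a few rater-scored features. The sketch below shows how a logistic score is formed from feature values and how AUC can be computed as the fraction of (ASD, non-ASD) pairs ranked correctly. The weights and feature vectors passed in would be hypothetical; the paper's fitted coefficients are not given in the abstract.

```python
import math

def logistic_score(features, weights, bias=0.0):
    """Probability-like score from a linear combination of feature values."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, which gives context for the 0.86 to 0.93 values reported above.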


Subjects
Autistic Disorder/diagnosis , Computer-Assisted Diagnosis/methods , Machine Learning , Remote Consultation/methods , Video Recording/methods , Adolescent , Adolescent Behavior , Age Factors , Autistic Disorder/physiopathology , Autistic Disorder/psychology , Child , Child Behavior , Preschool Child , Early Diagnosis , Feasibility Studies , Female , Humans , Infant , Male , Predictive Value of Tests , Prospective Studies , Reproducibility of Results , Time Factors
15.
BMC Bioinformatics ; 18(1): 49, 2017 Jan 20.
Article in English | MEDLINE | ID: mdl-28107819

ABSTRACT

BACKGROUND: Next-generation genome sequencing techniques have become affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the analysis tools used to reduce the overall computational processing time, and concomitantly reduce the processing cost. Furthermore, it is important to capitalize on recent developments in the cloud computing market, which has witnessed more providers competing in terms of products and prices. RESULTS: In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. CONCLUSIONS: MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers.
The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web-interface and it is available for free for academic use.


Subjects
Cloud Computing , Genomics/methods , High-Throughput Nucleotide Sequencing , Genetic Databases , Human Genome , Humans , Internet , Software , Workflow
16.
BMC Genomics ; 18(1): 315, 2017 Apr 20.
Article in English | MEDLINE | ID: mdl-28427329

ABSTRACT

BACKGROUND: Numerous studies have highlighted the elevated degree of comorbidity associated with autism spectrum disorder (ASD). These comorbid conditions may add further impairments to individuals with autism and are substantially more prevalent than in neurotypical populations. These high rates of comorbidity are not surprising, given the overlap of symptoms that ASD shares with other pathologies. From a research perspective, this suggests that common molecular mechanisms are involved in these conditions. Therefore, identifying crucial genes in the overlap between ASD and these comorbid disorders may help unravel the common biological processes involved and, ultimately, shed some light on autism etiology. RESULTS: In this work, we used a two-fold systems biology approach, specifically focused on biological processes and gene networks, to conduct a comparative analysis of autism with 31 frequently comorbid disorders in order to define a multi-disorder subcomponent of ASD and predict new genes of potential relevance to ASD etiology. We validated our predictions by determining the significance of our candidate genes in high-throughput transcriptome expression profiling studies. Using prior knowledge of disease-related biological processes and the interaction networks of the disorders related to autism, we identified a set of 19 genes not previously linked to ASD that were significantly differentially regulated in individuals with autism. In addition, these genes were of potential etiologic relevance to autism, given their enriched roles in neurological processes crucial for optimal brain development and function, learning and memory, cognition, and social behavior. CONCLUSIONS: Taken together, our approach represents a novel perspective on autism from the point of view of related comorbid disorders and proposes a model by which prior knowledge of interaction networks may inform and focus the genome-wide search for autism candidate genes to better define the genetic heterogeneity of ASD.


Subjects
Autism Spectrum Disorder/epidemiology , Autism Spectrum Disorder/genetics , Comorbidity , Systems Biology , Autism Spectrum Disorder/etiology , Gene Expression Profiling , Humans
17.
Am J Epidemiol ; 186(8): 1000-1009, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-29040395

ABSTRACT

Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action.


Subjects
Human Genome , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis/methods , Genetic Predisposition to Disease , Genomics/methods , Humans
18.
Bioinformatics ; 30(20): 2956-8, 2014 Oct 15.
Article in English | MEDLINE | ID: mdl-24982428

ABSTRACT

SUMMARY: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. AVAILABILITY AND IMPLEMENTATION: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. CONTACT: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
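The idea of formally describing a pipeline as stages with dependencies, and partitioning it into jobs that can run together, can be illustrated with a small generic sketch. This is not the COSMOS API: the stage names (`align`, `call_variants`, `annotate`, `qc_report`) and both helper functions are hypothetical, and the example relies only on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline description: each stage maps to the set of stages
# it depends on. These names are illustrative, not taken from COSMOS.
PIPELINE = {
    "align": set(),
    "call_variants": {"align"},
    "annotate": {"call_variants"},
    "qc_report": {"align"},
}


def execution_order(dependencies):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def ready_batches(dependencies):
    """Group stages into batches whose members could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # stages with all inputs done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(ready_batches(PIPELINE), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

A real workflow manager would submit each batch to a queuing system or cloud service and track job status; the sketch only shows how a declarative dependency graph determines what may run and when.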


Subjects
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Programming Languages
19.
JMIR Res Protoc ; 13: e52205, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38329783

ABSTRACT

BACKGROUND: A considerable number of minors in the United States are diagnosed with developmental or psychiatric conditions, potentially influenced by underdiagnosis factors such as cost, distance, and clinician availability. Despite the potential of digital phenotyping tools with machine learning (ML) approaches to expedite diagnoses and enhance diagnostic services for pediatric psychiatric conditions, existing methods face limitations because they use a limited set of social features for prediction tasks and focus on a single binary prediction, resulting in uncertain accuracies. OBJECTIVE: This study aims to propose the development of a gamified web system for data collection, followed by a fusion of novel crowdsourcing algorithms with ML behavioral feature extraction approaches to simultaneously predict diagnoses of autism spectrum disorder and attention-deficit/hyperactivity disorder in a precise and specific manner. METHODS: The proposed pipeline will consist of (1) gamified web applications to curate videos of social interactions adaptively based on the needs of the diagnostic system, (2) behavioral feature extraction techniques consisting of automated ML methods and novel crowdsourcing algorithms, and (3) the development of ML models that classify several conditions simultaneously and that adaptively request additional information based on uncertainties about the data. RESULTS: A preliminary version of the web interface has been implemented, and a prior feature selection method has highlighted a core set of behavioral features that can be targeted through the proposed gamified approach. CONCLUSIONS: The prospect for high reward stems from the possibility of creating the first artificial intelligence-powered tool that can identify complex social behaviors well enough to distinguish conditions with nuanced differentiators such as autism spectrum disorder and attention-deficit/hyperactivity disorder. 
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/52205.

20.
Bioinformatics ; 28(5): 715-6, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22247275

ABSTRACT

Roundup is an online database of gene orthologs for over 1800 genomes, including 226 Eukaryota, 1447 Bacteria, 113 Archaea and 21 Viruses. Orthologs are inferred using the Reciprocal Smallest Distance algorithm. Users may query Roundup for single-linkage clusters of orthologous genes based on any group of genomes. Annotated query results may be viewed in a variety of ways including as clusters of orthologs and as phylogenetic profiles. Genomic results may be downloaded in formats suitable for functional as well as phylogenetic analysis, including the recent OrthoXML standard. In addition, gene IDs can be retrieved using FASTA sequence search. All source code and orthologs are freely available. AVAILABILITY: http://roundup.hms.harvard.edu.
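To make "single-linkage clusters" and "phylogenetic profiles" concrete, here is a minimal sketch under stated assumptions. It is not Roundup's implementation: it assumes the pairwise ortholog calls already exist, and the `genome:gene` identifier format, the example IDs, and both function names are hypothetical. Single-linkage clustering is done here with a union-find structure, so two genes end up in the same cluster whenever a chain of ortholog pairs connects them.

```python
from collections import defaultdict


def single_linkage_clusters(ortholog_pairs):
    """Merge pairwise ortholog calls into single-linkage clusters."""
    parent = {}

    def find(gene):
        # Locate the cluster representative, shortening paths as we go.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for gene_a, gene_b in ortholog_pairs:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a  # one shared pair links two clusters

    clusters = defaultdict(set)
    for gene in parent:
        clusters[find(gene)].add(gene)
    return sorted(sorted(members) for members in clusters.values())


def phylogenetic_profile(cluster, genomes):
    """Return a presence/absence vector of the cluster across genomes."""
    # Assumes the hypothetical "genome:gene" identifier format.
    present = {gene.split(":", 1)[0] for gene in cluster}
    return [1 if genome in present else 0 for genome in genomes]


if __name__ == "__main__":
    pairs = [
        ("human:G1", "mouse:G1"),
        ("mouse:G1", "fly:G1"),
        ("human:G2", "mouse:G2"),
    ]
    genomes = ["human", "mouse", "fly"]
    for cluster in single_linkage_clusters(pairs):
        print(cluster, phylogenetic_profile(cluster, genomes))
```

Single linkage is permissive by design: one ortholog pair is enough to join two groups, which keeps clusters complete but can chain together genes that are not all mutually orthologous.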


Subjects
Algorithms , Genomics/methods , Phylogeny , Animals , Archaea/genetics , Bacteria/genetics , Cluster Analysis , Molecular Evolution , Genome , Humans , Viruses/genetics