1.
Genome Res ; 33(10): 1734-1746, 2023 10.
Article in English | MEDLINE | ID: mdl-37879860

ABSTRACT

Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to, given the distribution of the subsequence in the unmapped reads and phasings of families. Validated on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ∼1-Mb resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.


Subject(s)
Genome, Human, Genomics, Humans, Algorithms, Telomere/genetics, Genetic Variation, Sequence Analysis, DNA
2.
Genome Res ; 33(10): 1747-1756, 2023 10.
Article in English | MEDLINE | ID: mdl-37879861

ABSTRACT

Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.


Subject(s)
Genome, Inheritance Patterns, Humans, Whole Genome Sequencing, Haplotypes
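The abstract above describes PhasingFamilies only at a high level. As a rough, hypothetical illustration of how a hidden Markov model can turn noisy per-site inheritance evidence into crossover calls, the Python sketch below decodes a toy two-state model with the Viterbi algorithm. The states, the parameters `p_switch` and `eps`, and the function names are assumptions made for illustration; this is not the PhasingFamilies model, which additionally handles error-prone regions, inherited deletions, and the X Chromosome.

```python
import math
from typing import List, Sequence


def viterbi_two_state(obs: Sequence[int], p_switch: float = 0.01, eps: float = 0.05) -> List[int]:
    """Most likely hidden-state path for a toy (hypothetical) two-state HMM.

    Hidden state s in {0, 1}: which parental haplotype the child carries.
    Observation o in {0, 1}: which haplotype a site's genotypes appear to
    support; it equals s with probability 1 - eps (eps models call errors).
    Between consecutive sites the state flips with probability p_switch.
    """
    log_trans = [[math.log(1 - p_switch), math.log(p_switch)],
                 [math.log(p_switch), math.log(1 - p_switch)]]

    def log_emit(state: int, o: int) -> float:
        return math.log(1 - eps) if state == o else math.log(eps)

    # Uniform prior over the two starting states; work in log space.
    scores = [math.log(0.5) + log_emit(s, obs[0]) for s in (0, 1)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        new_scores, ptrs = [], []
        for s in (0, 1):
            # Best predecessor state for state s at this site.
            prev = max((0, 1), key=lambda p: scores[p] + log_trans[p][s])
            new_scores.append(scores[prev] + log_trans[prev][s] + log_emit(s, o))
            ptrs.append(prev)
        scores = new_scores
        backpointers.append(ptrs)

    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backpointers):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def crossovers(path: Sequence[int]) -> List[int]:
    """Site indices where the decoded inherited haplotype changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# A true switch at site 10, plus one isolated erroneous site at index 20.
obs = [0] * 10 + [1] * 10 + [0] + [1] * 9
print(crossovers(viterbi_two_state(obs)))  # prints [10]
```

Because a state switch (log 0.01) costs more than a single mismatched site (log 0.05 vs. log 0.95), the isolated error is absorbed rather than called as two spurious crossovers; this trade-off is the general reason HMMs suit error-prone variant calls.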
3.
JAMA Netw Open ; 6(1): e2251182, 2023 01 03.
Article in English | MEDLINE | ID: mdl-36689227

ABSTRACT

Importance: While research has identified racial and ethnic disparities in access to autism services, the size, extent, and specific locations of these access gaps have not yet been characterized on a national scale. Mapping comprehensive national listings of autism health care services together with the prevalence of autistic children of various races and ethnicities and evaluating geographic regions defined by localized commuting patterns may help to identify areas within the US where families who belong to minoritized racial and ethnic groups have disproportionally lower access to services. Objective: To evaluate differences in access to autism health care services among autistic children of various races and ethnicities within precisely defined geographic regions encompassing all serviceable areas within the US. Design, Setting, and Participants: This population-based cross-sectional study was conducted from October 5, 2021, to June 3, 2022, and involved 530 965 autistic children in kindergarten through grade 12. Core-based statistical areas (CBSAs; defined as areas containing a city and its surrounding commuter region), the Civil Rights Data Collection (CRDC) data set, and 51 071 autism resources (collected from October 1, 2015, to December 18, 2022) geographically distributed into 912 CBSAs were combined and analyzed to understand variation in access to autism health care services among autistic children of different races and ethnicities. Six racial and ethnic categories (American Indian or Alaska Native, Asian, Black or African American, Hispanic or Latino, Native Hawaiian or other Pacific Islander, and White) assigned by the US Department of Education were included in the analysis. Main Outcomes and Measures: A regularized least-squares regression analysis was used to measure differences in nationwide resource allocation between racial and ethnic groups. 
The number of autism resources allocated per autistic child was estimated based on the child's racial and ethnic group. To evaluate how the CBSA population size may have altered the results, the least-squares regression analysis was run on CBSAs divided into metropolitan (>50 000 inhabitants) and micropolitan (10 000-50 000 inhabitants) groups. A Mann-Whitney U test was used to compare the model-estimated ratio of autism resources to autistic children among specific racial and ethnic groups comprising the proportions of autistic children in each CBSA. Results: Among 530 965 autistic children aged 5 to 18 years, 83.9% were male and 16.1% were female; 0.7% of children were American Indian or Alaska Native, 5.9% were Asian, 14.3% were Black or African American, 22.9% were Hispanic or Latino, 0.2% were Native Hawaiian or other Pacific Islander, 51.7% were White, and 4.2% were of 2 or more races and/or ethnicities. At a national scale, American Indian or Alaska Native autistic children (β = 0; 95% CI, 0-0; P = .01) and Hispanic autistic children (β = 0.02; 95% CI, 0-0.06; P = .02) had significant disparities in access to autism resources in comparison with White autistic children. When evaluating the proportion of autistic children in each racial and ethnic group, areas in which Black autistic children (>50% of the population: β = 0.05; <50% of the population: β = 0.07; P = .002) or Hispanic autistic children (>50% of the population: β = 0.04; <50% of the population: β = 0.07; P < .001) comprised greater than 50% of the total population of autistic children had significantly fewer resources than areas in which Black or Hispanic autistic children comprised less than 50% of the total population.
Comparing metropolitan vs micropolitan CBSAs revealed that in micropolitan CBSAs, Black autistic children (β = 0; 95% CI, 0-0; P < .001) and Hispanic autistic children (β = 0; 95% CI, 0-0.02; P < .001) had the greatest disparities in access to autism resources compared with White autistic children. In metropolitan CBSAs, American Indian or Alaska Native autistic children (β = 0; 95% CI, 0-0; P = .005) and Hispanic autistic children (β = 0.01; 95% CI, 0-0.06; P = .02) had the greatest disparities compared with White autistic children. Conclusions and Relevance: In this study, autistic children from several minoritized racial and ethnic groups, including Black and Hispanic autistic children, had access to significantly fewer autism resources than White autistic children in the US. This study pinpointed the specific geographic regions with the greatest disparities, where increases in the number and types of treatment options are warranted. These findings suggest that a prioritized response strategy to address these racial and ethnic disparities is needed.


Subject(s)
Autistic Disorder, Child, Humans, Male, Female, Cross-Sectional Studies, Health Services Accessibility, Healthcare Disparities, Racial Groups
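The abstract above compares resource-to-child ratios between groups of CBSAs with a Mann-Whitney U test, which compares two samples by rank order rather than by means. The minimal Python sketch below computes the standard U statistic and a two-sided p-value from the large-sample normal approximation (no tie or continuity correction). The function name and the example values are hypothetical, and this is not the study's analysis code.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value).

    U counts, over all (x_i, y_j) pairs, how often x_i > y_j, with ties
    counted as one half. The p-value uses the normal approximation
    N(n1*n2/2, n1*n2*(n1+n2+1)/12) without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of N(0, 1)
    return u_x, p


# Hypothetical resources-per-child ratios for two groups of regions.
group_a = [0.05, 0.04, 0.06, 0.03, 0.05]
group_b = [0.07, 0.08, 0.06, 0.09, 0.07]
u, p = mann_whitney_u(group_a, group_b)
print(f"U = {u}, p = {p:.3f}")
```

The pairwise count is O(n1 * n2) and mirrors the textbook definition directly; for large samples or heavy ties, an established implementation with tie correction or exact p-values would be the safer choice.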
4.
Virol J ; 19(1): 225, 2022 12 24.
Article in English | MEDLINE | ID: mdl-36566197

ABSTRACT

While hundreds of thousands of human whole genome sequences (WGS) have been collected in the effort to better understand genetic determinants of disease, these whole genome sequences have less frequently been used to study another major determinant of human health: the human virome. Using the unmapped reads from WGS of over 1000 families, we present insights into the human blood DNA virome, focusing particularly on human herpesvirus (HHV) 6A, 6B, and 7. In addition to extensively cataloguing the viruses detected in WGS of human whole blood and lymphoblastoid cell lines, we use the family structure of our dataset to show that household drives transmission of several viruses, and identify the Mendelian inheritance patterns characteristic of inherited chromosomally integrated human herpesvirus 6 (iciHHV-6). Consistent with prior studies, we find that 0.6% of our dataset's population has iciHHV, and we locate candidate integration sequences for these cases. We document genetic diversity within exogenous and integrated HHV species and within integration sites of HHV-6. Finally, in the first observation of its kind, we present evidence that suggests widespread de novo HHV-6B integration and HHV-7 integration and reactivation in lymphoblastoid cell lines. These findings show that the unmapped read space of WGS is a promising source of data for virology research.


Subject(s)
Herpesvirus 6, Human, Roseolovirus Infections, Humans, Herpesvirus 6, Human/genetics, Virus Integration, Sequence Analysis, Cell Line
5.
AMIA Jt Summits Transl Sci Proc ; 2022: 456-465, 2022.
Article in English | MEDLINE | ID: mdl-35854759

ABSTRACT

Autism is among the most common neurodevelopmental conditions. Timely diagnosis and access to therapeutic resources are essential for positive prognoses, yet long queues and unevenly dispersed resources leave many untreated. Without granular estimates of autism prevalence by geographic area, it is difficult to identify unmet needs and mechanisms to address them. Mining a dataset of 53M children using meaningful geographic regions, we computed autism prevalence across the country. We then performed comparative analysis against 50,000 resources to identify the type and extent of gaps in access to autism services. We find a steady increase in autism diagnoses from K-5, supporting delayed diagnosis of autism, and consistent under-diagnosis of females. We find a significant inverse relationship between prevalence and availability of resources (p < 0.001). While more work is needed to characterize additional trends including racial and ethnicity-based disparities, the identification of resource gaps can direct and prioritize new innovations.

6.
Sci Rep ; 12(1): 9863, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35701436

ABSTRACT

The unmapped readspace of whole genome sequencing data tends to be large but is often ignored. We posit that it contains valuable signals of both human infection and contamination. Using unmapped and poorly aligned reads from whole genome sequences (WGS) of over 1000 families and nearly 5000 individuals, we present insights into common viral, bacterial, and computational contamination that plague whole genome sequencing studies. We present several notable results: (1) In addition to known contaminants such as Epstein-Barr virus and phiX, sequences from whole blood and lymphocyte cell lines contain many other contaminants, likely originating from storage, prep, and sequencing pipelines. (2) The sequencing plate and biological source of a sample strongly influence its contamination profile. And, (3) Y-chromosome fragments not on the human reference genome commonly mismap to bacterial reference genomes. Both experiment-derived and computational contamination are prominent in next-generation sequencing data. Such contamination can compromise results from WGS as well as metagenomics studies, and standard protocols for identifying and removing contamination should be developed to ensure the fidelity of sequencing-based studies.


Subject(s)
Bacteriophages, Epstein-Barr Virus Infections, Computational Biology, Genome, Bacterial, Genome, Human, Genome, Viral, Herpesvirus 4, Human/genetics, High-Throughput Nucleotide Sequencing/methods, Humans, Whole Genome Sequencing
7.
JMIR Public Health Surveill ; 8(7): e31306, 2022 07 21.
Article in English | MEDLINE | ID: mdl-35605128

ABSTRACT

BACKGROUND: Selection bias and unmeasured confounding are fundamental problems in epidemiology that threaten study internal and external validity. These phenomena are particularly dangerous in internet-based public health surveillance, where traditional mitigation and adjustment methods are inapplicable, unavailable, or out of date. Recent theoretical advances in causal modeling can mitigate these threats, but these innovations have not been widely deployed in the epidemiological community. OBJECTIVE: The purpose of our paper is to demonstrate the practical utility of causal modeling to both detect unmeasured confounding and selection bias and guide model selection to minimize bias. We implemented this approach in an applied epidemiological study of the COVID-19 cumulative infection rate in the New York City (NYC) spring 2020 epidemic. METHODS: We collected primary data from Qualtrics surveys of Amazon Mechanical Turk (MTurk) crowd workers residing in New Jersey and New York State across 2 sampling periods: April 11-14 and May 8-11, 2020. The surveys queried the subjects on household health status and demographic characteristics. We constructed a set of possible causal models of household infection and survey selection mechanisms and ranked them by compatibility with the collected survey data. The most compatible causal model was then used to estimate the cumulative infection rate in each survey period. RESULTS: There were 527 and 513 responses collected for the 2 periods, respectively. Response demographics were highly skewed toward a younger age in both survey periods. Despite the extremely strong relationship between age and COVID-19 symptoms, we recovered minimally biased estimates of the cumulative infection rate using only primary data and the most compatible causal model, with a relative bias of +3.8% and -1.9% from the reported cumulative infection rate for the first and second survey periods, respectively. 
CONCLUSIONS: We successfully recovered accurate estimates of the cumulative infection rate from an internet-based crowdsourced sample despite considerable selection bias and unmeasured confounding in the primary data. This implementation demonstrates how simple applications of structural causal modeling can be effectively used to determine falsifiable model conditions, detect selection bias and confounding factors, and minimize estimate bias through model selection in a novel epidemiological context. As the disease and social dynamics of COVID-19 continue to evolve, public health surveillance protocols must continue to adapt, with the emergence of Omicron variants and the shift to at-home testing as recent challenges. Rigorous and transparent methods to develop, deploy, and diagnose adapted surveillance protocols will be critical to their success.


Subject(s)
COVID-19, COVID-19/epidemiology, Confounding Factors, Epidemiologic, Humans, Internet, New York City/epidemiology, SARS-CoV-2, Selection Bias
8.
Article in English | MEDLINE | ID: mdl-35634270

ABSTRACT

Artificial Intelligence (A.I.) solutions are increasingly considered for telemedicine. For these methods to serve children and their families in home settings, it is crucial to ensure the privacy of the child and parent or caregiver. To address this challenge, we explore the potential for global image transformations to provide privacy while preserving the quality of behavioral annotations. Crowd workers have previously been shown to reliably annotate behavioral features in unstructured home videos, allowing machine learning classifiers to detect autism using the annotations as input. We evaluate this method with videos altered via pixelation, dense optical flow, and Gaussian blurring. On a balanced test set of 30 videos of children with autism and 30 neurotypical controls, we find that the visual privacy alterations do not drastically alter any individual behavioral annotation at the item level. The AUROC on the evaluation set was 90.0% ±7.5% for unaltered videos, 85.0% ±9.0% for pixelation, 85.0% ±9.0% for optical flow, and 83.3% ±9.3% for blurring, demonstrating that an aggregation of small changes across behavioral questions can collectively result in increased misdiagnosis rates. We also compare crowd answers against clinicians who provided the same annotations for the same videos as crowd workers, and we find that clinicians have higher sensitivity in their recognition of autism-related symptoms. We also find that there is a linear correlation (r = 0.75, p < 0.0001) between the mean Clinical Global Impression (CGI) score provided by professional clinicians and the corresponding score emitted by a previously validated autism classifier with crowd inputs, indicating that the classifier's output probability is a reliable estimate of the clinical impression of autism. 
A significant correlation is maintained with privacy alterations, indicating that crowd annotations can approximate clinician-provided autism impression from home videos in a privacy-preserved manner.

9.
JMIR Pediatr Parent ; 5(2): e26760, 2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35394438

ABSTRACT

BACKGROUND: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. OBJECTIVE: We designed a strategy to gamify the collection and labeling of child emotion-enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. METHODS: We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface to gamify the human labeling effort, called HollywoodSquares, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels on all images. With this drastically expanded pediatric emotion-centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgusted, and neutral expressions evoked by children. RESULTS: The classifier achieved a 66.9% balanced accuracy and 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) as well as a 79.1% balanced accuracy and 78% F1-score on CAFE Subset A, a subset containing at least 60% human agreement on emotion labels. This performance is at least 10% higher than all previously developed classifiers evaluated against CAFE, the best of which reached a 56% balanced accuracy even when combining "anger" and "disgust" into a single class.
CONCLUSIONS: This work validates that mobile games designed for pediatric therapies can generate high volumes of domain-relevant data sets to train state-of-the-art classifiers to perform tasks helpful to precision health efforts.

10.
J Med Internet Res ; 24(2): e31830, 2022 02 15.
Article in English | MEDLINE | ID: mdl-35166683

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is a widespread neurodevelopmental condition with a range of potential causes and symptoms. Standard diagnostic mechanisms for ASD, which involve lengthy parent questionnaires and clinical observation, often result in long waiting times for results. Recent advances in computer vision and mobile technology hold potential for speeding up the diagnostic process by enabling computational analysis of behavioral and social impairments from home videos. Such techniques can improve objectivity and contribute quantitatively to the diagnostic process. OBJECTIVE: In this work, we evaluate whether home videos collected from a game-based mobile app can be used to provide diagnostic insights into ASD. To the best of our knowledge, this is the first study attempting to identify potential social indicators of ASD from mobile phone videos without the use of eye-tracking hardware, manual annotations, and structured scenarios or clinical environments. METHODS: Here, we used a mobile health app to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment. We used automated data set annotations to analyze two social indicators that have previously been shown to differ between children with ASD and their neurotypical (NT) peers: (1) gaze fixation patterns, which represent regions of an individual's visual focus and (2) visual scanning methods, which refer to the ways in which individuals scan their surrounding environment. We compared the gaze fixation and visual scanning methods used by children during a 90-second gameplay video to identify statistically significant differences between the 2 cohorts; we then trained a long short-term memory (LSTM) neural network to determine if gaze indicators could be predictive of ASD. RESULTS: Our results show that gaze fixation patterns differ between the 2 cohorts; specifically, we could identify 1 statistically significant region of fixation (P<.001). 
In addition, we also demonstrate that there are unique visual scanning patterns that exist for individuals with ASD when compared to NT children (P<.001). A deep learning model trained on coarse gaze fixation annotations demonstrates mild predictive power in identifying ASD. CONCLUSIONS: Ultimately, our study demonstrates that heterogeneous video data sets collected from mobile devices hold potential for quantifying visual patterns and providing insights into ASD. We show the importance of automated labeling techniques in generating large-scale data sets while simultaneously preserving the privacy of participants, and we demonstrate that specific social engagement indicators associated with ASD can be identified and characterized using such data.


Subject(s)
Autism Spectrum Disorder, Mobile Applications, Autism Spectrum Disorder/diagnosis, Child, Computers, Handheld, Fixation, Ocular, Humans, Social Participation