Search | BVS Regional Portal

Bring Your Own Location Data: Use of Google Smartphone Location History Data for Environmental Health Research.

Hystad, Perry; Amram, Ofer; Oje, Funso; Larkin, Andrew; Boakye, Kwadwo; Avery, Ally; Gebremedhin, Assefaw; Duncan, Glen.

Environ Health Perspect ; 130(11): 117005, 2022 11.

Article in English | MEDLINE | ID: mdl-36356208

ABSTRACT

BACKGROUND: Environmental exposures are commonly estimated using spatial methods, with most epidemiological studies relying on home addresses. Passively collected smartphone location data, such as Google Location History (GLH) data, may present an opportunity to integrate existing long-term time-activity data. OBJECTIVES: We aimed to evaluate the potential use of GLH data for capturing long-term retrospective time-activity data for environmental health research. METHODS: We included 378 individuals who participated in previous Global Positioning System (GPS) studies within the Washington State Twin Registry. GLH data consist of location information that has been routinely collected since 2010 whenever location sharing was enabled in the Android operating system or in Google apps. We created instructions for participants to download their GLH data and provide it through secure data transfer. We summarized the GLH data provided, compared them with available GPS data, and conducted an exposure assessment for nitrogen dioxide (NO2) air pollution. RESULTS: Of the 378 individuals contacted, we received GLH data from 61 (16.1%), and a further 53 (14.0%) indicated interest but did not have historical GLH data available. The GLH data provided spanned 2010-2021 and included 34 million locations, capturing 66,677 participant days. The median number of days with GLH data per participant was 752, capturing 442 unique locations. When we compared GLH data with 2-wk GPS data (∼1.8 million points), 95% of GPS time-activity points were within 100 m of GLH locations. We observed important differences between NO2 exposures assigned at home locations and those assigned at GLH locations, highlighting the importance of GLH data for environmental exposure assessment. DISCUSSION: We believe collecting GLH data is a feasible and cost-effective method for capturing retrospective time-activity patterns for large populations, and that it presents new opportunities for environmental epidemiology.
Cohort studies should consider adding GLH data collection to capture historical time-activity patterns of participants, employing a "bring-your-own-location-data" citizen science approach. Privacy remains a concern that needs to be carefully managed when using GLH data. https://doi.org/10.1289/EHP10829.
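The GPS-to-GLH agreement check and the home-versus-mobility exposure contrast described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the haversine distance, the brute-force nearest-location search, the hypothetical helper names (`haversine_m`, `share_within`, `time_weighted_exposure`), and the (latitude, longitude) and (concentration, hours) input layouts are all assumptions.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def share_within(gps: List[LatLon], glh: List[LatLon], threshold_m: float = 100.0) -> float:
    """Fraction of GPS points lying within threshold_m of at least one GLH location
    (brute force: every GPS point is compared with every GLH location)."""
    if not gps:
        return 0.0
    hits = sum(
        1
        for lat, lon in gps
        if any(haversine_m(lat, lon, glat, glon) <= threshold_m for glat, glon in glh)
    )
    return hits / len(gps)

def time_weighted_exposure(visits: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: visits are (NO2 concentration, hours spent) pairs;
    returns the time-weighted mean concentration over all visits."""
    total_hours = sum(hours for _, hours in visits)
    if total_hours == 0:
        raise ValueError("visits must cover a positive amount of time")
    return sum(conc * hours for conc, hours in visits) / total_hours

# Illustrative coordinates only (not study data).
glh_locations = [(47.6062, -122.3321)]
gps_points = [(47.6062, -122.3321), (47.6070, -122.3321), (47.6200, -122.3321)]
print(share_within(gps_points, glh_locations))  # two of three points are within 100 m
```

With millions of points, the brute-force search would need to be replaced by a spatial index (for example a k-d tree on projected coordinates). Comparing `time_weighted_exposure` over GLH visits with the concentration at the home location alone is one simple way to express the kind of home-versus-GLH difference the abstract reports.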

Subject(s)

Air Pollutants , Air Pollution , Humans , Air Pollutants/analysis , Retrospective Studies , Smartphone , Search Engine , Environmental Exposure , Environmental Health

Sequence Similarity Network Analysis Provides Insight into the Temporal and Geographical Distribution of Mutations in SARS-CoV-2 Spike Protein.

Patil, Shruti S; Catanese, Helen N; Brayton, Kelly A; Lofgren, Eric T; Gebremedhin, Assefaw H.

Viruses ; 14(8)2022 07 29.

Article in English | MEDLINE | ID: mdl-36016294

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which still infects hundreds of thousands of people globally each day despite various countermeasures, has been mutating rapidly. Mutations in the spike (S) protein seem to play a vital role in viral stability, transmission, and adaptability. Therefore, to control the spread of the virus, it is important to gain insight into the evolution and transmission of the S protein. This study deals with the temporal and geographical distribution of mutant S proteins from sequences gathered across the US over a period of 19 months in 2020 and 2021. The S protein sequences are studied using two approaches: (i) multiple sequence alignment is used to identify prominent mutations and highly mutable regions, and (ii) sequence similarity networks are subsequently employed to gain further insight and to study the mutation profiles of concerning variants across the defined time periods and states. Additionally, we tracked the variants using visualizations on geographical maps. The visualizations produced using the Directed Weighted All Nearest Neighbors (DiWANN) networks and maps provided insights into the transmission of the virus that closely reflect the statistics reported for the time periods studied. We found that the networks created using DiWANN are superior to the commonly used approximate distance networks created using BLAST bitscores. The study offers a richer computational approach to analyzing the transmission profile of the prominent S protein mutations in SARS-CoV-2 and can be extended to other proteins and viruses.
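Step (i), finding prominent substitutions in an alignment, can be sketched as a column-by-column comparison against a reference sequence. This is an illustrative sketch, not the study's code: it assumes the sequences are already aligned (equal length, with '-' for gaps), it uses the common reference-residue/position/observed-residue labeling (as in D614G), and the function name `count_substitutions` is hypothetical.

```python
from collections import Counter
from typing import Iterable

def count_substitutions(reference: str, aligned: Iterable[str]) -> Counter:
    """Count substitutions relative to an aligned reference sequence.

    Labels follow the reference-residue/position/observed-residue notation,
    with positions numbered along the ungapped reference (1-based).
    Gaps in either sequence are skipped rather than reported.
    """
    counts: Counter = Counter()
    for seq in aligned:
        if len(seq) != len(reference):
            raise ValueError("every sequence must have the alignment length")
        ref_pos = 0
        for ref_res, res in zip(reference, seq):
            if ref_res == "-":
                continue  # insertion relative to the reference: no reference position
            ref_pos += 1
            if res != "-" and res != ref_res:
                counts[f"{ref_res}{ref_pos}{res}"] += 1
    return counts

# Toy alignment (not real spike sequences).
print(count_substitutions("MDGK", ["MGGK", "MGGR", "MDGK"]).most_common())
# [('D2G', 2), ('K4R', 1)]
```

Summing these counts by position, ignoring the substituted residue, gives a simple per-site mutability profile for spotting highly mutable regions.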

Subject(s)

COVID-19 , Spike Glycoprotein, Coronavirus , Humans , Mutation , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism

Collaborative Multi-Expert Active Learning for Mobile Health Monitoring: Architecture, Algorithms, and Evaluation.

Saeedi, Ramyar; Sasani, Keyvan; Gebremedhin, Assefaw H.

Sensors (Basel) ; 20(7)2020 Mar 30.

Article in English | MEDLINE | ID: mdl-32235652

ABSTRACT

Mobile health monitoring plays a central role in the future of cyber-physical systems (CPS) for healthcare applications. Such monitoring systems need to process user data accurately. Unlike in other human-centered CPS, in healthcare CPS the user plays multiple roles at the same time: operator, actuator, physical environment and, most importantly, the target that needs to be monitored. As a result, mobile health CPS devices generally operate in highly dynamic settings, and the accuracy of the machine learning models they employ may drop dramatically every time the setting changes. Novel learning architectures that specifically address the challenges of dynamic environments are therefore needed. Using active learning and transfer learning as organizing principles, we propose a collaborative multiple-expert architecture and accompanying algorithms for designing machine learning models that autonomously adapt to a new configuration, context, or user need. Specifically, our architecture and its constituent algorithms are designed to manage heterogeneous knowledge sources, or experts, of varying types and levels of confidence while minimizing adaptation cost. Additionally, our framework incorporates a mechanism for collaboration among experts to enrich their knowledge, which in turn decreases both the cost and the uncertainty of data labeling in future steps. We evaluate the efficacy of the architecture using two publicly available human activity datasets. We attain activity recognition accuracy of over 85% (first dataset) and 92% (second dataset) while labeling only 15% of the unlabeled data.
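The query-and-label step of such an architecture can be illustrated with generic budgeted uncertainty sampling: spend the labeling budget on the samples the current model is least sure about, and accept the label from the most confident expert. This is a simplified stand-in written for illustration, not the authors' algorithm. The expert interface (a callable returning a label and a confidence), the function names, and the selection rule are all assumptions, and the collaboration mechanism among experts is omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical expert interface: maps a feature vector to (label, confidence in [0, 1]).
Expert = Callable[[Sequence[float]], Tuple[str, float]]

def select_queries(model_confidence: List[float], budget_fraction: float) -> List[int]:
    """Indices of the least-confident samples, limited to budget_fraction of the pool."""
    k = round(budget_fraction * len(model_confidence))
    ranked = sorted(range(len(model_confidence)), key=lambda i: model_confidence[i])
    return sorted(ranked[:k])

def label_with_experts(samples: List[Sequence[float]], indices: List[int],
                       experts: Dict[str, Expert]) -> Dict[int, Tuple[str, str]]:
    """For each queried sample, keep the label offered by the most confident expert,
    returned as (label, expert name)."""
    labeled = {}
    for i in indices:
        answers = [(name, expert(samples[i])) for name, expert in experts.items()]
        name, (label, _conf) = max(answers, key=lambda item: item[1][1])
        labeled[i] = (label, name)
    return labeled

# Toy pool of ten samples with made-up model confidences and two made-up experts.
samples = [[float(i)] for i in range(10)]
confidence = [0.9, 0.1, 0.5, 0.95, 0.3, 0.8, 0.7, 0.99, 0.6, 0.4]
experts = {
    "wrist_model": lambda x: ("walking", 0.6),
    "user": lambda x: ("running", 0.9),
}
queries = select_queries(confidence, budget_fraction=0.2)
print(queries, label_with_experts(samples, queries, experts))
```

In a full loop the newly labeled samples would be added to the training set and the model retrained before the next round of queries.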

Subject(s)

Human Activities , Mobile Health Units , Monitoring, Physiologic , Telemedicine/trends , Algorithms , Delivery of Health Care , Humans , Machine Learning

A nearest-neighbors network model for sequence data reveals new insight into genotype distribution of a pathogen.

Catanese, Helen N; Brayton, Kelly A; Gebremedhin, Assefaw H.

BMC Bioinformatics ; 19(1): 475, 2018 Dec 12.

Article in English | MEDLINE | ID: mdl-30541438

ABSTRACT

BACKGROUND: Sequence similarity networks are useful for classifying and characterizing biologically important proteins. Threshold-based approaches to similarity network construction using exact distance measures are prohibitively slow to compute and rely on the difficult task of selecting an appropriate threshold, while similarity networks based on approximate distance calculations compromise useful structural information. RESULTS: We present an alternative network representation for a set of sequence data that overcomes these drawbacks. In our model, called the Directed Weighted All Nearest Neighbors (DiWANN) network, each sequence is represented by a node and is connected via a directed edge only to the closest sequence, or sequences in the case of ties, in the dataset. Our contributions span several aspects. Specifically, we: (i) apply an all nearest neighbors network model to protein sequence data from three different applications and examine the structural properties of the networks; (ii) compare the model against threshold-based networks to validate their semantic equivalence, and demonstrate the relative advantages the model offers; (iii) demonstrate the model's resilience to missing sequences; and (iv) develop an efficient algorithm for constructing a DiWANN network from a set of sequences. We find that the DiWANN network representation attains semantic properties similar to those of threshold-based graphs, while avoiding the weaknesses of both high- and low-threshold graphs. Additionally, we find that approximate distance networks, using BLAST bitscores in place of exact edit distances, can cause significant loss of structural information. We show that the proposed DiWANN network construction algorithm provides a fourfold speedup over a standard threshold-based approach to network construction.
We also identify a relationship between the centrality of a sequence in a similarity network of an Anaplasma marginale short sequence repeat dataset and how broadly that sequence is dispersed geographically. CONCLUSION: We demonstrate that using approximate distance measures to rapidly construct similarity networks may lead to significant deficiencies in the structure of that network in terms of centrality and clustering analyses. We present a new network representation that maintains the structural semantics of threshold-based networks while increasing connectedness, and an algorithm for constructing the network using exact distance measures in a fraction of the time it would take to build a threshold-based equivalent.
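The DiWANN definition above (one directed edge from each sequence to its nearest sequence, or to every tied nearest sequence, weighted by the exact edit distance) can be sketched by brute force. This sketch computes all pairwise Levenshtein distances, so it is not the paper's faster construction algorithm; it only pins down what the resulting network contains. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]

def diwann_edges(seqs: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Directed edges (source, target, weight): each sequence points to its nearest
    neighbour(s), keeping every tie, weighted by exact edit distance.
    Brute force: n * (n - 1) distance computations."""
    names = list(seqs)
    edges = []
    for u in names:
        dists = {v: edit_distance(seqs[u], seqs[v]) for v in names if v != u}
        if not dists:
            continue  # a single sequence has no neighbours
        best = min(dists.values())
        edges.extend((u, v, best) for v, d in dists.items() if d == best)
    return edges

# Toy sequences (not from the study's datasets).
toy = {"a": "ACDE", "b": "ACDF", "c": "ACGG", "d": "TTTT"}
for edge in diwann_edges(toy):
    print(edge)
```

Note how "c" and "d" each keep several outgoing edges because of ties, which is the property that distinguishes the DiWANN network from a single nearest-neighbour graph.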

Subject(s)

Amino Acid Sequence/genetics , Proteins/chemistry , Cluster Analysis , Genotype , Network Meta-Analysis

Synthetic Sensor Data Generation for Health Applications: A Supervised Deep Learning Approach.

Norgaard, Skyler; Saeedi, Ramyar; Sasani, Keyvan; Gebremedhin, Assefaw H.

Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 1164-1167, 2018 Jul.

Article in English | MEDLINE | ID: mdl-30440598

ABSTRACT

Recent advancements in mobile devices, data analysis, and wearable sensors have made in-place health monitoring possible. Supervised machine learning algorithms, the core intelligence of these systems, learn from labeled training data. However, labeling vast amounts of data is time-consuming and expensive. Moreover, sensor data often contain personal information that users may not be comfortable sharing. There is therefore a strong need for methods that generate realistic labeled sensor data. In this paper, we propose a supervised generative adversarial network architecture in which the generator learns from the feedback of both a discriminator and a classifier in order to create synthetic sensor data. We demonstrate the effectiveness of the architecture on a publicly available human activity dataset. We show that our generator learns to output diverse samples that are similar, but not identical, to the training data.
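One way to write the generator's training signal when it receives feedback from both a discriminator and a classifier is to add a weighted classification term to the usual non-saturating adversarial loss. The sketch below only evaluates that combined loss on given probabilities; the loss form, the weight `lam`, and the function name `generator_loss` are assumptions, not the paper's exact formulation.

```python
import math
from typing import Sequence

def generator_loss(d_fake: Sequence[float], c_true: Sequence[float], lam: float = 1.0) -> float:
    """Adversarial term plus a weighted classification term, averaged over a batch.

    d_fake[i]: discriminator's probability that synthetic sample i is real.
    c_true[i]: classifier's probability for the activity label the generator
               was conditioned on for sample i.
    """
    if not d_fake or len(d_fake) != len(c_true):
        raise ValueError("need two non-empty sequences of equal length")
    eps = 1e-12  # guards against log(0)
    adversarial = -sum(math.log(max(p, eps)) for p in d_fake) / len(d_fake)
    classification = -sum(math.log(max(p, eps)) for p in c_true) / len(c_true)
    return adversarial + lam * classification

# A batch the discriminator finds unconvincing and the classifier finds ambiguous.
print(generator_loss([0.5, 0.5], [0.5, 0.5], lam=1.0))
```

The loss is lowest when synthetic samples both fool the discriminator and are classified as the intended activity, which is the intuition behind giving the generator feedback from two networks.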

Subject(s)

Algorithms , Deep Learning , Human Activities , Humans , Supervised Machine Learning

Personalized Human Activity Recognition using Wearables: A Manifold Learning-based Knowledge Transfer.

Saeedi, Ramyar; Sasani, Keyvan; Norgaard, Skyler; Gebremedhin, Assefaw H.

Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 1193-1196, 2018 Jul.

Article in English | MEDLINE | ID: mdl-30440604

ABSTRACT

Human activity recognition (HAR) is an important component of health-care systems. For example, it can enable context-aware applications such as elderly care and patient monitoring. Relying on a set of training data, supervised machine learning algorithms form the core intelligence of most existing HAR systems. However, the accuracy of an HAR model depends heavily on the similarity between the training context and the operating context. There is therefore a need for machine learning algorithms that can easily adapt to the operating context at hand. In this paper, we propose a cross-subject transfer learning algorithm that links source and target subjects by constructing manifolds from feature-level representations of the source subject(s). Our algorithm assigns labels to the unlabeled data in the current context using the manifold learned from the source subject(s). The newly labeled data are used to develop a personalized HAR model for the current context (i.e., the target subject). We demonstrate the efficacy of the algorithm using a publicly available HAR dataset. We show that the proposed framework improves the accuracy of activity recognition by up to 24%.
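The label-transfer step (assigning labels to a target subject's unlabeled data from labeled source-subject features) can be illustrated with a plain k-nearest-neighbour vote in feature space. This is a deliberately simpler technique swapped in for illustration; it does not construct manifolds as the paper's algorithm does. The function name, the Euclidean metric, and the (features, label) data layout are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def knn_pseudo_labels(source: List[Tuple[Sequence[float], str]],
                      target: List[Sequence[float]], k: int = 3) -> List[str]:
    """Label each target feature vector with the majority label of its k nearest
    labeled source vectors (Euclidean distance)."""
    if k <= 0 or not source:
        raise ValueError("need k > 0 and at least one labeled source sample")
    labels = []
    for x in target:
        nearest = sorted(source, key=lambda item: math.dist(x, item[0]))[:k]
        votes = Counter(label for _, label in nearest)
        labels.append(votes.most_common(1)[0][0])
    return labels

# Toy two-dimensional features from a source subject (made-up values).
source = [((0.0, 0.0), "sit"), ((0.0, 1.0), "sit"), ((1.0, 0.0), "sit"),
          ((10.0, 10.0), "walk"), ((10.0, 11.0), "walk"), ((11.0, 10.0), "walk")]
print(knn_pseudo_labels(source, [(0.5, 0.5), (10.5, 10.5)], k=3))
```

As in the abstract, the pseudo-labeled target data would then be used to train a personalized model for the target subject.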

Subject(s)

Algorithms , Human Activities , Knowledge Bases , Supervised Machine Learning , Wearable Electronic Devices , Humans

Characterization of Anaplasma marginale subsp. centrale Strains by Use of msp1aS Genotyping Reveals a Wildlife Reservoir.

Khumalo, Zamantungwa T H; Catanese, Helen N; Liesching, Nicole; Hove, Paidashe; Collins, Nicola E; Chaisi, Mamohale E; Gebremedhin, Assefaw H; Oosthuizen, Marinda C; Brayton, Kelly A.

J Clin Microbiol ; 54(10): 2503-12, 2016 10.

Article in English | MEDLINE | ID: mdl-27440819

ABSTRACT

Bovine anaplasmosis caused by the intraerythrocytic rickettsial pathogen Anaplasma marginale is endemic in South Africa. Anaplasma marginale subspecies centrale also infects cattle; however, it causes a milder form of anaplasmosis and is used as a live vaccine against A. marginale. There has been less interest in the epidemiology of A. marginale subsp. centrale, and, as a result, there are few reports detecting natural infections of this organism. When it is detected in cattle, it is often assumed to be due to vaccination, and in most cases it is reported as a coinfection with A. marginale without characterization of the strain. A total of 380 blood samples from wild ruminant species and cattle, collected from biobanks, national parks, and other regions of South Africa, were used in duplex real-time PCR assays to simultaneously detect A. marginale and A. marginale subsp. centrale. PCR results indicated a high occurrence of A. marginale subsp. centrale infections, ranging from 25 to 100% in national parks. Samples positive for A. marginale subsp. centrale were further characterized using the msp1aS gene, a homolog of msp1α of A. marginale, which contains repeats at the 5' end that are useful for genotyping strains. A total of 47 Msp1aS repeats were identified, corresponding to 32 A. marginale subsp. centrale genotypes detected in cattle, buffalo, and wildebeest. RepeatAnalyzer was used to examine strain diversity. Our results demonstrate a diversity of A. marginale subsp. centrale strains from cattle and wildlife hosts in South Africa and indicate the utility of msp1aS as a genotypic marker for A. marginale subsp. centrale strain diversity.

Subject(s)

Anaplasma marginale/classification , Anaplasma marginale/isolation & purification , Anaplasmosis/epidemiology , Anaplasmosis/microbiology , Animals, Wild , Genetic Variation , Genotyping Techniques/methods , Africa , Anaplasma marginale/genetics , Animals , Cattle , Cattle Diseases/epidemiology , Cattle Diseases/microbiology , Genes, Bacterial , Multiplex Polymerase Chain Reaction , Prevalence , Real-Time Polymerase Chain Reaction , South Africa/epidemiology

RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data.

Catanese, Helen N; Brayton, Kelly A; Gebremedhin, Assefaw H.

BMC Genomics ; 17: 422, 2016 06 03.

Article in English | MEDLINE | ID: mdl-27260942

ABSTRACT

BACKGROUND: Short-sequence repeats (SSRs) occur in both prokaryotic and eukaryotic DNA, inter- and intragenically, and may be exact or inexact copies. When heterogeneous SSRs are present in a given locus, we can take advantage of the pattern of different repeats to genotype strains based on the SSRs. Cataloguing and tracking these repeats can be difficult because diverse groups of researchers are involved in identifying them. Additionally, the task is error-prone when done manually. RESULTS: We developed RepeatAnalyzer, a new software tool capable of tracking, managing, analysing and cataloguing SSRs and genotypes, using Anaplasma marginale as a model species. RepeatAnalyzer's analysis capability includes novel metrics for measuring regional genetic diversity (corresponding to the variety and regularity of SSR occurrence). As part of its visualization capabilities, RepeatAnalyzer produces high-quality maps of the geographic distribution of genotypes or SSRs over a region of interest. RepeatAnalyzer's repeat identification functionality was validated for all SSRs and genotypes reported in 21 publications, using 380 A. marginale isolates gathered from the five publications within that list that provided access to their isolates. The tool produced accurate genotyping results in every case. In addition, it uncovered a number of errors in the published literature: 11 cases where SSRs were misreported, 5 cases where two different SSRs had been given the same name, and 16 cases where two or more names had been given to a single SSR. The analysis and visualization functionalities of the tool are demonstrated using several examples. CONCLUSIONS: RepeatAnalyzer is a robust software tool that can be used for storing, managing, and analysing short-sequence repeats for the purpose of strain identification. The tool can be used for any set of SSRs regardless of species. When applied to A. marginale, our test case, we show that genotype lengths for a given region follow a normal distribution, while SSR frequencies follow a power-law-like distribution. Further, we find that over 90% of repeats are 28 to 29 amino acids long, which is in agreement with conventional wisdom. Lastly, our analysis reveals that the most common edit distance is five or six, which is counterintuitive, since we expected it to be closer to one, the simplest change from one repeat to another.
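Two bookkeeping steps mentioned above, naming a strain's genotype from its ordered repeats and flagging catalogue conflicts (one name used for different repeats, or one repeat given several names), can be sketched as follows. This is not RepeatAnalyzer's code; the (name, sequence) data layout and the function names are assumptions, and the repeat names and sequences in the example are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def catalogue_conflicts(reports: Iterable[Tuple[str, str]]):
    """reports: (repeat name, amino-acid sequence) pairs collected from publications.

    Returns two dicts: names reported with more than one sequence, and
    sequences reported under more than one name.
    """
    seqs_by_name: Dict[str, Set[str]] = defaultdict(set)
    names_by_seq: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in reports:
        seqs_by_name[name].add(seq)
        names_by_seq[seq].add(name)
    one_name_many_seqs = {n: s for n, s in seqs_by_name.items() if len(s) > 1}
    one_seq_many_names = {q: n for q, n in names_by_seq.items() if len(n) > 1}
    return one_name_many_seqs, one_seq_many_names

def genotype(repeats: List[str], catalogue: Dict[str, str]) -> str:
    """Ordered repeat names joined by '-'; repeats missing from the catalogue become '?'.
    catalogue maps repeat name -> sequence and is assumed to be conflict-free."""
    name_of = {seq: name for name, seq in catalogue.items()}
    return "-".join(name_of.get(seq, "?") for seq in repeats)

# Placeholder repeat names and sequences (not real Msp1a or Msp1aS repeats).
reports = [("A", "XXYY"), ("A", "XXYZ"), ("B", "QQQQ"), ("C", "QQQQ")]
print(catalogue_conflicts(reports))
print(genotype(["XXYY", "QQQQ", "WWWW"], {"A": "XXYY", "B": "QQQQ"}))  # A-B-?
```

Resolving the conflicts reported by `catalogue_conflicts` first is what makes the genotype names produced by `genotype` unambiguous.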

Subject(s)

Computational Biology/methods , Genomics/methods , Microsatellite Repeats , Software , Anaplasma marginale/genetics , Genetic Variation , Genotype , Reproducibility of Results , Streptococcus pneumoniae/genetics
