Results 1 - 20 of 1,030
1.
Stud Health Technol Inform ; 313: 124-128, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38682516

ABSTRACT

BACKGROUND: Electronic health records (EHR) emerged as a digital record of the data that is generated in healthcare. OBJECTIVES: In this paper the transfer times of EHRs using the Hypertext Transfer Protocol (HTTP) and WebSocket in both a local network and a wide area network (WAN) are compared. METHODS: A Python web application to serve Fast Healthcare Interoperability Resources (FHIR) records is created and the transfer times of the EHRs over both HTTP and WebSocket connections are measured. 45000 test Patient resources in transfers of 20, 50, 100 and 200 resources per Bundle are used. RESULTS: WebSocket showed much better transfer times for large amounts of data. These were 18 s shorter in the local network and 342 s shorter in WAN for the transfer with 20 resources per Bundle. CONCLUSION: RESTful APIs are a convenient way to implement EHR servers; on the other hand, HTTP becomes a bottleneck when transferring large amounts of data. WebSocket shows better transfer times and thus its superiority in such situations. The problem can be addressed by developing a new communication protocol or by using network tunneling to handle large data transfers of EHRs.
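
For orientation, the Bundle sizes in this abstract can be turned into a count of transfers. The sketch below is only arithmetic on the figures quoted above. It assumes that every Bundle is sent as one separate transfer, which the abstract does not state explicitly, and the function name is hypothetical.

```python
import math

TOTAL_RESOURCES = 45_000            # test Patient resources quoted in the abstract
BUNDLE_SIZES = (20, 50, 100, 200)   # resources per Bundle quoted in the abstract


def bundles_needed(total: int, per_bundle: int) -> int:
    """Number of Bundles needed to carry `total` resources (hypothetical helper)."""
    return math.ceil(total / per_bundle)


# Assumption: one Bundle corresponds to one transfer.
bundle_counts = {size: bundles_needed(TOTAL_RESOURCES, size) for size in BUNDLE_SIZES}

for size, count in bundle_counts.items():
    print(f"{size:>3} resources per Bundle -> {count} Bundles")
```

Under that assumption the smallest Bundle size needs ten times as many transfers as the largest, which may be why the abstract singles out that case when reporting the gap between the two protocols.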


Subject(s)
Electronic Health Records , Humans , Medical Record Linkage/methods , Internet , Health Information Interoperability , Software
2.
J Korean Med Sci ; 39(14): e127, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622936

ABSTRACT

BACKGROUND: To overcome the limitations of relying on data from a single institution, many researchers have studied data linkage methodologies. Data linkage includes errors owing to legal issues surrounding personal information and technical issues related to data processing. Linkage errors affect selection bias, and external and internal validity. Therefore, quality verification for each connection method with adherence to personal information protection is an important issue. This study evaluated the linkage quality of linked data and analyzed the potential bias resulting from linkage errors. METHODS: This study analyzed claims data submitted to the Health Insurance Review and Assessment Service (HIRA DATA). The linkage errors of the two deterministic linkage methods were evaluated based on the use of the match key. The first deterministic linkage uses a unique identification number, and the second deterministic linkage uses the name, gender, and date of birth as a set of partial identifiers. The linkage error included in this deterministic linkage method was compared with the absolute standardized difference (ASD) of Cohen's according to the baseline characteristics, and the linkage quality was evaluated through the following indicators: linked rate, false match rate, missed match rate, positive predictive value, sensitivity, specificity, and F1-score. RESULTS: For the deterministic linkage method that used the name, gender, and date of birth as a set of partial identifiers, the true match rate was 83.5 and the missed match rate was 16.5. Although there was bias in some characteristics of the data, most of the ASD values were less than 0.1, with no case greater than 0.5. Therefore, it is difficult to determine whether linked data constructed with deterministic linkages have substantial differences. 
CONCLUSION: This study confirms the possibility of building health and medical data at the national level as the first data linkage quality verification study using big data from the HIRA. Analyzing the quality of linkages is crucial for comprehending linkage errors and generating reliable analytical outcomes. Linkers should increase the reliability of linked data by providing linkage error-related information to researchers. The results of this study will serve as reference data to increase the reliability of multicenter data linkage studies.
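
The quality indicators listed in this abstract can all be derived from a confusion matrix of link decisions against a reference standard. The sketch below uses common textbook definitions, which may differ in detail from those in the paper (the false match rate in particular has more than one definition in the literature). The counts are hypothetical and are not taken from the study.

```python
def linkage_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Linkage quality indicators from counts of record pairs.

    tp: true matches that were linked      fp: non-matches that were linked
    fn: true matches that were not linked  tn: non-matches that were not linked
    """
    sensitivity = tp / (tp + fn)   # share of true matches that were found
    specificity = tn / (tn + fp)   # share of non-matches correctly left unlinked
    ppv = tp / (tp + fp)           # share of links that are correct
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "missed_match_rate": fn / (tp + fn),   # equals 1 - sensitivity
        "false_match_rate": fp / (tp + fp),    # assumption: defined as 1 - PPV
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical counts, for illustration only.
metrics = linkage_metrics(tp=835, fp=10, fn=165, tn=990)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```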


Subject(s)
Information Storage and Retrieval , Medical Record Linkage , Humans , Reproducibility of Results , Medical Record Linkage/methods , Predictive Value of Tests , Health Services
3.
Int J Med Inform ; 185: 105387, 2024 May.
Article in English | MEDLINE | ID: mdl-38428200

ABSTRACT

BACKGROUND: Cancer registries link a large number of electronic health records reported by medical institutions to already registered records of the matching individual and tumor. Records are automatically linked using deterministic and probabilistic approaches; machine learning is rarely used. Records that cannot be matched automatically with sufficient accuracy are typically processed manually. For application, it is important to know how well record linkage approaches match real-world records and how much manual effort is required to achieve the desired linkage quality. We study the task of linking reported records to the matching registered tumor in cancer registries. METHODS: We compare the tradeoff between linkage quality and manual effort of five machine learning methods (logistic regression, random forest, gradient boosting, neural network, and a stacked method) to a deterministic baseline. The record linkage methods are compared in a two-class setting (no-match/match) and a three-class setting (no-match/undecided/match). A cancer registry collected and linked the dataset consisting of categorical variables matching 145,755 reported records with 33,289 registered tumors. RESULTS: In the two-class setting, the gradient boosting, neural network, and stacked models have higher accuracy and F1 score (accuracy: 0.968-0.978, F1 score: 0.983-0.988) than the deterministic baseline (accuracy: 0.964, F1 score: 0.980) when the same records are manually processed (0.89% of all records). In the three-class setting, these three machine learning methods can automatically process all reported records and still have higher accuracy and F1 score than the deterministic baseline. The linkage quality of the machine learning methods studied, except for the neural network, increases as the number of manually processed records increases. 
CONCLUSION: Machine learning methods can significantly improve linkage quality and reduce the manual effort required by medical coders to match tumor records in cancer registries compared to a deterministic baseline. Our results help cancer registries estimate how linkage quality increases as more records are manually processed.
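
The three-class setting described above (no-match/undecided/match) can be pictured as two thresholds on a classifier's predicted match probability, with the undecided band sent to manual review. This is a minimal sketch under that assumption. The abstract does not say how the classes were actually derived, and the threshold values and function names are hypothetical.

```python
def classify_pair(match_probability: float, lower: float = 0.2, upper: float = 0.8) -> str:
    """Assign a record pair to one of three classes (thresholds are hypothetical)."""
    if match_probability >= upper:
        return "match"
    if match_probability <= lower:
        return "no-match"
    return "undecided"   # would be passed to a medical coder for manual review


def manual_effort(probabilities: list[float], lower: float = 0.2, upper: float = 0.8) -> float:
    """Fraction of pairs that fall in the undecided band."""
    undecided = sum(classify_pair(p, lower, upper) == "undecided" for p in probabilities)
    return undecided / len(probabilities)


# Hypothetical predicted probabilities for five record pairs.
scores = [0.02, 0.10, 0.55, 0.91, 0.99]
labels = [classify_pair(p) for p in scores]
print(labels, manual_effort(scores))
```

Widening the band between `lower` and `upper` sends more pairs to manual review, which is the tradeoff between linkage quality and manual effort that the abstract studies.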


Subject(s)
Electronic Health Records , Neoplasms , Humans , Medical Record Linkage/methods , Neoplasms/epidemiology , Registries , Databases, Factual
4.
Int J Popul Data Sci ; 9(1): 2137, 2024.
Article in English | MEDLINE | ID: mdl-38425790

ABSTRACT

Introduction: Recent years have seen an increase in linkages between survey and administrative data. It is important to evaluate the quality of such data linkages to discern the likely reliability of ensuing research. Evaluation of linkage quality and bias can be conducted using different approaches, but many of these are not possible when there is a separation of processes for linkage and analysis to help preserve privacy, as is typically the case in the UK (and elsewhere). Objectives: We aimed to describe a suite of generalisable methods to evaluate linkage quality and population representativeness of linked survey and administrative data which remain tractable when users of the linked data are not party to the linkage process itself. We emphasise issues particular to longitudinal survey data throughout. Methods: Our proposed approaches cover several areas: i) Linkage rates, ii) Selection into response, linkage consent and successful linkage, iii) Linkage quality, and iv) Linked data population representativeness. We illustrate these methods using a recent linkage between the 1958 National Child Development Study (NCDS; a cohort following an initial 17,415 people born in Great Britain in a single week of 1958) and Hospital Episode Statistics (HES) databases (containing important information regarding admissions, accident and emergency attendances and outpatient appointments at NHS hospitals in England). Results: Our illustrative analyses suggest that the linkage quality of the NCDS-HES data is high and that the linked sample maintains an excellent level of population representativeness with respect to the single dimension we assessed. Conclusions: Through this work we hope to encourage providers and users of linked data resources to undertake and publish thorough evaluations. We further hope that providing illustrative analyses using linked NCDS-HES data will improve the quality and transparency of research using this particular linked data resource.


Subject(s)
Child Development , Medical Record Linkage , Child , Humans , Reproducibility of Results , Medical Record Linkage/methods , Hospitalization , Hospitals
5.
Aust Health Rev ; 48(1): 8-15, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38118279

ABSTRACT

Objective: Data linkage is a powerful research tool in epidemiology; however, establishing it can be a lengthy and intensive process. This paper reports on the complex landscape of conducting data linkage projects in Australia. Methods: We reviewed the processes, required documentation, and applications required to conduct multi-jurisdictional data linkage across Australia in 2023. Results: Obtaining the necessary approvals to conduct linkage will likely take nearly 2 years (estimated 730 days, including 605 days from initial submission to obtaining all ethical approvals and an estimated further 125 days for the issuance of unexpected additionally required approvals). Ethical review for linkage projects ranged from 51 to 128 days from submission to ethical approval, and applications consisted of 9-25 documents. Conclusions: Major obstacles to conducting multi-jurisdictional data linkage included the complexity of the process and substantial time and financial costs. The process was characterised by inefficiencies at several levels, duplication, and a lack of any key accountabilities for timely performance of processes. Data linkage is an invaluable resource for epidemiological research. Further streamlining, established accountability, and greater collaboration between jurisdictions are needed to ensure data linkage is both accessible and feasible for researchers.


Subject(s)
Heart Defects, Congenital , Medical Record Linkage , Humans , Medical Record Linkage/methods , Registries , Australia/epidemiology , Information Storage and Retrieval , Heart Defects, Congenital/epidemiology
6.
PLoS One ; 18(10): e0291581, 2023.
Article in English | MEDLINE | ID: mdl-37862306

ABSTRACT

Research with administrative records involves the challenge of limited information in any single data source to answer policy-related questions. Record linkage provides researchers with a tool to supplement administrative datasets with other information about the same people when identified in separate sources as matched pairs. Several solutions are available for undertaking record linkage, producing linkage keys for merging data sources for positively matched pairs of records. In the current manuscript, we demonstrate a new application of the Python RecordLinkage package to family-based record linkages with machine learning algorithms for probability scoring, which we call probabilistic record linkage for families (PRLF). First, a simulation of administrative records identifies PRLF accuracy with variations in match and data degradation percentages. Accuracy is largely influenced by degradation (e.g., missing data fields, mismatched values) compared to the percentage of simulated matches. Second, an application of data linkage is presented to compare regression model estimate performance across three record linkage solutions (PRLF, ChoiceMaker, and Link Plus). Our findings indicate that all three solutions, when optimized, provide similar results for researchers. Strengths of our process, such as the use of ensemble methods, to improve match accuracy are discussed. We then identify caveats of record linkage in the context of administrative data.


Subject(s)
Algorithms , Medical Record Linkage , Humans , Medical Record Linkage/methods , Computer Simulation , Probability , Information Storage and Retrieval
7.
Stat Med ; 42(27): 4931-4951, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37652076

ABSTRACT

In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms only rely on similarities between linking variables that appear in all the files. Moreover, analysis of linked files often ignores errors that may arise from incorrect or missed links. Bayesian record linking methods allow for natural propagation of linkage error, by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically, and using simulations, that the proposed method can improve the linking process, and can result in accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare enrollment records.


Subject(s)
Medical Record Linkage , Medicare , Aged , Humans , United States , Bayes Theorem , Medical Record Linkage/methods , Algorithms
8.
Gesundheitswesen ; 85(S 02): S154-S161, 2023 Mar.
Article in German | MEDLINE | ID: mdl-36940697

ABSTRACT

BACKGROUND: The aim of the project "Effectiveness of care in oncological centres" (WiZen), funded by the innovation fund of the federal joint committee, is to investigate the effectiveness of certification in oncology. The project uses nationwide data from the statutory health insurance (SHI) AOK and data from clinical cancer registries from three different federal states from 2006-2017. To combine the strengths of both data sources, these will be linked for eight different cancer entities in compliance with data protection regulations. METHODS: Data linkage was performed using indirect identifiers and validated using the health insurance's patient ID ("Krankenversichertennummer") as a direct identifier and gold standard. This enables quantification of the quality of different linkage variants. Sensitivity and specificity as well as hit accuracy and a score addressing the quality of the linkage were used as evaluation criteria. The distributions of relevant variables resulting from the linkage were validated against the original distributions in the individual datasets. RESULTS: Depending on the combination of indirect identifiers, we found a range of 22,125 to 3,092,401 linkage hits. An almost perfect linkage could be achieved by combining information on cancer type, date of birth, gender and postal code. A total of 74,586 one-to-one linkages were achieved with these characteristics. The median hit quality for the different entities was more than 98%. In addition, both the age and sex distributions and the dates of death, if any, showed a high degree of agreement. DISCUSSION AND CONCLUSION: SHI and cancer registry data can be linked with high internal and external validity at the individual level. This robust linkage enables completely new possibilities for analysis through simultaneous access to variables from both data sets ("the best of both worlds"): Information on the UICC stage that stems from the registries can now be combined, for instance, with comorbidities from the SHI data at the individual level. Due to the use of readily available variables and the high success of the linkage, our procedure constitutes a promising method for future linkage processes in health care research.


Subject(s)
Neoplasms , Routinely Collected Health Data , Humans , Germany/epidemiology , Registries , Information Storage and Retrieval , Insurance, Health , Neoplasms/epidemiology , Medical Record Linkage/methods
9.
Community Dent Oral Epidemiol ; 51(1): 75-78, 2023 02.
Article in English | MEDLINE | ID: mdl-36749677

ABSTRACT

OBJECTIVES: Poor oral health, impacting health and wellbeing across the life-course, is a costly and wicked problem. Data (or record) linkage is the linking of different sets of data (often administrative data gathered for non-research purposes) that are matched to an individual and may include records such as medical data, housing information and sociodemographic information. It often uses population-level data or 'big data'. Data linkage provides the opportunity to analyse complex associations from different sources for total populations. The aim of the paper is to explore data linkage, how it is important for oral health research and what promise it holds for the future. METHODS: This is a narrative review of an approach (data linkage) in oral health research. RESULTS: Data linkage may be a powerful method for bringing together various population datasets. It has been used to explore a wide variety of topics with many varied datasets. It has substantial current and potential application in oral health research. CONCLUSIONS: Use of population data linkage is increasing in oral health research where the approach has been very useful in exploring the complexity of oral health. It offers promise for exploring many new areas in the field.


Subject(s)
Medical Record Linkage , Oral Health , Humans , Medical Record Linkage/methods , Information Storage and Retrieval
10.
Int J Epidemiol ; 52(1): 214-226, 2023 02 08.
Article in English | MEDLINE | ID: mdl-35748342

ABSTRACT

BACKGROUND: Methods for linking records between two datasets are well established. However, guidance is needed for linking more than two datasets. Using all 'pairwise linkages'-linking each dataset to every other dataset-is the most inclusive, but resource-intensive, approach. The 'spine' approach links each dataset to a designated 'spine dataset', reducing the number of linkages, but potentially reducing linkage quality. METHODS: We compared the pairwise and spine linkage approaches using real-world data on patients undergoing emergency bowel cancer surgery between 31 October 2013 and 30 April 2018. We linked an administrative hospital dataset (Hospital Episode Statistics; HES) capturing patients admitted to hospitals in England, and two clinical datasets comprising patients diagnosed with bowel cancer and patients undergoing emergency bowel surgery. RESULTS: The spine linkage approach, with HES as the spine dataset, created an analysis cohort of 15 826 patients, equating to 98.3% of the 16 100 patients identified using the pairwise linkage approach. There were no systematic differences in patient characteristics between these analysis cohorts. Associations of patient and tumour characteristics with mortality, complications and length of stay were not sensitive to the linkage approach. When eligibility criteria were applied before linkage, spine linkage included 14 509 patients (90.0% compared with pairwise linkage). CONCLUSION: Spine linkage can be used as an efficient alternative to pairwise linkage if case ascertainment in the spine dataset and data quality of linkage variables are high. These aspects should be systematically evaluated in the nominated spine dataset before spine linkage is used to create the analysis cohort.
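
The resource argument in this abstract comes down to how many dataset-to-dataset linkages each approach needs. The sketch below counts them for a given number of datasets and recomputes the cohort ratio from the two cohort sizes quoted above. The function names are hypothetical.

```python
def pairwise_linkages(n_datasets: int) -> int:
    """Every dataset is linked to every other dataset: n * (n - 1) / 2 linkages."""
    return n_datasets * (n_datasets - 1) // 2


def spine_linkages(n_datasets: int) -> int:
    """Every non-spine dataset is linked only to the spine: n - 1 linkages."""
    return n_datasets - 1


for n in (3, 5, 10):
    print(n, pairwise_linkages(n), spine_linkages(n))

# Cohort sizes quoted in the abstract: spine 15 826, pairwise 16 100.
spine_share = round(100 * 15_826 / 16_100, 1)
print(f"spine cohort as a share of the pairwise cohort: {spine_share}%")
```

With the three datasets used in the study the saving is small (two linkages instead of three), but the gap widens quickly as more datasets are added.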


Subject(s)
Colorectal Neoplasms , Electronic Health Records , Humans , Medical Record Linkage/methods , Hospitals , Hospitalization
11.
BMC Res Notes ; 15(1): 337, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36316778

ABSTRACT

OBJECTIVE: The aim of this study was to determine whether a secure, privacy-preserving record linkage (PPRL) methodology can be implemented in a scalable manner for use in a large national clinical research network. RESULTS: We established the governance and technical capacity to support the use of PPRL across the National Patient-Centered Clinical Research Network (PCORnet®). As a pilot, four sites used the Datavant software to transform patient personally identifiable information (PII) into de-identified tokens. We queried the sites for patients with a clinical encounter in 2018 or 2019 and matched their tokens to determine whether overlap existed. We described patient overlap among the sites and generated a "deduplicated" table of patient demographic characteristics. Overlapping patients were found in 3 of the 6 site-pairs. Following deduplication, the total patient count was 3,108,515 (0.11% reduction), with the largest reduction in count for patients with an "Other/Missing" value for Sex, from 198 to 163 (17.6% reduction). The PPRL solution successfully links patients across data sources using distributed queries without directly accessing patient PII. The overlap queries and analysis performed in this pilot are being replicated across the full network to provide additional insight into patient linkages among a distributed research network.


Subject(s)
Electronic Health Records , Privacy , Humans , Medical Record Linkage/methods , Databases, Factual , Patient-Centered Care
12.
Epidemiol Serv Saude ; 31(3): e20211272, 2022.
Article in English, Portuguese | MEDLINE | ID: mdl-36287481

ABSTRACT

OBJECTIVE: To present a standardized methodology for linking different public health databases. METHODS: This was a methodological review article specifically describing data processing procedures for deterministic linkage between structured databases. It instructs on how to treat data, select linkage keys, and link databases, using two databases simulated in R software. RESULTS: The commands used for the deterministic linkage of the inner_join type were presented. The linkage process resulted in a database with 40,108 pairs using only the "Name" key. Adding the second key, "Name of mother", the result dropped to 112 pairs. By adding the third key, "Date of birth", only two pairs were identified. CONCLUSION: Database linkage and its analysis are valid and valuable tools for health services in supporting health surveillance actions.
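
The effect described in this abstract, where each added linkage key shrinks the set of candidate pairs, can be reproduced on a toy example. The sketch below is in Python rather than the R `inner_join` used in the article. All records, field names and the `deterministic_link` helper are hypothetical.

```python
from collections import defaultdict


def deterministic_link(left: list[dict], right: list[dict], keys: list[str]) -> list[tuple]:
    """Inner join: pair records whose values agree exactly on every key."""
    index = defaultdict(list)
    for rec in right:
        index[tuple(rec[k] for k in keys)].append(rec)
    pairs = []
    for rec in left:
        for match in index.get(tuple(rec[k] for k in keys), []):
            pairs.append((rec["id"], match["id"]))
    return pairs


# Hypothetical, already cleaned records (upper case, no accents).
db_a = [
    {"id": "A1", "name": "MARIA SILVA", "mother": "ANA SILVA", "dob": "1990-01-01"},
    {"id": "A2", "name": "MARIA SILVA", "mother": "RITA SILVA", "dob": "1985-05-05"},
    {"id": "A3", "name": "JOAO SOUZA", "mother": "CLARA SOUZA", "dob": "1970-03-03"},
]
db_b = [
    {"id": "B1", "name": "MARIA SILVA", "mother": "ANA SILVA", "dob": "1990-01-01"},
    {"id": "B2", "name": "MARIA SILVA", "mother": "ANA SILVA", "dob": "1992-02-02"},
    {"id": "B3", "name": "JOAO SOUZA", "mother": "LUCIA SOUZA", "dob": "1970-03-03"},
]

by_name = deterministic_link(db_a, db_b, ["name"])
by_name_mother = deterministic_link(db_a, db_b, ["name", "mother"])
by_all_keys = deterministic_link(db_a, db_b, ["name", "mother", "dob"])
print(len(by_name), len(by_name_mother), len(by_all_keys))
```

A common name alone pairs every homonym with every other, so the single-key join is dominated by false pairs. The extra keys remove them, at the cost of also dropping true pairs whose values were recorded inconsistently.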


Subject(s)
Information Storage and Retrieval , Medical Record Linkage , Humans , Medical Record Linkage/methods , Brazil , Databases, Factual , Software
13.
J Am Med Inform Assoc ; 29(12): 2105-2109, 2022 11 14.
Article in English | MEDLINE | ID: mdl-36305781

ABSTRACT

Healthcare systems are hampered by incomplete and fragmented patient health records. Record linkage is widely accepted as a solution to improve the quality and completeness of patient records. However, no systematic approach exists for manually reviewing patient records to create gold standard record linkage data sets. We propose a robust framework for creating and evaluating manually reviewed gold standard data sets for measuring the performance of patient matching algorithms. Our 8-point approach covers data preprocessing, blocking, record adjudication, linkage evaluation, and reviewer characteristics. This framework can help record linkage method developers provide necessary transparency when creating and validating gold standard reference matching data sets. In turn, this transparency will support both the internal and external validity of record linkage studies and improve the robustness of new record linkage strategies.


Subject(s)
Health Records, Personal , Medical Record Linkage , Humans , Medical Record Linkage/methods , Algorithms , Information Storage and Retrieval , Data Collection
14.
PLoS One ; 17(9): e0267893, 2022.
Article in English | MEDLINE | ID: mdl-36137086

ABSTRACT

Linking several databases containing information on the same person is an essential step of many data workflows. Due to the potential sensitivity of the data, the identity of the persons should be kept private. Privacy-Preserving Record-Linkage (PPRL) techniques have been developed to link persons despite errors in the identifiers used to link the databases without violating their privacy. The basic approach is to use encoded quasi-identifiers instead of plain quasi-identifiers for making the linkage decision. Ideally, the encoded quasi-identifiers should prevent re-identification but still allow for a good linkage quality. While several PPRL techniques have been proposed so far, Bloom filter-based PPRL schemes (BF-PPRL) are among the most popular due to their scalability. However, a recently proposed attack on BF-PPRL based on graph similarities seems to allow individuals' re-identification from encoded quasi-identifiers. Therefore, the graph matching attack is widely considered a serious threat to many PPRL-approaches and leads to the situation that BF-PPRL schemes are rejected as being insecure. In this work, we argue that this view is not fully justified. We show by experiments that the success of graph matching attacks requires a high overlap between encoded and plain records used for the attack. As soon as this condition is not fulfilled, the success rate sharply decreases and renders the attacks hardly effective. This necessary condition does severely limit the applicability of these attacks in practice and also allows for simple but effective countermeasures.
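
The basic mechanism behind Bloom filter-based PPRL, as summarised above, is to hash the character bigrams of a quasi-identifier into a bit array and to compare the encoded values instead of the plain ones. The sketch below illustrates that idea only. The filter length, number of hash functions, keyed-hash construction and secret key are all assumptions chosen for readability and do not reproduce any specific scheme or the attack studied in the paper.

```python
import hashlib
import hmac

SECRET_KEY = b"shared-secret"   # hypothetical key agreed between the data owners


def bigrams(value: str) -> set[str]:
    """Character bigrams of a padded, lower-cased string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, size: int = 1024, num_hashes: int = 4) -> set[int]:
    """Return the set of bit positions set to 1 for `value`."""
    bits = set()
    for gram in bigrams(value):
        for seed in range(num_hashes):
            digest = hmac.new(SECRET_KEY, f"{seed}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice_similarity(bits_a: set[int], bits_b: set[int]) -> float:
    """Dice coefficient of two encoded values."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


smith, smyth, jones = (bloom_encode(name) for name in ("Smith", "Smyth", "Jones"))
print(dice_similarity(smith, smyth), dice_similarity(smith, jones))
```

Because similar strings share bigrams, their encodings share bit positions, so a spelling variant such as "Smyth" still scores well against "Smith". That same preserved similarity structure is what graph matching attacks try to exploit.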


Subject(s)
Computer Security , Privacy , Confidentiality , Databases, Factual , Humans , Medical Record Linkage/methods
15.
Appl Clin Inform ; 13(4): 901-909, 2022 08.
Article in English | MEDLINE | ID: mdl-36170880

ABSTRACT

BACKGROUND: Chronic kidney disease (CKD) is a major global health problem that affects approximately one in 10 adults. Up to 90% of individuals with CKD go undetected until its progression to advanced stages, invariably leading to death in the absence of treatment. The project aims to fill information gaps around the burden of CKD in the Western Australian (WA) population, including incidence, prevalence, rate of progression, and economic cost to the health system. METHODS: Given the sensitivity of the information involved, the project employed a privacy preserving record linkage methodology to link data from four major pathology providers in WA to hospital records, to establish a CKD registry with continuous medical record for individuals with biochemical specification for CKD. This method uses encrypted personal identifying information in a probability-based linkage framework (Bloom filters) to help mitigate risk while maximizing linkage quality. RESULTS: The project developed interoperable technology to create a transparent CKD data catalogue which is linkable to other datasets. This technology has been designed to support the aspirations of the research program to provide linked de-identified pathology, morbidity, and mortality data that can be used to derive insights to enable better CKD patient outcomes. The cohort includes over 1 million individuals with creatinine results over the period 2002 to 2021. CONCLUSION: Using linked data from across the care continuum, researchers are able to evaluate the effectiveness of service delivery and provide evidence for policy and program development. The CKD registry will enable an innovative review of the epidemiology of CKD in WA. Linking pathology records can identify cases of CKD that are missed in the early stages due to disaggregation of results, enabling identification of at-risk populations that represent targets for early intervention and management.


Subject(s)
Privacy , Renal Insufficiency, Chronic , Adult , Australia , Creatinine , Humans , Medical Record Linkage/methods , Renal Insufficiency, Chronic/diagnosis , Renal Insufficiency, Chronic/epidemiology , Renal Insufficiency, Chronic/therapy , Semantic Web
16.
J Med Internet Res ; 24(9): e33775, 2022 09 29.
Article in English | MEDLINE | ID: mdl-36173664

ABSTRACT

BACKGROUND: Quality patient care requires comprehensive health care data from a broad set of sources. However, missing data in medical records and matching field selection are 2 real-world challenges in patient-record linkage. OBJECTIVE: In this study, we aimed to evaluate the extent to which incorporating the missing at random (MAR)-assumption in the Fellegi-Sunter model and using data-driven selected fields improve patient-matching accuracy using real-world use cases. METHODS: We adapted the Fellegi-Sunter model to accommodate missing data using the MAR assumption and compared the adaptation to the common strategy of treating missing values as disagreement with matching fields specified by experts or selected by data-driven methods. We used 4 use cases, each containing a random sample of record pairs with match statuses ascertained by manual reviews. Use cases included health information exchange (HIE) record deduplication, linkage of public health registry records to HIE, linkage of Social Security Death Master File records to HIE, and deduplication of newborn screening records, which represent real-world clinical and public health scenarios. Matching performance was evaluated using the sensitivity, specificity, positive predictive value, negative predictive value, and F1-score. RESULTS: Incorporating the MAR assumption in the Fellegi-Sunter model maintained or improved F1-scores, regardless of whether matching fields were expert-specified or selected by data-driven methods. Combining the MAR assumption and data-driven fields optimized the F1-scores in the 4 use cases. CONCLUSIONS: MAR is a reasonable assumption in real-world record linkage applications: it maintains or improves F1-scores regardless of whether matching fields are expert-specified or data-driven. Data-driven selection of fields coupled with MAR achieves the best overall performance, which can be especially useful in privacy-preserving record linkage.
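
The two ways of handling missing values that this abstract compares can be shown with field-level Fellegi-Sunter weights. In the sketch below a missing field either contributes nothing to the score (one common reading of the MAR assumption, in which the field drops out of the likelihood ratio) or is scored as a disagreement. The m- and u-probabilities, field names and records are hypothetical, and this is not the authors' implementation.

```python
import math

# Hypothetical parameters per field: (m, u) where
# m = P(field agrees | true match), u = P(field agrees | non-match).
PARAMS = {"last_name": (0.95, 0.01), "dob": (0.98, 0.005), "zip": (0.90, 0.05)}


def field_weight(a, b, m: float, u: float, missing_as_disagreement: bool) -> float:
    """Log2 likelihood-ratio weight for one field comparison."""
    if a is None or b is None:
        if not missing_as_disagreement:
            return 0.0   # MAR reading: the field is ignored
        return math.log2((1 - m) / (1 - u))
    if a == b:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a: dict, rec_b: dict, missing_as_disagreement: bool = False) -> float:
    """Sum of field weights over all fields in PARAMS."""
    return sum(
        field_weight(rec_a.get(f), rec_b.get(f), m, u, missing_as_disagreement)
        for f, (m, u) in PARAMS.items()
    )


rec_a = {"last_name": "lee", "dob": "1980-01-01", "zip": None}   # zip missing
rec_b = {"last_name": "lee", "dob": "1980-01-01", "zip": "46202"}

score_mar = match_score(rec_a, rec_b)
score_disagree = match_score(rec_a, rec_b, missing_as_disagreement=True)
print(score_mar, score_disagree)
```

Scoring a missing value as a disagreement subtracts weight from a pair that agrees on everything actually observed, which can push true matches below the decision threshold.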


Subject(s)
Health Information Exchange , Medical Record Linkage , Algorithms , Humans , Infant, Newborn , Medical Record Linkage/methods , Registries , Research Design
17.
PeerJ ; 10: e13507, 2022.
Article in English | MEDLINE | ID: mdl-35846888

ABSTRACT

Background: Public health research frequently requires the integration of information from different data sources. However, errors in the records and the high computational costs involved make linking large administrative databases using record linkage (RL) methodologies a major challenge. Methods: We present Tucuxi-BLAST, a versatile tool for probabilistic RL that utilizes a DNA-encoded approach to encrypt, analyze and link massive administrative databases. Tucuxi-BLAST encodes the identification records into DNA. BLASTn algorithm is then used to align the sequences between databases. We tested and benchmarked on a simulated database containing records for 300 million individuals and also on four large administrative databases containing real data on Brazilian patients. Results: Our method was able to overcome misspellings and typographical errors in administrative databases. In processing the RL of the largest simulated dataset (200k records), the state-of-the-art method took 5 days and 7 h to perform the RL, while Tucuxi-BLAST only took 23 h. When compared with five existing RL tools applied to a gold-standard dataset from real health-related databases, Tucuxi-BLAST had the highest accuracy and speed. By repurposing genomic tools, Tucuxi-BLAST can improve data-driven medical research and provide a fast and accurate way to link individual information across several administrative databases.


Subject(s)
Biomedical Research , Medical Record Linkage , Humans , Medical Record Linkage/methods , Databases, Factual , Brazil , Public Health
18.
J Clin Epidemiol ; 150: 18-24, 2022 10.
Article in English | MEDLINE | ID: mdl-35760238

ABSTRACT

BACKGROUND AND OBJECTIVES: To highlight the potential of multiple file record linkage. Linkage increases the value of existing information by supplying missing data or correcting errors in existing data, through generating important covariates, and by using family information to control for unmeasured variables and expand research opportunities. METHODS: Recent Manitoba papers highlight the use of linkage to produce better studies. Specific ways in which linkage helps deal with different substantive issues are described. RESULTS: Wide data files-files containing considerable amounts of information on each individual-generated by linkage improve research by facilitating better design. Nonexperimental work in particular benefits from such linkages. Population registries are especially valuable in supplying family data to facilitate work across different substantive fields. CONCLUSION: Several examples show how record linkage magnifies the value of information from individual projects. The results of observational studies become more defensible through the better designs facilitated by such linkage.


Subject(s)
Big Data , Medical Record Linkage , Humans , Medical Record Linkage/methods , Registries , Manitoba
19.
J Biomed Inform ; 130: 104094, 2022 06.
Article in English | MEDLINE | ID: mdl-35550929

ABSTRACT

Record linkage is an important problem studied widely in many domains including biomedical informatics. A standard version of this problem is to cluster records from several datasets, such that each cluster has records pertinent to just one individual. Typically, datasets are huge in size. Hence, existing record linkage algorithms take a very long time. It is thus essential to develop novel fast algorithms for record linkage. The incremental version of this problem is to link previously clustered records with new records added to the input datasets. A novel algorithm has been created to efficiently perform standard and incremental record linkage. This algorithm leverages a set of efficient techniques that significantly restrict the number of record pair comparisons and distance computations. Our algorithm shows an average speed-up of 2.4x (up to 4x) for the standard linkage problem as compared to the state-of-the-art, without any drop in linkage performance at all. On average, our algorithm can incrementally link records in just 33% of the time required for linking them from scratch. Our algorithms achieve comparable or superior linkage performance and outperform the state-of-the-art in terms of linking time in all cases where the number of comparison attributes is greater than two. In practice, more than two comparison attributes are quite common. The proposed algorithm is very efficient and could be used in practice for record linkage applications especially when records are being added over time and linkage output needs to be updated frequently.
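The abstract does not say which techniques the authors use to restrict the number of record pair comparisons. As a rough illustration of the general idea only, the sketch below uses simple blocking, a common technique that is swapped in here and is not the authors' algorithm. The field names (`surname`, `birth_year`), the blocking key, and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first letter of the surname plus the birth year.
    return (record["surname"][:1].lower(), record["birth_year"])


def candidate_pairs(records):
    """Return index pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


# Made-up records for illustration only.
records = [
    {"surname": "Silva", "birth_year": 1980},
    {"surname": "Sylva", "birth_year": 1980},
    {"surname": "Souza", "birth_year": 1975},
    {"surname": "Costa", "birth_year": 1980},
]

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pair(s) instead of {all_pairs}")
```

In this toy input only the two records sharing a surname initial and birth year are compared, so the detailed distance computation would run on one pair rather than on every possible pair.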


Subject(s)
Algorithms , Medical Record Linkage , Medical Record Linkage/methods
20.
J Am Med Inform Assoc ; 29(8): 1409-1415, 2022 07 12.
Article in English | MEDLINE | ID: mdl-35568993

ABSTRACT

OBJECTIVE: This study sought both to support evidence-based patient identity policy development by illustrating an approach for formally evaluating operational matching methods, and also to characterize the performance of both referential and probabilistic patient matching algorithms using real-world demographic data. MATERIALS AND METHODS: We assessed matching accuracy for referential and probabilistic matching algorithms using a manually reviewed 30 000 record gold standard reference dataset derived from a large health information exchange containing over 47 million patient registrations. We applied referential and probabilistic algorithms to this dataset and compared the outputs to the gold standard. We computed performance metrics including sensitivity (recall), positive predictive value (precision), and F-score for each algorithm. RESULTS: The probabilistic algorithm exhibited sensitivity, positive predictive value (PPV), and F-score of 0.6366, 0.9995, and 0.7778, respectively. The referential algorithm exhibited corresponding sensitivity, PPV, and F-score values of 0.9351, 0.9996, and 0.9663, respectively. Treating discordant and limited-data records as nonmatches increased referential match sensitivity to 0.9578. Compared to the more traditional probabilistic approach, referential matching exhibits greater accuracy. CONCLUSIONS: Referential patient matching, an increasingly popular method among health IT vendors, demonstrated notably greater accuracy than a more traditional probabilistic approach without the adaptation of the algorithm to the data that the traditional probabilistic approach usually requires. Health IT policymakers, including the Office of the National Coordinator for Health Information Technology (ONC), should explore strategies to expand the evidence base for real-world matching system performance, given the need for an evidence-based patient identity strategy.
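The reported F-scores can be checked against the reported sensitivity and PPV. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state the weighting explicitly, so treat this as an assumption. The function name `f_score` is illustrative.

```python
def f_score(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and PPV (precision), i.e. F1."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Sensitivity and PPV values as reported in the abstract.
probabilistic = f_score(0.6366, 0.9995)
referential = f_score(0.9351, 0.9996)

print(round(probabilistic, 4))  # 0.7778
print(round(referential, 4))    # 0.9663
```

Both results agree with the published F-scores to four decimal places, which is consistent with the F1 assumption. Because PPV is nearly identical for the two algorithms, the gap in F-score comes almost entirely from the difference in sensitivity.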


Subject(s)
Health Information Exchange , Medical Record Linkage , Algorithms , Humans , Medical Record Linkage/methods