1 - 20 of 4,864
1.
BMC Public Health ; 24(1): 1475, 2024 Jun 01.
Article En | MEDLINE | ID: mdl-38824562

BACKGROUND: Globally, the counting of deaths based on gender identity and sexual orientation has been a challenge for health systems. In most cases, non-governmental organizations have dedicated themselves to this work. Despite these efforts in generating information, the scarcity of official data presents significant limitations in policy formulation and actions guided by population needs. Therefore, this manuscript aims to evaluate the accuracy, potential, and limits of probabilistic data relationships to yield information on deaths according to gender identity and sexual orientation in the State of Rio de Janeiro. METHODS: This study evaluated the accuracy of the probabilistic record linkage to obtain information on deaths according to gender and sexual orientation. Data from two information systems were used from June 15, 2015 to December 31, 2020. We constructed nine probabilistic data relationship strategies and identified the performance and cutoff points of the best strategy. RESULTS: The best data blocking strategy was established through logical blocks with the first and last names, birthdate, and mother's name in the pairing strategy. With a population base of 80,178 records, 1556 deaths were retrieved. With an area under the curve of 0.979, this strategy presented 93.26% accuracy, 98.46% sensitivity, and 90.04% specificity for the cutoff point ≥ 17.9 of the data relationship score. The adoption of the cutoff point optimized the manual review phase, identifying 2259 (90.04%) of the 2509 false pairs and identifying 1532 (98.46%) of the 1556 true pairs. CONCLUSION: With the identification of possible strategies for determining probabilistic data relationships, the retrieval of information on mortality according to sexual and gender markers has become feasible. 
Based on information from the daily routine of health services, the formulation of public policies that consider the LGBTQ+ population more closely reflects the reality experienced by these population groups.
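The accuracy figures quoted in this abstract can be reproduced from the pair counts it reports. The following is a minimal sketch of that arithmetic; the function name `linkage_metrics` is illustrative and does not come from the study.

```python
def linkage_metrics(true_pos, total_true, true_neg, total_false):
    """Return (accuracy, sensitivity, specificity) as fractions."""
    sensitivity = true_pos / total_true    # true pairs kept by the cutoff
    specificity = true_neg / total_false   # false pairs rejected by the cutoff
    accuracy = (true_pos + true_neg) / (total_true + total_false)
    return accuracy, sensitivity, specificity


# Counts reported in the abstract for the cutoff point >= 17.9:
# 1532 of 1556 true pairs and 2259 of 2509 false pairs were identified.
acc, sens, spec = linkage_metrics(1532, 1556, 2259, 2509)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
# prints: accuracy=93.26% sensitivity=98.46% specificity=90.04%
```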


Gender Identity , Sexual Behavior , Humans , Brazil/epidemiology , Female , Male , Sexual Behavior/statistics & numerical data , Medical Record Linkage , Data Accuracy , Death Certificates , Adult
2.
Pharmacoepidemiol Drug Saf ; 33(6): e5845, 2024 Jun.
Article En | MEDLINE | ID: mdl-38825961

PURPOSE: Medications are commonly used during pregnancy to manage pre-existing conditions and conditions that arise during pregnancy. However, not all medications are safe to use in pregnancy. This study utilized privacy-preserving record linkage (PPRL) to examine medications dispensed under the national Pharmaceutical Benefits Scheme (PBS) to pregnant women in Western Australia (WA) overall and by medication safety category. METHODS: In this retrospective, cross-sectional, population-based study, state perinatal records (Midwives Notification Scheme) were linked with national PBS dispensing data using PPRL. Live and stillborn neonates born between 2012 and 2019 in WA were included. The proportion of pregnancies during which the mother was dispensed a PBS medication was calculated, overall and by medication safety category. Factors associated with PBS medication dispensing were examined using logistic regression. RESULTS: PPRL linkage identified matching records for 97.4% of women with perinatal records. A total of 271 739 pregnancies were identified, with 158 585 (58.4%) pregnancies involving the dispensing of at least one PBS medication. Category A medications (those considered safe in pregnancy) were the most commonly dispensed (n = 119 126, 43.8%) followed by B3 (n = 51 135, 18.8%) and B1 (n = 42 388, 15.6%) medication (those with unknown safety). Over the study period, the dispensing of PBS medications in pregnancy increased (OR: 1.06, 95%CI: 1.06, 1.07). The strongest predictor of medication dispensing in pregnancy was pre-pregnancy dispensing (OR: 3.61, 95%CI: 3.54, 3.68). Other factors associated with medication use in pregnancy were smoking, older maternal age, obesity, and prior pregnancies. CONCLUSION: Privacy preserving record linkage provides a way to link cross-jurisdictional data while preserving patient confidentiality and data security. 
The dispensing of PBS medication in pregnancy was common and increased over time, with approximately 60% of women dispensed at least one medication during pregnancy.


Medical Record Linkage , Humans , Female , Pregnancy , Western Australia , Retrospective Studies , Adult , Cross-Sectional Studies , Young Adult , Insurance, Pharmaceutical Services/statistics & numerical data , Adolescent , Infant, Newborn
3.
Stud Health Technol Inform ; 314: 139-143, 2024 May 23.
Article En | MEDLINE | ID: mdl-38785020

The implementation of an Electronic Prescribing (EP) system offers numerous advantages in enhancing the efficiency of prescribing practices. To ensure successful implementation, a comprehensive understanding of the workflow in paper-based prescribing is crucial. In Iran, the Ministry of Health and Medical Education (MOHME) has been actively involved in developing an EP system since 2011. The pilot results within MOHME have garnered significant support from all basic insurance organizations, primarily due to the importance of addressing financial considerations. As a result, these insurance organizations have taken the lead in the national development of the EP system, as responsibilities have shifted. The development of an Integrated Care Electronic Health Record (ICEHR or EHR) and the approach adopted by MOHME have paved the way for the creation of a standardized set of Application Programming Interfaces (APIs) based on openEHR and ISO 13606 standards. These APIs facilitate the secure transfer of consolidated data from the EP systems, stored in the data warehouses of basic insurance organizations, to the Iranian EHR. This model follows an ICEHR architecture that emphasizes the transmission of this information to the Iranian EHR. This paper provides a detailed discussion of the various aspects and accomplishments related to these developments.


Electronic Health Records , Electronic Prescribing , Iran , Models, Organizational , Medical Record Linkage , Humans
4.
J Korean Med Sci ; 39(14): e127, 2024 Apr 15.
Article En | MEDLINE | ID: mdl-38622936

BACKGROUND: To overcome the limitations of relying on data from a single institution, many researchers have studied data linkage methodologies. Data linkage includes errors owing to legal issues surrounding personal information and technical issues related to data processing. Linkage errors affect selection bias, and external and internal validity. Therefore, quality verification for each connection method with adherence to personal information protection is an important issue. This study evaluated the linkage quality of linked data and analyzed the potential bias resulting from linkage errors. METHODS: This study analyzed claims data submitted to the Health Insurance Review and Assessment Service (HIRA DATA). The linkage errors of the two deterministic linkage methods were evaluated based on the use of the match key. The first deterministic linkage uses a unique identification number, and the second deterministic linkage uses the name, gender, and date of birth as a set of partial identifiers. The linkage error included in this deterministic linkage method was compared with the absolute standardized difference (ASD) of Cohen's according to the baseline characteristics, and the linkage quality was evaluated through the following indicators: linked rate, false match rate, missed match rate, positive predictive value, sensitivity, specificity, and F1-score. RESULTS: For the deterministic linkage method that used the name, gender, and date of birth as a set of partial identifiers, the true match rate was 83.5 and the missed match rate was 16.5. Although there was bias in some characteristics of the data, most of the ASD values were less than 0.1, with no case greater than 0.5. Therefore, it is difficult to determine whether linked data constructed with deterministic linkages have substantial differences. 
CONCLUSION: This study confirms the possibility of building health and medical data at the national level as the first data linkage quality verification study using big data from the HIRA. Analyzing the quality of linkages is crucial for comprehending linkage errors and generating reliable analytical outcomes. Linkers should increase the reliability of linked data by providing linkage error-related information to researchers. The results of this study will serve as reference data to increase the reliability of multicenter data linkage studies.
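As an illustration of the second method described in this abstract (deterministic linkage on name, gender, and date of birth), the sketch below links two small record lists on an exact match key. It is not the study's implementation: the field names, the name normalization, the rule that ambiguous keys stay unlinked, and the sample records are all assumptions made for the example.

```python
from datetime import date


def match_key(record):
    """Build a match key from name, gender, and date of birth."""
    return (record["name"].strip().lower(), record["gender"], record["dob"])


def deterministic_link(file_a, file_b):
    """Link records whose match keys are identical and unique in file_b."""
    index = {}
    for rec in file_b:
        index.setdefault(match_key(rec), []).append(rec)
    links = []
    for rec in file_a:
        candidates = index.get(match_key(rec), [])
        if len(candidates) == 1:  # assumption: ambiguous keys are left unlinked
            links.append((rec["id"], candidates[0]["id"]))
    return links


# Made-up records, for illustration only.
claims = [
    {"id": "A1", "name": "Kim Minji", "gender": "F", "dob": date(1980, 1, 2)},
    {"id": "A2", "name": "Lee Jun", "gender": "M", "dob": date(1975, 5, 9)},
    {"id": "A3", "name": "Park Sora", "gender": "F", "dob": date(1990, 7, 30)},
]
registry = [
    {"id": "B1", "name": " kim minji ", "gender": "F", "dob": date(1980, 1, 2)},
    {"id": "B2", "name": "Lee Jun", "gender": "M", "dob": date(1975, 5, 9)},
    {"id": "B3", "name": "Lee Jun", "gender": "M", "dob": date(1975, 5, 9)},
]

links = deterministic_link(claims, registry)
linked_rate = len(links) / len(claims)
print(links, linked_rate)
```

In this toy data only A1 is linked: A2 shares its key with two registry records, and A3 has no counterpart. Partial identifiers can miss or confuse matches in exactly this way, which is why indicators such as the missed match rate are worth reporting.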


Information Storage and Retrieval , Medical Record Linkage , Humans , Reproducibility of Results , Medical Record Linkage/methods , Predictive Value of Tests , Health Services
5.
Stud Health Technol Inform ; 313: 49-54, 2024 Apr 26.
Article En | MEDLINE | ID: mdl-38682504

BACKGROUND: The Fast Healthcare Interoperability Resources (FHIR) and Clinical Document Architecture (CDA) are standards for the healthcare industry, designed to improve the exchange of health data through interoperability. Both standards are constrained through what are known as Implementation Guides (IG) for specific use. OBJECTIVES: Both standards are widely used and play an important role in the Austrian healthcare system. Concepts existing in CDA and FHIR must be aligned between both standards. METHODS: Many existing approaches are presented and discussed; none is fully suited to the needs in Austria. RESULTS: The IG Publisher has already been used for CDA IGs, besides its intended FHIR support, but never for both in one IG. Even the International Patient Summary (IPS), which exists as both a CDA and a FHIR specification, does not provide the needed comparability between the two. CONCLUSION: As the IG Publisher is widely used and supports CDA, it should be used for Dual Implementation Guides. Further work on and extension of the IG Publisher are necessary to enhance the readability of the resulting IGs.


Electronic Health Records , Health Information Interoperability , Austria , Health Information Interoperability/standards , Humans , Medical Record Linkage/standards
6.
Stud Health Technol Inform ; 313: 143-148, 2024 Apr 26.
Article En | MEDLINE | ID: mdl-38682520

BACKGROUND: The Fast Healthcare Interoperability Resources (FHIR) standard was proposed and released to solve the interoperability problems of electronic health records. FHIR Subscription resources are used to establish real-time event notifications from the FHIR server to another system. Several communication channels are available, such as rest-hook and websocket. The objective of our work is to compare the performance of FHIR subscriptions using the rest-hook and websocket channels. METHODS: A HAPI FHIR server, Python websocket clients, and HTTP endpoints were used to measure the processor and memory usage of the two subscription channels. Tests were performed with 5, 10, 15, 20, 30, 40, 50, 60, 70, and 80 clients. Performance was logged using Windows Performance Monitor. RESULTS: The rest-hook subscription showed a nearly six-fold increase in resource utilization when the number of clients increased from 5 to 80. In contrast, the websocket subscription channel did not reach a two-fold increase. CONCLUSION: The type of subscription channel should be carefully selected, and load distribution should be considered when the number of clients grows.
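To make the two channel types concrete, the sketch below builds a Subscription resource as a plain Python dictionary for either channel. It assumes the FHIR R4 shape of the resource (`status`, `reason`, `criteria`, `channel.type`, `channel.endpoint`, `channel.payload`); the abstract does not state which FHIR version was tested, and the helper name, criteria string, and endpoint URL are hypothetical.

```python
import json


def make_subscription(channel_type, criteria, endpoint=None):
    """Build an R4-style Subscription resource for one channel type."""
    if channel_type not in ("rest-hook", "websocket"):
        raise ValueError(f"unsupported channel type: {channel_type}")
    channel = {"type": channel_type}
    if channel_type == "rest-hook":
        if endpoint is None:
            raise ValueError("rest-hook needs an HTTP endpoint to call")
        # The server sends an HTTP request to this URL for each matching event.
        channel["endpoint"] = endpoint
        channel["payload"] = "application/fhir+json"
    # For websocket, the client connects to the server and waits for
    # notifications, so no callback endpoint is set here.
    return {
        "resourceType": "Subscription",
        "status": "requested",
        "reason": "Performance comparison (illustrative)",
        "criteria": criteria,
        "channel": channel,
    }


# Hypothetical criteria and endpoint, for illustration only.
rest_hook = make_subscription(
    "rest-hook", "Observation?status=final", "http://localhost:8081/notify"
)
websocket = make_subscription("websocket", "Observation?status=final")
print(json.dumps(rest_hook, indent=2))
```

The structural difference is that a rest-hook subscription makes the server open an outbound HTTP request to each subscriber's endpoint, whereas websocket subscribers hold a connection open to the server.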


Electronic Health Records , Health Information Interoperability , Humans , Medical Record Linkage
7.
Stud Health Technol Inform ; 313: 124-128, 2024 Apr 26.
Article En | MEDLINE | ID: mdl-38682516

BACKGROUND: Electronic health records (EHRs) emerged as a digital record of the data generated in healthcare. OBJECTIVES: This paper compares the transfer times of EHRs using the Hypertext Transfer Protocol (HTTP) and WebSocket in both a local network and a wide area network (WAN). METHODS: A Python web application serving Fast Healthcare Interoperability Resources (FHIR) records was created, and the transfer times of the EHRs over both HTTP and WebSocket connections were measured. A total of 45,000 test Patient resources were transferred in Bundles of 20, 50, 100, and 200 resources. RESULTS: WebSocket showed much better transfer times for large amounts of data. These were 18 s shorter in the local network and 342 s shorter in the WAN for the transfer with 20 resources per Bundle. CONCLUSION: RESTful APIs are a convenient way to implement EHR servers; on the other hand, HTTP becomes a bottleneck when transferring large amounts of data. WebSocket shows better transfer times and thus its superiority in such situations. The problem can be addressed by developing a new communication protocol or by using network tunneling to handle large data transfers of EHRs.
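The bundle sizes in this abstract determine how many separate transfers are needed for the 45,000 Patient resources. The sketch below groups placeholder Patient resources into FHIR Bundles and counts them per bundle size. The helper name, the placeholder resources, and the choice of Bundle type `collection` are assumptions for illustration, not details from the paper.

```python
def make_bundles(resources, per_bundle):
    """Yield FHIR Bundles holding at most per_bundle resources each."""
    for start in range(0, len(resources), per_bundle):
        chunk = resources[start:start + per_bundle]
        yield {
            "resourceType": "Bundle",
            "type": "collection",  # assumed Bundle type
            "entry": [{"resource": r} for r in chunk],
        }


# Placeholder Patient resources standing in for the test data.
patients = [{"resourceType": "Patient", "id": str(i)} for i in range(45000)]

bundle_counts = {
    n: sum(1 for _ in make_bundles(patients, n)) for n in (20, 50, 100, 200)
}
print(bundle_counts)
# prints: {20: 2250, 50: 900, 100: 450, 200: 225}
```

With 20 resources per Bundle, the same data set requires ten times as many transfers as with 200 per Bundle.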


Electronic Health Records , Humans , Medical Record Linkage/methods , Internet , Health Information Interoperability , Software
8.
Int J Med Inform ; 185: 105387, 2024 May.
Article En | MEDLINE | ID: mdl-38428200

BACKGROUND: Cancer registries link a large number of electronic health records reported by medical institutions to already registered records of the matching individual and tumor. Records are automatically linked using deterministic and probabilistic approaches; machine learning is rarely used. Records that cannot be matched automatically with sufficient accuracy are typically processed manually. For application, it is important to know how well record linkage approaches match real-world records and how much manual effort is required to achieve the desired linkage quality. We study the task of linking reported records to the matching registered tumor in cancer registries. METHODS: We compare the tradeoff between linkage quality and manual effort of five machine learning methods (logistic regression, random forest, gradient boosting, neural network, and a stacked method) to a deterministic baseline. The record linkage methods are compared in a two-class setting (no-match/match) and a three-class setting (no-match/undecided/match). A cancer registry collected and linked the dataset consisting of categorical variables matching 145,755 reported records with 33,289 registered tumors. RESULTS: In the two-class setting, the gradient boosting, neural network, and stacked models have higher accuracy and F1 score (accuracy: 0.968-0.978, F1 score: 0.983-0.988) than the deterministic baseline (accuracy: 0.964, F1 score: 0.980) when the same records are manually processed (0.89% of all records). In the three-class setting, these three machine learning methods can automatically process all reported records and still have higher accuracy and F1 score than the deterministic baseline. The linkage quality of the machine learning methods studied, except for the neural network, increases as the number of manually processed records increases. 
CONCLUSION: Machine learning methods can significantly improve linkage quality and reduce the manual effort required by medical coders to match tumor records in cancer registries compared to a deterministic baseline. Our results help cancer registries estimate how linkage quality increases as more records are manually processed.
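One common way to obtain the three-class setting mentioned in this abstract is to apply two thresholds to a model's match probability and send the records in between to manual review. The sketch below shows that idea only. The abstract does not say how the classes were derived, so the thresholds, function names, and scores are hypothetical.

```python
def classify(prob, lower, upper):
    """Map a match probability to no-match / undecided / match."""
    if prob >= upper:
        return "match"
    if prob < lower:
        return "no-match"
    return "undecided"  # left for manual processing


def manual_share(probs, lower, upper):
    """Fraction of record pairs that would need manual processing."""
    undecided = sum(1 for p in probs if classify(p, lower, upper) == "undecided")
    return undecided / len(probs)


# Made-up scores, for illustration only.
scores = [0.02, 0.10, 0.45, 0.55, 0.93, 0.99, 0.97, 0.05]
labels = [classify(p, lower=0.2, upper=0.8) for p in scores]
print(labels)
print(manual_share(scores, lower=0.2, upper=0.8))
```

Setting `lower` equal to `upper` collapses this to the two-class setting, since no record can then fall between the thresholds. Widening the gap trades more manual effort for fewer automatic decisions on uncertain pairs.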


Electronic Health Records , Neoplasms , Humans , Medical Record Linkage/methods , Neoplasms/epidemiology , Registries , Databases, Factual
9.
Int J Popul Data Sci ; 9(1): 2137, 2024.
Article En | MEDLINE | ID: mdl-38425790

Introduction: Recent years have seen an increase in linkages between survey and administrative data. It is important to evaluate the quality of such data linkages to discern the likely reliability of ensuing research. Evaluation of linkage quality and bias can be conducted using different approaches, but many of these are not possible when there is a separation of processes for linkage and analysis to help preserve privacy, as is typically the case in the UK (and elsewhere). Objectives: We aimed to describe a suite of generalisable methods to evaluate linkage quality and population representativeness of linked survey and administrative data which remain tractable when users of the linked data are not party to the linkage process itself. We emphasise issues particular to longitudinal survey data throughout. Methods: Our proposed approaches cover several areas: i) Linkage rates, ii) Selection into response, linkage consent and successful linkage, iii) Linkage quality, and iv) Linked data population representativeness. We illustrate these methods using a recent linkage between the 1958 National Child Development Study (NCDS; a cohort following an initial 17,415 people born in Great Britain in a single week of 1958) and Hospital Episode Statistics (HES) databases (containing important information regarding admissions, accident and emergency attendances and outpatient appointments at NHS hospitals in England). Results: Our illustrative analyses suggest that the linkage quality of the NCDS-HES data is high and that the linked sample maintains an excellent level of population representativeness with respect to the single dimension we assessed. Conclusions: Through this work we hope to encourage providers and users of linked data resources to undertake and publish thorough evaluations. We further hope that providing illustrative analyses using linked NCDS-HES data will improve the quality and transparency of research using this particular linked data resource.


Child Development , Medical Record Linkage , Child , Humans , Reproducibility of Results , Medical Record Linkage/methods , Hospitalization , Hospitals
10.
BMC Med Res Methodol ; 24(1): 13, 2024 Jan 17.
Article En | MEDLINE | ID: mdl-38233744

BACKGROUND: Community optometrists in Scotland have performed regular free-at-point-of-care eye examinations for all, for over 15 years. Eye examinations include retinal imaging, but image storage is fragmented and the images are not used for research. The Scottish Collaborative Optometry-Ophthalmology Network e-research project aimed to collect these images and create a repository linked to routinely collected healthcare data, supporting the development of pre-symptomatic diagnostic tools. METHODS: As the image record was usually separate from the patient record and contained minimal patient information, we developed an efficient matching algorithm using a combination of deterministic and probabilistic steps which minimised the risk of false positives, to facilitate national health record linkage. We visited two practices and assessed the data contained in their image device and Practice Management Systems. Practice activities were explored to understand the context of data collection processes. Iteratively, we tested a series of matching rules which captured a high proportion of true positive records compared to manual matches. The approach was validated by testing manual matching against automated steps in three further practices. RESULTS: A sequence of deterministic rules successfully matched 95% of records in the three test practices compared to manual matching. Adding two probabilistic rules to the algorithm successfully matched 99% of records. CONCLUSIONS: The potential value of community-acquired retinal images can be harnessed only if they are linked to centrally-held healthcare data. Despite the lack of interoperability between systems within optometry practices and inconsistent use of unique identifiers, data linkage is possible using robust, almost entirely automated processes.


Medical Record Linkage , Medical Records , Humans , Medical Records Systems, Computerized , Data Collection , Scotland
11.
J Surg Res ; 295: 274-280, 2024 Mar.
Article En | MEDLINE | ID: mdl-38048751

INTRODUCTION: Trauma registries and their quality improvement programs only collect data from the acute hospital admission, and no additional information is captured once the patient is discharged. This lack of long-term data limits these programs' ability to effect change. The goal of this study was to create a longitudinal patient record by linking trauma registry data with third-party payer claims data to allow the tracking of these patients after discharge. METHODS: Trauma quality collaborative data (2018-2019) were used. Inclusion criteria were patients aged ≥18, ISS ≥5, and a length of stay ≥1 d. In-hospital deaths were excluded. A deterministic match was performed with insurance claims records based on the hospital name, date of birth, sex, and dates of service (±1 d). The effect of payer type, ZIP code, International Classification of Diseases, Tenth Revision, Clinical Modification diagnosis specificity and exact dates of service on the match rate was analyzed. RESULTS: The overall match rate between these two patient record sources was 27.5%. There was a significantly higher match rate (42.8% versus 6.1%, P < 0.001) for patients with a payer that was contained in the insurance collaborative. In a subanalysis, exact dates of service did not substantially affect this match rate; however, specific International Classification of Diseases, Tenth Revision, Clinical Modification codes (i.e., all 7 characters) reduced this rate by almost half. CONCLUSIONS: We demonstrated the successful linkage of patient records in a trauma registry with their insurance claims. This will allow us to collect longitudinal information so that we can follow these patients' long-term outcomes and subsequently improve their care.
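The deterministic rule described in the methods (hospital name, date of birth, sex, and dates of service within ±1 d) can be sketched as a single comparison function. This is an illustrative reading of the abstract, not the authors' code. The field names and sample records are made up, and treating "dates of service" as admission and discharge dates is an assumption.

```python
from datetime import date, timedelta


def is_match(registry, claim, tolerance_days=1):
    """Exact match on hospital, DOB, and sex; service dates within tolerance."""
    same_person = (
        registry["hospital"] == claim["hospital"]
        and registry["dob"] == claim["dob"]
        and registry["sex"] == claim["sex"]
    )
    tol = timedelta(days=tolerance_days)
    dates_close = (
        abs(registry["admit"] - claim["admit"]) <= tol
        and abs(registry["discharge"] - claim["discharge"]) <= tol
    )
    return same_person and dates_close


# Made-up records, for illustration only.
registry_rec = {
    "hospital": "General Hospital", "dob": date(1970, 4, 12), "sex": "F",
    "admit": date(2018, 3, 1), "discharge": date(2018, 3, 5),
}
claim_rec = {
    "hospital": "General Hospital", "dob": date(1970, 4, 12), "sex": "F",
    "admit": date(2018, 2, 28), "discharge": date(2018, 3, 6),
}

print(is_match(registry_rec, claim_rec))                    # ±1 d window
print(is_match(registry_rec, claim_rec, tolerance_days=0))  # exact dates
```

Setting `tolerance_days=0` corresponds to the exact-dates variant examined in the subanalysis.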


Insurance , Medical Record Linkage , Humans , Registries , Medical Records , Hospitalization
13.
Aust Health Rev ; 48(1): 8-15, 2024 Feb.
Article En | MEDLINE | ID: mdl-38118279

Objective: Data linkage is a very powerful research tool in epidemiology; however, establishing it can be a lengthy and intensive process. This paper reports on the complex landscape of conducting data linkage projects in Australia. Methods: We reviewed the processes, required documentation, and applications required to conduct multi-jurisdictional data linkage across Australia in 2023. Results: Obtaining the necessary approvals to conduct linkage will likely take nearly 2 years (estimated 730 days, including 605 days from initial submission to obtaining all ethical approvals and an estimated further 125 days for the issuance of unexpected additionally required approvals). Ethical review for linkage projects ranged from 51 to 128 days from submission to ethical approval, and applications consisted of 9-25 documents. Conclusions: Major obstacles to conducting multi-jurisdictional data linkage included the complexity of the process and substantial time and financial costs. The process was characterised by inefficiencies at several levels, reduplication, and a lack of any key accountabilities for timely performance of processes. Data linkage is an invaluable resource for epidemiological research. Further streamlining, established accountability, and greater collaboration between jurisdictions are needed to ensure data linkage is both accessible and feasible for researchers.


Heart Defects, Congenital , Medical Record Linkage , Humans , Medical Record Linkage/methods , Registries , Australia/epidemiology , Information Storage and Retrieval , Heart Defects, Congenital/epidemiology
14.
JAMA ; 330(24): 2333-2334, 2023 Dec 26.
Article En | MEDLINE | ID: mdl-37983066

This Viewpoint discusses the use of privacy-preserving record linkage, a token-based record linkage system, as a promising avenue for building a data infrastructure system that bridges isolated data.


Computer Security , Delivery of Health Care , Information Dissemination , Medical Record Linkage , Privacy , Delivery of Health Care/methods , Information Dissemination/methods
15.
PLoS One ; 18(10): e0291581, 2023.
Article En | MEDLINE | ID: mdl-37862306

Research with administrative records involves the challenge of limited information in any single data source to answer policy-related questions. Record linkage provides researchers with a tool to supplement administrative datasets with other information about the same people when identified in separate sources as matched pairs. Several solutions are available for undertaking record linkage, producing linkage keys for merging data sources for positively matched pairs of records. In the current manuscript, we demonstrate a new application of the Python RecordLinkage package to family-based record linkages with machine learning algorithms for probability scoring, which we call probabilistic record linkage for families (PRLF). First, a simulation of administrative records identifies PRLF accuracy with variations in match and data degradation percentages. Accuracy is largely influenced by degradation (e.g., missing data fields, mismatched values) compared to the percentage of simulated matches. Second, an application of data linkage is presented to compare regression model estimate performance across three record linkage solutions (PRLF, ChoiceMaker, and Link Plus). Our findings indicate that all three solutions, when optimized, provide similar results for researchers. Strengths of our process, such as the use of ensemble methods, to improve match accuracy are discussed. We then identify caveats of record linkage in the context of administrative data.


Algorithms , Medical Record Linkage , Humans , Medical Record Linkage/methods , Computer Simulation , Probability , Information Storage and Retrieval
16.
Int J Popul Data Sci ; 8(1): 1751, 2023.
Article En | MEDLINE | ID: mdl-37636833

Introduction: The patient journey for residents of New South Wales (NSW) Australia with ST-elevation myocardial infarction (STEMI) often involves transfer between hospitals and these can include stays in hospitals in other jurisdictions. Objective: To estimate the change in enumeration of STEMI hospitalisations and time to subsequent cardiac procedures for NSW residents using cross-jurisdictional linkage of administrative health data. Methods: Records for NSW residents aged 20 years and over admitted to hospitals in NSW and four adjacent jurisdictions (Australian Capital Territory, Queensland, South Australia, and Victoria) between 1 July 2013 and 30 June 2018 with a principal diagnosis of STEMI were linked with records of the Australian Government Medicare Benefits Schedule (MBS). The number of STEMI hospitalisations, and rates of angiography, percutaneous coronary intervention and coronary artery bypass graft were compared for residents of different local health districts within NSW with and without inclusion of cross-jurisdictional data. Results: Inclusion of cross-jurisdictional hospital and MBS data increased the enumeration of STEMI hospitalisations for NSW residents by 8% (from 15,420 to 16,659) and procedure rates from 85.6% to 88.2%. For NSW residents who lived adjacent to a jurisdictional border, hospitalisation counts increased by up to 210% and procedure rates by up to 70 percentage points. Conclusions: Cross-jurisdictional linked hospital data is essential to understand patient journeys of NSW residents who live in border areas and to evaluate adherence to treatment guidelines for STEMI. MBS data are useful where hospital data are not available and for procedures that may be conducted in out-patient settings.


Hospitalization , ST Elevation Myocardial Infarction , Aged , Humans , Hospitalization/statistics & numerical data , National Health Programs , Outpatients , ST Elevation Myocardial Infarction/epidemiology , Victoria , Medical Record Linkage
17.
Int J Popul Data Sci ; 8(1): 2115, 2023.
Article En | MEDLINE | ID: mdl-37636835

Databases covering all individuals of a population are increasingly used for research and decision-making. The massive size of such databases is often mistaken as a guarantee for valid inferences. However, population data have characteristics that make them challenging to use. Various assumptions on population coverage and data quality are commonly made, including how such data were captured and what types of processing have been applied to them. Furthermore, the full potential of population data can often only be unlocked when such data are linked to other databases. Record linkage often implies subtle technical problems, which are easily missed. We discuss a diverse range of myths and misconceptions relevant for anybody capturing, processing, linking, or analysing population data. Remarkably, many of these myths and misconceptions are due to the social nature of data collections and are therefore missed by purely technical accounts of data processing. Many are also not well documented in scientific publications. We conclude with a set of recommendations for using population data.


Data Accuracy , Medical Record Linkage , Humans , Data Collection , Databases, Factual , Information Storage and Retrieval , Population Health
18.
Stat Med ; 42(27): 4931-4951, 2023 Nov 30.
Article En | MEDLINE | ID: mdl-37652076

In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms only rely on similarities between linking variables that appear in all the files. Moreover, analysis of linked files often ignores errors that may arise from incorrect or missed links. Bayesian record linking methods allow for natural propagation of linkage error, by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically, and using simulations, that the proposed method can improve the linking process, and can result in accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare enrollment records.


Medical Record Linkage , Medicare , Aged , Humans , United States , Bayes Theorem , Medical Record Linkage/methods , Algorithms
19.
BMC Med Inform Decis Mak ; 23(1): 85, 2023 May 05.
Article En | MEDLINE | ID: mdl-37147600

BACKGROUND: Epidemiological research may require linkage of information from multiple organizations. This can bring two problems: (1) the information governance desirability of linkage without sharing direct identifiers, and (2) a requirement to link databases without a common person-unique identifier. METHODS: We develop a Bayesian matching technique to solve both. We provide an open-source software implementation capable of de-identified probabilistic matching despite discrepancies, via fuzzy representations and complete mismatches, plus de-identified deterministic matching if required. We validate the technique by testing linkage between multiple medical records systems in a UK National Health Service Trust, examining the effects of decision thresholds on linkage accuracy. We report demographic factors associated with correct linkage. RESULTS: The system supports dates of birth (DOBs), forenames, surnames, three-state gender, and UK postcodes. Fuzzy representations are supported for all except gender, and there is support for additional transformations, such as accent misrepresentation, variation for multi-part surnames, and name re-ordering. Calculated log odds predicted a proband's presence in the sample database with an area under the receiver operating curve of 0.997-0.999 for non-self database comparisons. Log odds were converted to a decision via a consideration threshold θ and a leader advantage threshold δ. Defaults were chosen to penalize misidentification 20-fold versus linkage failure. By default, complete DOB mismatches were disallowed for computational efficiency. At these settings, for non-self database comparisons, the mean probability of a proband being correctly declared to be in the sample was 0.965 (range 0.931-0.994), and the misidentification rate was 0.00249 (range 0.00123-0.00429). 
Correct linkage was positively associated with male gender, Black or mixed ethnicity, and the presence of diagnostic codes for severe mental illnesses or other mental disorders, and negatively associated with birth year, unknown ethnicity, residential area deprivation, and presence of a pseudopostcode (e.g. indicating homelessness). Accuracy rates would be improved further if person-unique identifiers were also used, as supported by the software. Our two largest databases were linked in 44 min via an interpreted programming language. CONCLUSIONS: Fully de-identified matching with high accuracy is feasible without a person-unique identifier and appropriate software is freely available.
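The abstract states that log odds were converted to a decision via a consideration threshold θ and a leader advantage threshold δ. One plausible reading, sketched below, is that the best candidate is accepted only if its log odds reach θ and exceed the runner-up's by at least δ. This interpretation, the function name, and all numeric values are assumptions. The sketch is not the API of the authors' software.

```python
def decide(candidates, theta, delta):
    """Return the winning candidate id, or None if no confident link exists.

    candidates maps a candidate id to its log odds of being the proband.
    """
    if not candidates:
        return None
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best_log_odds = ranked[0]
    if best_log_odds < theta:
        return None  # not plausible enough to consider
    if len(ranked) > 1 and best_log_odds - ranked[1][1] < delta:
        return None  # leader is not far enough ahead of the runner-up
    return best_id


# Made-up log odds, for illustration only.
print(decide({"p1": 12.0, "p2": 3.0}, theta=5.0, delta=1.0))
print(decide({"p1": 12.0, "p2": 11.5, "p3": -4.0}, theta=5.0, delta=1.0))
print(decide({"p1": 2.0}, theta=5.0, delta=1.0))
```

Under this reading, raising either threshold trades more linkage failures for fewer misidentifications, consistent with the abstract's note that the defaults penalize misidentification more heavily than linkage failure.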


Medical Record Linkage , Privacy , Humans , Male , Bayes Theorem , State Medicine , Software
...