How well does NamSor perform in predicting the country of origin and ethnicity of individuals based on their first and last names?
PLoS One; 18(11): e0294562, 2023.
Article in English | MEDLINE | ID: mdl-37972002
BACKGROUND: We aimed to evaluate NamSor's performance in predicting the country of origin and ethnicity of individuals from their first and last names.

METHODS: We retrieved the name and country of affiliation of all authors of PubMed publications from 2021 who were affiliated with universities in the 22 countries whose researchers authored ≥1,000 medical publications and whose percentage of migrants was <2.5% (N = 88,699). We used NamSor to estimate each author's most likely "continent of origin" (Asia/Africa/Europe), "country of origin", and "ethnicity". We also examined two variables that we created: "continent#2" ("Europe" replaced by "Europe/America/Oceania") and "country#2" ("Spain" replaced by "Spain/Hispanic American country" and "Portugal" replaced by "Portugal/Brazil"). Using "country of affiliation" as a proxy for "country of origin", we calculated for these five variables the proportion of misclassifications (errorCodedWithoutNA) and the proportion of non-classifications (naCoded). We repeated the analyses with a subsample consisting of all results with an inference accuracy ≥50%.

RESULTS: For the full sample and the subsample, respectively, errorCodedWithoutNA was 16.0% and 12.6% for "continent", 6.3% and 3.3% for "continent#2", 27.3% and 19.5% for "country", 19.7% and 11.4% for "country#2", and 20.2% and 14.8% for "ethnicity". naCoded was 0% and 18.0%, respectively, for all variables except "ethnicity" (0% and 10.7%).

CONCLUSION: NamSor is accurate in determining the continent of origin, especially when using the modified variable (continent#2) and/or restricting the analysis to names with an inference accuracy ≥50%. The risk of misclassification is higher for country of origin and ethnicity but, as with continent of origin, it decreases when using the modified variable (country#2) and/or the subsample.
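The two error measures named in METHODS can be made concrete with a small Python sketch. It assumes, as the suffix "WithoutNA" suggests, that errorCodedWithoutNA is the share of misclassified names among names that received a prediction, while naCoded is the share of all names left unclassified. It also assumes that results below the inference-accuracy cut-off (50% in the subsample) are counted as non-classified, which would explain why naCoded is 0% in the full sample but non-zero in the subsample. The class `NamePrediction`, the function `classification_errors`, the `equivalents` mapping, and the toy data are all hypothetical; this is not NamSor's API or the authors' code. The `equivalents` mapping shows one possible way to express merged variables such as continent#2 or country#2.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass
class NamePrediction:
    """One author: proxy label vs. predicted label (all field names hypothetical)."""
    true_label: str           # proxy for origin, e.g. country of affiliation
    predicted: Optional[str]  # predicted label; None if no prediction was returned
    probability: float        # hypothetical "inference accuracy" score in [0, 1]


def classification_errors(
    preds: Iterable[NamePrediction],
    threshold: float = 0.0,
    equivalents: Optional[Dict[str, Set[str]]] = None,
) -> Tuple[float, float]:
    """Return (errorCodedWithoutNA, naCoded) under the assumptions stated above.

    - Predictions that are missing or below `threshold` count as non-classified.
    - `equivalents` maps a true label to the set of predicted labels accepted
      as correct (one way to build merged variables such as "country#2").
    """
    equivalents = equivalents or {}
    classified = misclassified = unclassified = 0
    for p in preds:
        if p.predicted is None or p.probability < threshold:
            unclassified += 1
            continue
        classified += 1
        accepted = equivalents.get(p.true_label, {p.true_label})
        if p.predicted not in accepted:
            misclassified += 1
    total = classified + unclassified
    # errorCodedWithoutNA: share of wrong labels among names that kept a label
    error_coded_without_na = misclassified / classified if classified else float("nan")
    # naCoded: share of all names that ended up without a (retained) label
    na_coded = unclassified / total if total else float("nan")
    return error_coded_without_na, na_coded


# Toy data, invented for illustration only (not from the study)
preds = [
    NamePrediction("Japan", "Japan", 0.9),
    NamePrediction("Japan", "China", 0.4),
    NamePrediction("Spain", "Mexico", 0.7),
    NamePrediction("Spain", "Spain", 0.8),
]

# Hypothetical merged variable: an author affiliated in Spain is counted as
# correct if predicted as Spain or (here, one example) a Hispanic American country.
country2 = {"Spain": {"Spain", "Mexico"}}

print(classification_errors(preds))                 # (0.5, 0.0)
print(classification_errors(preds, threshold=0.5))  # 1 of 3 wrong, 1 of 4 unclassified
print(classification_errors(preds, 0.5, country2))  # (0.0, 0.25)
```

Because widening the accepted set for a label can only remove misclassifications without changing which names are classified, a merged variable in this sketch lowers (or leaves unchanged) errorCodedWithoutNA while leaving naCoded untouched, matching the direction of the continent#2 and country#2 results reported in the abstract.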
Collection: 01-internacional
Database: MEDLINE
Main subject: Ethnicity
Limits: Humans
Country/Region as subject: Africa / Asia / Europe
Language: English
Journal: PLoS One
Journal subject: Science / Medicine
Year: 2023
Document type: Article
Affiliation country: Switzerland
Country of publication: United States