How well does NamSor perform in predicting the country of origin and ethnicity of individuals based on their first and last names?
Sebo, Paul.
Affiliation
  • Sebo P; University Institute for Primary Care (IuMFE), University of Geneva, Geneva, Switzerland.
PLoS One; 18(11): e0294562, 2023.
Article in En | MEDLINE | ID: mdl-37972002
BACKGROUND: We aimed to evaluate NamSor's performance in predicting the country of origin and ethnicity of individuals based on their first/last names.

METHODS: We retrieved the name and country of affiliation of all authors of PubMed publications in 2021, affiliated with universities in the twenty-two countries whose researchers authored ≥1,000 medical publications and whose percentage of migrants was <2.5% (N = 88,699). We estimated with NamSor their most likely "continent of origin" (Asia/Africa/Europe), "country of origin" and "ethnicity". We also examined two other variables that we created: "continent#2" ("Europe" replaced by "Europe/America/Oceania") and "country#2" ("Spain" replaced by "Spain/Hispanic American country" and "Portugal" replaced by "Portugal/Brazil"). Using "country of affiliation" as a proxy for "country of origin", we calculated for these five variables the proportion of misclassifications (= errorCodedWithoutNA) and the proportion of non-classifications (= naCoded). We repeated the analyses with a subsample consisting of all results with inference accuracy ≥50%.

RESULTS: For the full sample and the subsample, respectively, errorCodedWithoutNA was 16.0% and 12.6% for "continent", 6.3% and 3.3% for "continent#2", 27.3% and 19.5% for "country", 19.7% and 11.4% for "country#2", and 20.2% and 14.8% for "ethnicity"; naCoded was zero and 18.0% for all variables, except for "ethnicity" (zero and 10.7%).

CONCLUSION: NamSor is accurate in determining the continent of origin, especially when using the modified variable (continent#2) and/or restricting the analysis to names with accuracy ≥50%. The risk of misclassification is higher with country of origin or ethnicity, but decreases, as with continent of origin, when using the modified variable (country#2) and/or the subsample.
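
The two metrics named in the methods can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: errorCodedWithoutNA is taken as the share of wrong predictions among names that received a prediction, naCoded as the share of names left unclassified among all names, and results below the accuracy threshold are treated as unclassified when forming the subsample. The function names `apply_threshold` and `classification_metrics` are hypothetical and are not part of NamSor; the toy data are invented for illustration only.

```python
import math
from typing import List, Optional, Sequence


def apply_threshold(
    predicted: Sequence[Optional[str]],
    probabilities: Sequence[float],
    threshold: float = 0.5,
) -> List[Optional[str]]:
    """Treat predictions below the threshold as non-classified (None).

    Assumption: this mirrors the "inference accuracy >= 50%" subsample.
    """
    return [
        label if (label is not None and prob >= threshold) else None
        for label, prob in zip(predicted, probabilities)
    ]


def classification_metrics(
    truth: Sequence[str],
    predicted: Sequence[Optional[str]],
) -> dict:
    """Return errorCodedWithoutNA and naCoded under the assumed definitions."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one observation is required")

    # Keep only names that received a prediction.
    classified = [(t, p) for t, p in zip(truth, predicted) if p is not None]
    n_unclassified = len(truth) - len(classified)
    n_wrong = sum(1 for t, p in classified if t != p)

    # Misclassifications among classified names (undefined if none classified).
    error = n_wrong / len(classified) if classified else math.nan
    return {
        "errorCodedWithoutNA": error,
        "naCoded": n_unclassified / len(truth),
    }


# Toy data: country of affiliation used as the proxy for country of origin.
truth = ["Japan", "Japan", "Italy", "Italy"]
predicted = ["Japan", "China", "Italy", "Italy"]
probabilities = [0.9, 0.4, 0.8, 0.3]

full_sample = classification_metrics(truth, predicted)
subsample = classification_metrics(
    truth, apply_threshold(predicted, probabilities, threshold=0.5)
)
print(full_sample)
print(subsample)
```

Under these assumptions, raising the threshold trades a lower misclassification rate for a higher non-classification rate, which is the pattern the results report.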

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Ethnicity
Limits: Humans
Country/Region as subject: Africa / Asia / Europe
Language: En
Journal: PLoS One
Journal subject: Science / Medicine
Year: 2023
Document type: Article
Affiliation country: Switzerland
Country of publication: United States
