NamePrism: a name-based nationality classifier
NamePrism is a non-commercial nationality/ethnicity classification tool that aims to support academic research, such as sociology and demographic studies. In this project, we learn name embeddings for name parts (first/last names) and classify names into 39 leaf nationalities and 6 U.S. ethnicities.
Contact
- Junting Ye: juyye at cs dot stonybrook dot edu
- Prof. Steven Skiena: skiena at cs dot stonybrook dot edu
- Dr. Yifan Hu: yifanhu at yahoo-inc dot com
Nationality Taxonomy
NamePrism is trained on a set of 74M labeled names from 118 countries. These countries are mapped to a 39-leaf nationality taxonomy. The details are shown in the following treemap.
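As a rough illustration of how scores over leaf nationalities could be aggregated up a hierarchical taxonomy, here is a minimal Python sketch. The node labels (`LeafA1`, `GroupA`, and so on), the `PARENT` mapping, and the `roll_up` helper are all hypothetical placeholders, not NamePrism's actual taxonomy or code, and the scores are made-up numbers.

```python
from collections import defaultdict

# Hypothetical child -> parent links. These labels are placeholders,
# NOT the real NamePrism taxonomy (which has 39 leaves).
PARENT = {
    "LeafA1": "GroupA",
    "LeafA2": "GroupA",
    "LeafB1": "GroupB",
}


def roll_up(leaf_scores, parent=PARENT):
    """Add each leaf's score to the leaf itself and to all of its ancestors."""
    totals = defaultdict(float)
    for leaf, score in leaf_scores.items():
        node = leaf
        while node is not None:
            totals[node] += score
            node = parent.get(node)  # None once we reach a top-level node
    return dict(totals)


# Made-up leaf scores for a single name.
scores = roll_up({"LeafA1": 0.5, "LeafA2": 0.25, "LeafB1": 0.25})
```

In this sketch, every internal node ends up with the sum of the scores of the leaves beneath it, which is the kind of total a treemap or hierarchical pie chart would display per node.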
Ethnicity Classes
Our ethnicity classifier considers six ethnicities/races: White, Black, API (Asian and Pacific Islander), AIAN (American Indian and Alaska Native), 2PRACE (two or more races), and Hispanic.
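For readers who want to handle these six labels in their own analysis code, one possible way to represent them is sketched below. The `Ethnicity` enum, the `most_likely` helper, and the example scores are assumptions made for illustration only; they do not describe NamePrism's actual output format.

```python
from enum import Enum


class Ethnicity(Enum):
    """The six class labels listed above."""
    WHITE = "White"
    BLACK = "Black"
    API = "API"          # Asian and Pacific Islander
    AIAN = "AIAN"        # American Indian and Alaska Native
    TWO_PRACE = "2PRACE"
    HISPANIC = "Hispanic"


def most_likely(scores):
    """Return the Ethnicity member whose label has the highest score."""
    label = max(scores, key=scores.get)
    return Ethnicity(label)


# Made-up scores for illustration; not real classifier output.
example_scores = {
    "White": 0.10,
    "Black": 0.10,
    "API": 0.60,
    "AIAN": 0.05,
    "2PRACE": 0.05,
    "Hispanic": 0.10,
}
top = most_likely(example_scores)
```

Using an enum rather than raw strings means a misspelled or unexpected label raises a `ValueError` immediately instead of silently creating a seventh category.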
Citation
NamePrism is a free nationality/ethnicity classification API that achieves the best performance when compared to existing free online systems. Please cite the following publication if you use NamePrism in your work.
Nationality Classification using Name Embeddings.
Junting Ye, Shuchu Han, Yifan Hu, Baris Coskun, Meizhu Liu, Hong Qin and Steven Skiena.
CIKM, Singapore, Nov. 2017.
Credits
The hierarchical pie chart visualization in NamePrism results is inspired by this, under Apache License v2.