SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell et al.
While research has attempted to identify and mitigate such biases, most efforts have concentrated on English, lagging behind the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual dataset, SHADES, to help address this issue, designed for examining culturally specific stereotypes that may be learned by LLMs. The dataset includes stereotypes from 20 geopolitical regions and languages, spanning multiple identity categories subject to discrimination worldwide.