Explanations
mentions of racism and discussions surrounding racial stereotypes
Negative Logits
Token          Logit
Traversal      -0.16
تشکیل          -0.15
urve           -0.15
mts            -0.14
ako            -0.14
Engl           -0.14
ijd            -0.14
rael           -0.14
installation   -0.14
CID            -0.13
Positive Logits
Token          Logit
racial         0.21
Native         0.21
sensitivity    0.20
race           0.20
sensitive      0.19
Race           0.18
token          0.18
racial         0.18
Race           0.18
appropri       0.18
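Dashboards like this one typically build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and taking the k highest and k lowest entries. The sketch below assumes this method. It also assumes a decoder vector `w_dec` of shape (d_model,) and an unembedding matrix `W_U` of shape (d_model, vocab_size). The names `w_dec`, `W_U` and `top_logit_effects` are hypothetical, and random toy weights stand in for the real model.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumed inputs (hypothetical names):
      w_dec: the feature's decoder direction, shape (d_model,)
      W_U:   unembedding matrix, shape (d_model, vocab_size)
      vocab: list of token strings indexed by token id
    """
    # Direct effect of the feature direction on every output logit.
    logits = w_dec @ W_U             # shape (vocab_size,)
    order = np.argsort(logits)       # token ids sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights (not the real model).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec = rng.normal(size=d_model)
W_U = rng.normal(size=(d_model, vocab_size))
neg, pos = top_logit_effects(w_dec, W_U, vocab, k=3)
print("Negative:", neg)
print("Positive:", pos)
```

The lists are ranked by token id rather than by surface string. A string can therefore appear twice, as `racial` and `Race` do above. These repeats are likely distinct vocabulary entries, such as a variant with a leading space.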
Activation Density: 0.068%
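Activation density is usually reported as the fraction of tokens in a sampled dataset on which the feature is active. At 0.068%, this feature fires on roughly 7 of every 10,000 tokens. The sketch below assumes activations are collected as a flat array and that "active" means strictly positive. The function name `activation_density` and the toy data are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature fires.

    acts: 1-D array of the feature's activation on each sampled token.
    threshold: activations strictly above this value count as firing
               (0.0 is an assumption, not a documented setting).
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy sample: 10,000 tokens, the feature fires on 7 of them.
acts = np.zeros(10_000)
acts[[12, 500, 1_234, 4_000, 6_543, 8_000, 9_999]] = [1.2, 0.4, 2.0, 0.9, 0.3, 1.1, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.070%
```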