INDEX
Explanations
themes related to injustice and discrimination
Negative Logits
Token         Logit
-             -0.17
Bale          -0.16
&             -0.16
seperate      -0.16
[&            -0.15
egin          -0.15
–             -0.15
signalling    -0.15
bod           -0.15
seper         -0.14
Positive Logits
Token                 Logit
ez                    0.17
embre                 0.17
esser                 0.15
ç°                    0.15
Usa                   0.15
pushViewController    0.15
azı                   0.15
mấy                   0.14
eman                  0.14
negoci                0.14
Activations Density 0.001%