INDEX
Explanations
themes related to equality and inclusivity
Negative Logits
asan        -0.19
iken        -0.16
tere        -0.16
heel        -0.16
eza         -0.16
opy         -0.16
artz        -0.15
gid         -0.15
urar        -0.15
antz        -0.15
Positive Logits
everyone     0.23
everyone     0.22
Everyone     0.21
universal    0.20
Everyone     0.20
bjerg        0.20
age          0.19
everybody    0.19
Universal    0.18
universal    0.18
Activations Density: 0.189%