INDEX
Explanations
references to societal issues and movements, particularly those related to justice and equality
Negative Logits
ança       -0.15
impl       -0.15
ged        -0.14
oris       -0.14
OS         -0.14
Si         -0.14
vere       -0.14
sh         -0.14
Alternate  -0.14
AN         -0.14
Positive Logits
berman  0.19
nebu    0.16
izzo    0.16
etc     0.16
etc     0.15
oğ      0.15
Měst    0.15
ij¸     0.15
اته     0.15
azen    0.14
Activations Density 0.674%