INDEX
Explanations
phrases related to censorship and suppression
actions related to suppression and censorship
Negative Logits
ortment     -0.78
esides      -0.65
luster      -0.65
Suc         -0.65
immer       -0.64
ammy        -0.64
olds        -0.64
âĹ¼         -0.63
-------     -0.61
itched      -0.61
Positive Logits
ively       0.92
uate        0.81
him         0.79
offending   0.78
opposing    0.77
enance      0.75
them        0.73
everything  0.72
dissent     0.72
oneself     0.71
Activations Density 0.199%
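
The density figure is read here as the fraction of sampled token positions on which the feature has a nonzero activation. That definition is an assumption, since the page does not state it. Under that reading, 0.199% corresponds to roughly 2 active tokens per 1,000. A minimal Python sketch, with a hypothetical function name and made-up sample data:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature
    fires at all (activation > 0). The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3%}")  # prints 0.200%
```

A feature this sparse fires on only a small slice of the corpus, which fits a narrow concept like the one named in the explanations.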