INDEX
Explanations
phrases related to political issues and controversial topics
Negative Logits
Token     Logit
ridor     -0.72
emale     -0.68
sha       -0.65
aza       -0.65
anmar     -0.63
ser       -0.63
eda       -0.62
sted      -0.61
ļéĨĴ      -0.61
undown    -0.61
Positive Logits
Token            Logit
pires            0.90
whatsoever       0.74
chooses          0.73
aign             0.71
decides          0.71
circumstances    0.69
misc             0.68
ifice            0.68
deems            0.67
faults           0.65
Activations Density 0.120%