Explanations
phrases related to political statements or opinions
Negative Logits
ress      -0.68
arest     -0.66
andem     -0.65
Pont      -0.65
Tank      -0.64
oses      -0.63
aukee     -0.63
estern    -0.62
Guard     -0.62
α         -0.61
Positive Logits
although  1.13
"[        1.10
'[        0.85
"...      0.85
they      0.85
"â̦       0.84
soever    0.82
there     0.81
whilst    0.79
while     0.77
Activations Density 0.537%
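
The density figure above can be illustrated with a minimal sketch. It assumes that "activation density" means the fraction of sampled token positions on which the feature's activation is nonzero; the source does not define the term. The function name, the threshold parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of values strictly greater than `threshold`.

    Assumption: density = (active token positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 5 of which are active.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.537% would mean the feature fires on roughly 5 of every 1000 sampled tokens.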