Explanations
phrases related to political statements and endorsements
Negative Logits
Token      Logit
ĻĤ         -0.71
kw         -0.68
Kut        -0.65
iaries     -0.65
Shot       -0.64
eka        -0.64
holder     -0.61
suspic     -0.61
Shot       -0.61
cephal     -0.61
Positive Logits
Token      Logit
izu        0.68
ie         0.66
ushima     0.65
[          0.65
/"         0.62
ocumented  0.61
"$:/       0.60
jriwal     0.59
Mao        0.58
(£         0.57
Activations Density: 0.068%