INDEX
Explanations
phrases related to public opinion or voting outcomes
references to the concept of popularity in various contexts
Negative Logits
thur      -0.75
ourke     -0.70
Aviv      -0.70
abetic    -0.68
Kear      -0.67
agher     -0.67
xual      -0.66
ritch     -0.64
Territ    -0.63
ASC       -0.63
Positive Logits
ized         0.96
izing        0.92
isations     0.91
izations     0.89
ity          0.87
ised         0.86
ization      0.76
izer         0.76
ize          0.74
izers        0.73
Activations Density: 0.014%
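
The page does not define the density figure. A minimal sketch, assuming it is the fraction of tokens on which the feature's activation is above zero: the function name `activation_density`, the zero threshold, and the toy data below are all illustrative assumptions, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data, invented for illustration: 100,000 tokens, 14 of them active.
toy_activations = [0.0] * 100_000
for i in range(14):
    toy_activations[i * 7_000] = 1.5  # arbitrary positive activation value

density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.014% corresponds to roughly 14 active tokens per 100,000 sampled.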