INDEX
Explanations
phrases related to political or ideological defection
terms related to political defections
Negative Logits
Token     Logit
Hemp      -0.74
Lans      -0.72
Ducks     -0.71
Dice      -0.71
#$        -0.68
ãĥĥãĤ¯    -0.64
Carney    -0.64
#####     -0.63
zer       -0.63
frey      -0.62
Positive Logits
Token     Logit
ighters   0.88
uese      0.86
ible      0.80
ibility   0.78
atile     0.78
iru       0.75
igent     0.74
itives    0.74
oidal     0.73
inally    0.72
Activations Density 0.030%