INDEX
Explanations
phrases related to support or endorsement
instances of the word "back" or its variations related to support or defense
Negative Logits
thora      -0.74
itizen     -0.69
entric     -0.69
nesota     -0.61
lys        -0.61
orp        -0.61
ifix       -0.60
pox        -0.60
è¦ļéĨĴ     -0.59
cz         -0.59
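
The entry "è¦ļéĨĴ" in the list above looks garbled, but it may simply be a multi-byte token shown in a byte-level BPE alphabet. The sketch below assumes, without confirmation from this page, that the vocabulary uses the GPT-2 style byte-to-unicode mapping; the helper name decode_token is hypothetical and introduced only for illustration. Under that assumption the entry decodes to readable text.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table.

    Assumption: the dashboard's tokens come from a GPT-2 style byte-level
    BPE vocabulary. Printable bytes map to themselves; every other byte is
    shifted to a code point at 256 or above.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed token back into UTF-8 text.

    Raises KeyError if the token contains a character outside the
    256-entry byte alphabet.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("è¦ļéĨĴ"))
```

If the assumption holds, the token is the two-character Japanese string 覚醒 rather than encoding debris. Plain ASCII tokens such as "track" pass through unchanged.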
Positive Logits
track      1.25
away       0.95
ped        0.94
tracking   0.79
tr         0.77
INTON      0.77
up         0.77
dash       0.77
drive      0.74
down       0.73
Activations Density 0.030%