INDEX
Explanations
phrases related to holding onto beliefs, power, responsibilities, or physical objects
Negative Logits
————     -0.70
DIT      -0.66
nown     -0.66
ãĤ¡      -0.65
ixel     -0.64
gnu      -0.64
ibel     -0.64
ghan     -0.64
ettel    -0.64
endix    -0.63
Positive Logits
hold          1.09
sway          1.04
erness        0.98
accountable   0.97
holding       0.97
holders       0.95
hostage       0.92
captive       0.88
holder        0.88
onto          0.87
Activations Density: 0.491%