INDEX
Explanations
Phrases related to willingness to do something
Expressions of willingness or commitment to action
Negative Logits
Token       Logit
hemy        -0.88
Anthem      -0.83
adish       -0.76
loo         -0.76
oche        -0.74
alien       -0.73
arette      -0.73
onut        -0.73
rx          -0.72
riot        -0.68
Positive Logits
Token       Logit
theless     0.81
willing     0.79
gladly      0.77
enough      0.76
terday      0.75
willingly   0.73
unres       0.72
uncond      0.69
to          0.68
accept      0.67
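A minimal sketch of how positive and negative logit lists like these can be produced, assuming (this is not stated on the page) that each score is the dot product of the feature's decoder direction with a token's column of the model's unembedding matrix W_U, then sorted. The function names `logit_effects` and `top_and_bottom` and all toy numbers are hypothetical.

```python
# Hypothetical sketch: rank vocabulary tokens by how much a feature's
# decoder direction pushes their logits up or down (assumed method).

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: score}, score = dot(decoder_dir, W_U[:, token]).

    decoder_dir: list of d_model floats (assumed feature direction)
    unembed: d_model rows, each a list of len(vocab) floats (assumed W_U)
    vocab: list of token strings
    """
    scores = {}
    for j, tok in enumerate(vocab):
        scores[tok] = sum(decoder_dir[i] * unembed[i][j]
                          for i in range(len(decoder_dir)))
    return scores


def top_and_bottom(scores, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Toy usage with a 2-dimensional model and a 4-token vocabulary.
vocab = ["willing", "gladly", "riot", "alien"]
decoder_dir = [1.0, 0.0]
unembed = [
    [0.8, 0.7, -0.6, -0.7],
    [0.1, 0.2, 0.3, 0.4],
]
scores = logit_effects(decoder_dir, unembed, vocab)
top, bottom = top_and_bottom(scores, k=2)
print(top)     # [('willing', 0.8), ('gladly', 0.7)]
print(bottom)  # [('alien', -0.7), ('riot', -0.6)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other; on a real vocabulary of tens of thousands of tokens, a partial sort (for example `heapq.nlargest`) would avoid sorting everything.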
Activation Density: 0.031%
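Assuming activation density means the percentage of tokens on which the feature activates (activation above zero), 0.031% corresponds to roughly 3 tokens in every 10,000. The sketch below shows that arithmetic; the function name `activation_density` and the zero threshold are illustrative assumptions, not the dashboard's documented definition.

```python
# Hypothetical sketch: activation density as the percentage of tokens
# whose feature activation exceeds a threshold (assumed to be 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy usage: the feature fires on 3 of 10,000 tokens.
acts = [0.0] * 9997 + [1.2, 0.5, 2.0]
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.030%
```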