INDEX
Explanations
the word "willing" in the text
expressions that indicate a willingness or readiness to take action
Negative Logits
gran      -0.78
Anthem    -0.72
Pt        -0.69
lav       -0.69
ORGE      -0.68
alien     -0.68
cell      -0.67
adish     -0.66
ogg       -0.66
gar       -0.66
Positive Logits
willing   1.22
unwilling 1.05
theless   0.88
incent    0.84
willingly 0.82
onlook    0.78
unres     0.76
gladly    0.76
compe     0.75
bilt      0.74
Activations Density: 0.014%