INDEX
Explanations
expressions of willingness or readiness to take action
Negative Logits
uum                 -0.17
ernel               -0.17
avez                -0.15
chez                -0.15
AssemblyCopyright   -0.15
eve                 -0.14
zeń                 -0.14
ichick              -0.14
reserved            -0.14
ocker               -0.14
Positive Logits
ness                0.29
willing             0.26
enough              0.21
able                0.20
/un                 0.20
sacrifice           0.20
participant         0.20
ough                0.19
ful                 0.19
ings                0.19
Activations Density: 0.019%