INDEX
Explanations
expressions of pleading or requests for help
Negative Logits
hausen  -0.16
edef    -0.16
eyh     -0.16
_fx     -0.15
oulos   -0.14
Sunder  -0.14
anki    -0.14
atts    -0.14
evin    -0.14
elon    -0.14
Positive Logits
beg         0.19
beg         0.18
Permission  0.18
permission  0.17
Beg         0.17
gary        0.17
mercy       0.17
release     0.16
ging        0.16
begging     0.16
Activations Density: 0.030%