INDEX
Explanations
phrases that evoke strong emotional responses or reactions
Negative Logits
Token       Logit
ór          -0.16
iro         -0.15
akeup       -0.15
Lonely      -0.14
opath       -0.14
annoyed     -0.14
éĹ          -0.14
znam        -0.14
irritated   -0.14
bored       -0.14
Positive Logits
Token       Logit
speech      0.40
speech      0.35
Speech      0.34
aw          0.33
Speech      0.33
awe         0.31
gas         0.29
dumb        0.29
spell       0.29
jaw         0.29
Activations Density: 0.272%
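A density figure like the one above is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the dashboard's actual computation may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is
    active at all (activation > 0). This is an illustrative definition, not
    one confirmed by the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 27 of which are active.
sample = [0.0] * 9973 + [1.5] * 27
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.272% would mean the feature is active on roughly 27 out of every 10,000 tokens.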