INDEX
Explanations
expressions of frustration or dissatisfaction
Negative Logits
stav     -0.15
ofilm    -0.15
oker     -0.14
κά       -0.14
ippi     -0.14
orted    -0.14
åķª      -0.14
åłĤ      -0.14
Wow      -0.13
hei      -0.13
Positive Logits
ARG      0.30
ble      0.29
U        0.25
ugh      0.25
ARG      0.25
ACK      0.25
sigh     0.24
u        0.24
ur       0.23
Bah      0.23
Activations Density 0.198%