Explanations
expressions and variations of anger
Negative Logits
'gc        -0.18
ož         -0.16
ãĥ¼ãĥĵ     -0.16
etik       -0.15
opsy       -0.15
oq         -0.15
weeney     -0.14
lại        -0.14
etri       -0.14
//**↵      -0.14
Positive Logits
bang       0.16
him        0.16
har        0.16
_ioctl     0.15
emies      0.14
dd         0.14
vine       0.14
ÙĪÙĦا      0.14
İ          0.14
ym         0.13
Activations Density: 0.027%