INDEX
Explanations
expressions of humor and personal enjoyment
Negative Logits
IENT         -0.15
odyn         -0.14
Chronicle    -0.14
üre          -0.14
Kron         -0.14
aison        -0.14
iez          -0.14
alley        -0.14
/close       -0.14
erif         -0.14
Positive Logits
oce          0.16
.hw          0.15
Ø·ÙĨ         0.14
↵↵           0.14
outu         0.14
orph         0.14
FREE         0.14
ız           0.13
Guth         0.13
/rfc         0.13
Activations Density: 0.169%