INDEX
Explanation: instances of jokes or humor in the text
NEGATIVE LOGITS
¯           -0.15
@Id         -0.14
ÑĥÑĩ        -0.14
nda         -0.14
unner       -0.14
tract       -0.14
utz         -0.14
wahl        -0.14
enet        -0.13
нг          -0.13
POSITIVE LOGITS
sworth       0.16
sg           0.15
ä»ķ          0.15
mdi          0.14
.mount       0.14
odÃŃ         0.14
SG           0.14
turnstile    0.13
seins        0.13
ingly        0.13
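Lists like the two above are typically produced by projecting a feature's output direction onto the model's vocabulary and keeping the most positive and most negative entries. The sketch below illustrates that idea under stated assumptions. It assumes the feature has a decoder direction of length d_model and that the model has an unembedding matrix of shape d_model x vocab_size. The function name logit_effects and its arguments are hypothetical and do not represent this dashboard's actual code.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: top-k positive and negative logit effects.

    Assumes effect[t] = sum_j decoder_dir[j] * w_unembed[j][t], i.e. the
    feature's decoder direction multiplied by the unembedding matrix.
    """
    n_vocab = len(vocab)
    effects = [0.0] * n_vocab
    # Accumulate the matrix-vector product row by row.
    for j, d_j in enumerate(decoder_dir):
        row = w_unembed[j]
        for t in range(n_vocab):
            effects[t] += d_j * row[t]
    pairs = list(zip(vocab, effects))
    # Largest effects first for the positive list, smallest first for the negative list.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three made-up tokens.
    pos, neg = logit_effects(
        decoder_dir=[1.0, -0.5],
        w_unembed=[[0.3, -0.1, 0.0], [0.0, 0.2, -0.4]],
        vocab=["a", "b", "c"],
        k=2,
    )
    print("positive:", pos)
    print("negative:", neg)
```

A real model would use a vectorized matrix product over a vocabulary of tens of thousands of tokens. The plain loops here keep the sketch dependency-free.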
Activation density: 0.020%
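Activation density is usually read as the percentage of evaluated tokens on which the feature fires. Under that reading, 0.020% corresponds to roughly 2 tokens in every 10,000, which makes this a very sparse feature. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The function name activation_density is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

if __name__ == "__main__":
    # 2 nonzero activations out of 10,000 tokens.
    acts = [0.0] * 9998 + [1.3, 0.7]
    print(f"{activation_density(acts):.3f}%")  # 0.020%
```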