INDEX
Explanations
mentions of humor or humorous content
Negative Logits
649           -0.18
istrovství    -0.17
STALL         -0.16
FOUNDATION    -0.16
allen         -0.15
hips          -0.15
iner          -0.15
hrd           -0.14
lew           -0.14
ideo          -0.14
POSITIVE LOGITS
hum           0.25
Hum           0.24
Hum           0.21
pty           0.21
mers          0.21
oldt          0.19
hum           0.18
ankind        0.17
iliate        0.17
ricane        0.17
Activations Density 0.018%
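
The source does not say how the logit lists or the density figure above were produced. Dashboards of this kind commonly obtain token/logit lists by projecting a feature's decoder direction through the model's unembedding matrix and keeping the extremes, and report density as the share of tokens on which the feature fires. The sketch below illustrates that under those assumptions; `decoder_dir`, `unembed`, `vocab`, and both function names are hypothetical, and the toy numbers are made up for illustration rather than taken from any real model.

```python
import numpy as np


def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (most_negative, most_positive) lists of (token, logit) pairs.

    decoder_dir: shape (d_model,), hypothetical feature direction (assumption).
    unembed:     shape (d_model, n_vocab), hypothetical unembedding matrix.
    vocab:       list of n_vocab token strings.
    """
    effects = np.asarray(decoder_dir) @ np.asarray(unembed)  # shape (n_vocab,)
    order = np.argsort(effects)                              # ascending order
    k = min(k, len(vocab))
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


def activation_density(activations):
    """Percentage of tokens on which the feature has a nonzero activation."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts) / acts.size


if __name__ == "__main__":
    # Toy data only: a 1-dimensional "model" with a 4-token vocabulary.
    toy_vocab = ["hum", "STALL", "pty", "649"]
    toy_unembed = np.array([[0.25, -0.16, 0.21, -0.18]])
    toy_direction = np.array([1.0])
    neg, pos = top_logit_effects(toy_direction, toy_unembed, toy_vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
    print("density %:", activation_density([0.0, 0.0, 0.0, 1.5]))
```

Sorting once and reading both ends of the order keeps the negative and positive lists consistent with each other. A real dashboard may apply extra normalization before ranking, which this sketch does not attempt.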