NEGATIVE LOGITS
Token       Logit
Laugh       -0.15
mockery     -0.14
mocked      -0.12
amused      -0.11
mocking     -0.10
Junk        -0.10
æĥij         -0.10
goog        -0.09
laughing    -0.09
ridicule    -0.09

POSITIVE LOGITS
Token       Logit
kidding     0.29
joke        0.28
pun         0.23
rib         0.23
jokes       0.23
jest        0.23
facet       0.19
cracking    0.18
wis         0.18
pun         0.18
Activations Density: 0.225%