Explanations
the word "joke" followed by either the number 9 or 10
the concept of "joke" in various contexts
Negative Logits
undai        -0.71
phalt        -0.67
ignty        -0.67
ignt         -0.66
EVA          -0.64
actions      -0.64
enture       -0.63
ills         -0.63
CLASSIFIED   -0.63
arnaev       -0.62
Positive Logits
ously         1.14
jokes         0.93
osal          0.83
joking        0.82
mocking       0.81
bag           0.79
sters         0.79
writer        0.78
joke          0.77
bags          0.76
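Feature dashboards commonly produce logit lists like these by projecting the feature's decoder direction through the model's unembedding matrix, then ranking tokens by the resulting score. The source does not say how these particular values were computed, so the sketch below is an assumption. The helpers `logit_weights` and `top_and_bottom`, along with the toy vectors, are hypothetical.

```python
# Minimal sketch, ASSUMING the common SAE-dashboard convention:
#   logit weight of token t = dot(feature decoder direction, unembedding column for t).
# All vectors and the tiny vocabulary below are hypothetical toy data.

def logit_weights(decoder_dir, unembed, vocab):
    """Return {token: score}, where score = decoder_dir . unembed[:, j].

    decoder_dir: list of d_model floats (the feature's decoder row).
    unembed: d_model x vocab_size matrix, given as a list of rows.
    vocab: list of token strings, one per unembedding column.
    """
    scores = {}
    for j, token in enumerate(vocab):
        scores[token] = sum(
            decoder_dir[k] * unembed[k][j] for k in range(len(decoder_dir))
        )
    return scores


def top_and_bottom(scores, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative: tokens the feature suppresses
    top = ranked[::-1][:k]     # most positive: tokens the feature promotes
    return top, bottom


if __name__ == "__main__":
    vocab = ["jokes", "joking", "phalt", "EVA"]   # hypothetical toy vocabulary
    decoder_dir = [1.0, 0.5]                      # hypothetical d_model = 2
    unembed = [
        [0.8, 0.6, -0.4, -0.3],
        [0.2, 0.4, -0.6, -0.5],
    ]
    scores = logit_weights(decoder_dir, unembed, vocab)
    top, bottom = top_and_bottom(scores, 2)
    print("Positive:", top)
    print("Negative:", bottom)
```

With this convention, a positive score means that adding the feature's direction to the residual stream raises that token's logit, and a negative score means it lowers that logit.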
Activations Density 0.049%
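Activation density is usually the fraction of sampled tokens on which the feature's activation is nonzero. At 0.049%, this feature would fire on roughly one token in two thousand. The sketch below assumes that definition with a threshold of zero; the source does not state its threshold or sample size, and `activation_density` is a hypothetical helper.

```python
# Minimal sketch, ASSUMING density = fraction of tokens whose activation
# exceeds a threshold (0.0 here). The activation values are hypothetical.

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 49 firing tokens out of 100,000 sampled tokens -> 0.049% density.
    acts = [0.0] * 100_000
    for i in range(0, 49 * 2000, 2000):   # 49 evenly spaced firing positions
        acts[i] = 1.5
    print(f"{activation_density(acts):.3%}")
```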