INDEX
Explanations
phrases related to humor or something not to be taken seriously
references to jokes or humor
Negative Logits
undai       -0.87
phalt       -0.77
ignt        -0.71
ignty       -0.71
enture      -0.69
orneys      -0.68
rity        -0.66
porting     -0.66
arnaev      -0.66
CLASSIFIED  -0.66
Positive Logits
ously       1.04
jokes       0.87
mocking     0.84
joke        0.81
fest        0.78
joking      0.78
osal        0.78
bags        0.77
bag         0.76
Pom         0.76
Activations Density: 0.026%