INDEX
Explanations
references to jokes or humorous statements
mentions of jokes or humorous references
Negative Logits
phalt        -0.83
undai        -0.82
porting      -0.72
enture       -0.72
CLASSIFIED   -0.69
rive         -0.68
ignty        -0.66
ighting      -0.66
orneys       -0.66
ports        -0.66
POSITIVE LOGITS
ously        0.99
jokes        0.91
joking       0.86
joke         0.86
bags         0.81
bag          0.79
osal         0.79
caller       0.77
Pom          0.77
mocking      0.76
Activations Density 0.019%