INDEX
Explanations
references to jokes
occurrences of the word "jokes"
Negative Logits
ignty              -0.69
CLASSIFIED         -0.67
ioch               -0.63
violet             -0.62
Borders            -0.61
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   -0.60
orneys             -0.60
itizen             -0.58
JUST               -0.58
ignt               -0.57
Positive Logits
jokes              1.02
linger             0.81
ters               0.80
joking             0.78
ster               0.78
banter             0.78
sters              0.76
ong                0.75
osal               0.74
itone              0.74
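Lists of positive and negative logits like these are commonly produced by projecting a feature's output (decoder) direction onto the model's unembedding and ranking every vocabulary token by the resulting score. The sketch below assumes that method; the function name `top_logit_effects` and the toy vectors are hypothetical and are not taken from this dashboard's actual pipeline.

```python
from typing import Sequence


def top_logit_effects(
    feature_dir: Sequence[float],
    unembed_cols: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: rank tokens by the direct logit effect of a feature.

    Assumes the effect on token t is dot(feature_dir, unembed_cols[t]),
    i.e. the feature's decoder direction projected onto t's unembedding column.
    Returns (most positive tokens, most negative tokens), k of each.
    """
    if len(unembed_cols) != len(vocab):
        raise ValueError("need exactly one unembedding column per vocab token")
    # One dot product per vocabulary token.
    scores = [sum(f * w for f, w in zip(feature_dir, col)) for col in unembed_cols]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
pos, neg = top_logit_effects(
    feature_dir=[1.0, 0.0],
    unembed_cols=[[1.0, 0.2], [0.8, 0.1], [-0.6, 0.3], [-0.5, 0.0]],
    vocab=["jokes", "banter", "violet", "Borders"],
    k=2,
)
print(pos)  # [('jokes', 1.0), ('banter', 0.8)]
print(neg)  # [('violet', -0.6), ('Borders', -0.5)]
```

On a real model the unembedding has tens of thousands of columns, so a matrix library would normally replace the pure-Python loop; the ranking logic stays the same.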
Activation Density: 0.010%
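Activation density is usually read as the fraction of sampled token positions on which the feature fires, so 0.010% means roughly one token in 10,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the helper `activation_density` is hypothetical, and the dataset and threshold actually used for this dashboard are not stated here.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# One firing position out of 10,000 sampled tokens.
acts = [0.0] * 9999 + [3.2]
print(f"{activation_density(acts):.3%}")  # 0.010%
```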