INDEX
Explanations
words related to humor and jokes
references to humor and jokes
Negative Logits
Token       Logit
ignty       -0.84
ignt        -0.72
opio        -0.71
20439       -0.69
eus         -0.68
ports       -0.67
ainer       -0.67
arnaev      -0.66
consolid    -0.66
hips        -0.65
Positive Logits
Token       Logit
jokes       1.01
ously       0.97
humour      0.92
banter      0.92
comedian    0.90
joking      0.89
joke        0.88
netflix     0.87
mocking     0.86
humor       0.86
Activations Density: 0.105%