INDEX
Explanations
words and phrases related to a sense of humor
references to humor in various contexts
Negative Logits
Token     Logit
ignty     -0.75
holder    -0.74
arnaev    -0.65
holders   -0.61
ports     -0.61
FT        -0.60
opio      -0.60
EVA       -0.60
eded      -0.58
oug       -0.57
Positive Logits
Token     Logit
ously     1.21
netflix   0.95
humour    0.83
humor     0.82
osity     0.80
isma      0.80
aceous    0.77
mocking   0.76
atur      0.74
fulness   0.73
Activations Density 0.026%