Explanations
phrases related to humor or jokes
references to humor or comedic content
Negative Logits
ainer       -1.02
eded        -0.82
enfranch    -0.82
eding       -0.78
ignt        -0.77
chwitz      -0.76
ilings      -0.75
ensional    -0.74
apers       -0.74
redit       -0.73
Positive Logits
netflix     0.92
funny       0.82
GIF         0.79
Laugh       0.75
balls       0.75
ness        0.75
comedy      0.74
glers       0.72
banter      0.72
anecdotes   0.71
Activations Density 0.021%