INDEX
Explanations
references to comedy
references to comedy in various contexts
Negative Logits
ipped    -0.80
igree    -0.76
eded     -0.73
oral     -0.71
hips     -0.70
eding    -0.68
brid     -0.67
irts     -0.66
abet     -0.66
ignty    -0.65
Positive Logits
comedy      1.02
theatre     0.90
comed       0.87
Comedy      0.84
comedian    0.83
Comed       0.83
Schumer     0.82
theater     0.82
sketches    0.80
improvis    0.77
Activations Density: 0.012%