Explanations
instances of jokes and humor
Negative Logits
Token     Logit
rrggbb    -0.56
MBR       -0.49
MBR       -0.49
Whitby    -0.47
SSR       -0.47
CBI       -0.46
CRS       -0.46
PMI       -0.46
McMaster  -0.45
SBR       -0.45
Positive Logits
Token     Logit
jokes     1.84
joke      1.84
joke      1.66
Joke      1.63
Jokes     1.59
jokes     1.57
Jokes     1.56
Joke      1.52
joking    1.38
joked     1.31
Activations Density 0.002%