INDEX
Explanations
instances of humor and light-heartedness
Negative Logits
inous             -0.17
uries             -0.17
.scalablytyped    -0.17
uri               -0.15
Fucking           -0.14
/**↵↵             -0.14
raped             -0.14
fucking           -0.14
Compensation      -0.13
indow             -0.13
Positive Logits
humor             0.44
humour            0.40
jokes             0.36
laughs            0.36
comedy            0.35
laughter          0.35
humorous          0.34
joke              0.34
laugh             0.33
comedic           0.32
Activations Density 0.974%