Explanations
humorous or funny content
expressions of humor, particularly those related to being funny or hilarious
Negative Logits
Token        Logit
Process      -0.81
Scale        -0.76
Scale        -0.70
process      -0.69
system       -0.69
osph         -0.69
atom         -0.68
Regions      -0.68
properties   -0.68
pri          -0.67
Positive Logits
Token        Logit
hilarious    3.25
hilar        2.41
humorous     2.37
amusing      2.27
comedic      2.03
funn         1.96
funny        1.88
witty        1.82
sarcastic    1.68
satirical    1.66
Activations Density 0.042%
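
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of sampled tokens on which the feature's activation is above a threshold. This is an assumption about how the dashboard defines it, not a confirmed description of its method; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (# active tokens) / (# sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.2, 0.3, 2.5, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.040%"
```

Under this reading, a density of 0.042% would mean the feature fires on roughly 4 of every 10,000 sampled tokens, which fits a narrow feature that responds mainly to humor-related words.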