Explanations
phrases indicating joking or exaggeration
expressions indicating humor or sarcasm
Negative Logits
CTR        -0.64
roots      -0.59
Bridges    -0.58
iliated    -0.58
manned     -0.56
reens      -0.56
icipated   -0.56
intersect  -0.56
scrim      -0.56
competed   -0.55
Positive Logits
^^^^       0.86
_.         0.85
kidding    0.82
haha       0.79
;)         0.79
:-)        0.76
myself     0.75
:)         0.74
idge       0.73
here       0.72
Activations Density: 0.403%