Explanations
phrases or sentences indicating sarcasm or jest
phrases related to teasing or joking
Negative Logits
marked     -0.89
ŃĶ         -0.85
por        -0.75
marks      -0.72
porter     -0.70
gars       -0.68
actions    -0.67
Decre      -0.67
lining     -0.66
morph      -0.66
Positive Logits
kidding    1.25
aside      0.88
joking     0.79
aloud      0.74
spared     0.69
hypoc      0.66
renheit    0.65
posit      0.64
Goodbye    0.62
jokes      0.62
Activations Density: 0.007%