INDEX
Explanations
phrases indicating humor or sarcasm
expressions of disbelief or statements that someone is joking
Negative Logits
Token        Logit
marked       -0.84
ŃĶ           -0.83
pora         -0.76
part         -0.74
ugal         -0.72
por          -0.69
marks        -0.67
WIND         -0.67
enfranch     -0.66
namese       -0.65
Positive Logits
Token                 Logit
kidding               1.04
renheit               0.76
joking                0.71
isSpecialOrderable    0.69
Niet                  0.66
aloud                 0.66
ayers                 0.64
NESS                  0.64
WARN                  0.63
sters                 0.63
Activation density: 0.024%
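The logit lists and the density figure above are summary statistics of the kind shown on sparse-autoencoder feature dashboards. The following is a minimal sketch of how such numbers are commonly derived. It assumes that each token's logit score is the feature's decoder direction projected through the model's unembedding matrix (ignoring any layer norm), and that density is the fraction of sampled tokens on which the feature's activation is positive. The function names, toy vocabulary, and toy weights are hypothetical and not taken from the dashboard's own code.

```python
# Minimal sketch (hypothetical names and toy numbers): how a feature dashboard's
# top positive/negative logit lists and activation density are commonly derived.

def logit_scores(decoder_dir, W_U):
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab_size columns)."""
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[k] * W_U[k][t] for k in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_tokens(scores, vocab, k, positive=True):
    """Return the k (token, score) pairs with the largest (positive=True)
    or most negative (positive=False) scores, rounded to 2 decimals."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=positive)
    return [(vocab[t], round(scores[t], 2)) for t in order[:k]]

def activation_density(activations):
    """Fraction of sampled tokens on which the feature fires (activation > 0)."""
    return sum(1 for a in activations if a > 0) / len(activations)

if __name__ == "__main__":
    # Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
    vocab = ["kidding", "joking", "marked", "por"]
    decoder_dir = [1.0, -0.5]
    W_U = [[1.0, 0.7, -0.8, -0.6],
           [0.0, 0.2, 0.1, 0.2]]
    scores = logit_scores(decoder_dir, W_U)
    print("positive:", top_tokens(scores, vocab, 2))
    print("negative:", top_tokens(scores, vocab, 2, positive=False))
    # 24 firing tokens out of 100,000 sampled gives a density of 0.024%.
    acts = [0.0] * 99_976 + [1.0] * 24
    print(f"density: {activation_density(acts) * 100:.3f}%")
```

Under these assumptions, a density of 0.024% means the feature fires on roughly 24 of every 100,000 tokens, so it is very sparse.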