Explanations
instances of sarcasm or irony in language
Negative Logits
Token    Logit
ces      -0.16
itle     -0.16
apan     -0.15
mdat     -0.15
ppv      -0.15
itz      -0.14
ellan    -0.14
adesh    -0.14
unar     -0.14
oston    -0.14
Positive Logits
Token    Logit
dns      0.17
wis      0.16
dns      0.15
.rl      0.14
ηÏĤ      0.14
trail    0.14
esti     0.14
Č↵       0.14
dear     0.14
Trails   0.14
Activations Density: 0.698%