INDEX
Explanations
instances of phrases with a disruptive or contrasting tone
instances of expressions of incredulity or surprise
Negative Logits
utive     -0.77
reated    -0.75
encia     -0.70
ļéĨĴ      -0.70
Unique    -0.69
ahu       -0.68
irie      -0.68
orem      -0.66
edom      -0.65
axis      -0.63
Positive Logits
huh          1.23
eh           1.17
though       1.14
but          1.13
although     1.11
especially   1.03
haha         1.00
albeit       0.98
however      0.91
frankly      0.91
Activations Density 0.346%