Explanations
statements or quotes spoken by someone
instances of dialogue or statements made by individuals
Negative Logits
Token       Logit
ãĥİ         -0.72
thur        -0.65
ãĥĥãĥĪ      -0.65
paralle     -0.63
MU          -0.62
arent       -0.62
pend        -0.60
å§«         -0.59
ILCS        -0.59
resent      -0.59
Positive Logits
Token          Logit
sarcast        1.06
bluntly        1.02
rhet           0.96
emphatically   0.89
referring      0.83
afterward      0.81
diplom         0.78
aloud          0.78
incred         0.77
proudly        0.75
Activations Density 0.123%
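
Three entries in the Negative Logits list (ãĥİ, ãĥĥãĥĪ, å§«) look like byte-level BPE token strings rather than readable text. The sketch below shows one way to map such strings back to UTF-8. It assumes the tokens use the GPT-2 style byte-to-character table, which this page does not confirm. The helper names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `decode_token` are illustrative and not part of the dashboard.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild a GPT-2 style byte -> printable-character table (assumed scheme)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is assigned the next code point from U+0100 upward.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a displayed token string back to text via its raw bytes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so undecodable bytes are replaced instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥİ", "ãĥĥãĥĪ", "å§«", "thur"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the three strings decode to the Japanese characters ノ, ット, and 姫, while plain ASCII tokens such as thur pass through unchanged.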