Explanations
instances of the word "said"
NEGATIVE LOGITS
theless         -0.73
ノ              -0.73
pend            -0.70
resent          -0.69
paralle         -0.68
thur            -0.67
conflic         -0.67
MU              -0.67
pend            -0.65
earable         -0.63
POSITIVE LOGITS
sarcast          0.95
bluntly          0.90
rhet             0.88
referring        0.85
emphatically     0.78
aloud            0.77
afterward        0.75
diplom           0.74
incred           0.74
proudly          0.72
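The dashboard does not say how these logit values are computed. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the highest and lowest scores. The sketch below assumes that convention; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical and purely illustrative, not this dashboard's code or data.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive (promoted) and k most negative (suppressed) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative scores first
    top = ranked[::-1][:k]     # most positive scores first
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional example; real models use the full residual-stream width.
    decoder_dir = [1.0, 0.0]
    unembed = {"bluntly": [0.9, 0.1], "pend": [-0.7, 0.3], "the": [0.0, 1.0]}
    top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
    print(top)     # → [('bluntly', 0.9)]
    print(bottom)  # → [('pend', -0.7)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other; with a real model, a vectorized matrix product would replace the Python loops.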
Activation density: 0.116%
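Activation density is read here as the share of token positions on which the feature is active (activation above zero). Under that reading, 0.116% corresponds to roughly 116 firing positions per 100,000 tokens, or about 1 in 860. The helper below is a hypothetical sketch under that assumption; the zero threshold is also an assumption, since the dashboard does not state one.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions where the activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


if __name__ == "__main__":
    # Illustrative data only: 116 active positions out of 100,000 tokens.
    acts = [0.0] * 99_884 + [1.0] * 116
    print(activation_density(acts))
```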