INDEX
Explanations
phrases that include the word "said"
instances of attribution or quotations in speech
Negative Logits
Token         Logit
ãĥİ           -0.65
OIL           -0.64
ãĥĥãĥī        -0.63
pend          -0.61
found         -0.57
ãĥĥãĥĪ        -0.56
ãĥł           -0.56
ãĥij           -0.53
Operation     -0.53
ãĥª           -0.52
Positive Logits
Token         Logit
afterward     0.86
bluntly       0.85
sarcast       0.85
.             0.83
rhet          0.82
emphatically  0.76
softly        0.76
incred        0.75
of            0.75
laughing      0.72
Activations Density: 0.112%