Explanations
proper nouns related to quotes or dialogue
occurrences of dialogue or statements made by characters
Negative Logits
Token     Logit
ãĥİ       -0.71
pend      -0.64
arent     -0.62
ãĥĥãĥī    -0.60
found     -0.59
MU        -0.58
Operation -0.55
OIL       -0.54
Pont      -0.53
theless   -0.53
Positive Logits
Token        Logit
sarcast      0.97
bluntly      0.93
rhet         0.92
emphatically 0.84
afterward    0.83
softly       0.79
incred       0.79
.            0.78
referring    0.77
confidently  0.77
Activations Density: 0.102%