INDEX
Explanations
quotations or reported speech
punctuation marks, particularly those used in dialogue
Negative Logits
Token      Logit
Ĥª         -0.90
ramid      -0.79
artifacts  -0.73
rats       -0.73
exempt     -0.72
ļéĨĴ       -0.71
opian      -0.71
PP         -0.71
¬¼         -0.70
ĻĤ         -0.70
Positive Logits
Token      Logit
exclaimed  1.24
replied    1.22
muttered   1.21
whispered  1.20
says       1.13
shouted    1.13
reads      1.12
yelled     1.09
cried      1.08
remarked   1.08
Activation Density: 0.112%
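
As a rough illustration of how a density figure like this can be read, the sketch below assumes that "density" means the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; the page itself does not state how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active tokens out of 6250 in total.
sample = [0.0] * 6243 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.112%
```

Under this reading, a density of 0.112% would mean the feature fires on roughly 1 token in every 900.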