INDEX
Explanations
instances of punctuation marks, particularly commas, in the text
Negative Logits
aku       -0.21
egen      -0.17
urai      -0.16
estone    -0.15
_SUBJECT  -0.14
ельзя     -0.14
Strat     -0.14
bold      -0.14
яч        -0.14
uggy      -0.14
Positive Logits
ollen     0.18
addon     0.16
essler    0.15
ancias    0.15
spr       0.15
Scaler    0.14
mdi       0.14
ap        0.13
638       0.13
ВН        0.13
Activations Density 0.023%
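Activations Density reports how rarely the feature fires: 0.023% corresponds to roughly 23 active positions per 100,000 tokens. A minimal sketch, assuming density is the fraction of token positions whose activation is above zero (the dashboard's exact threshold is not stated here); the function name and `threshold` parameter are hypothetical:

```python
# Minimal sketch (assumption): activation density is the fraction of token
# positions on which the feature's activation is strictly above a threshold.
# The function name and the default threshold of 0.0 are hypothetical.

def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 23 active positions out of 100,000 tokens gives a density of 0.023%.
acts = [0.0] * 99_977 + [1.0] * 23
density = activation_density(acts)
print(f"{density:.3%}")
```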