INDEX
Explanations
capital letters, particularly those at the beginning of sentences or proper nouns
Negative Logits
ara        -0.16
aco        -0.16
seealso    -0.15
antee      -0.15
algo       -0.15
usra       -0.15
uele       -0.15
TAG        -0.14
ales       -0.14
TZ         -0.14
Positive Logits
zf         0.17
jah        0.16
ecedor     0.16
-navbar    0.16
zing       0.16
osi        0.15
etre       0.15
zed        0.15
zi         0.15
ews        0.15
Activations Density 0.043%
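
A density figure like the one above is usually read as the fraction of token positions on which the feature fires. The sketch below shows that arithmetic under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its value is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.2, 0.7, 3.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.040%"
```

On this reading, a density of 0.043% would correspond to roughly 43 active positions per 100,000 tokens.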