Explanations
instances of punctuation, particularly periods, in the text
Negative Logits
anh           -0.16
Ñıб           -0.16
iah           -0.15
iece          -0.14
antu          -0.14
ãĥªãĤ«        -0.14
inds          -0.14
avatel        -0.14
thereby       -0.14
ub            -0.14
Positive Logits
ittel          0.15
licity         0.15
orth           0.13
UZ             0.13
Beard          0.13
Siz            0.13
lien           0.13
/generated     0.13
861            0.13
zk             0.13
Activation Density: 0.004%