INDEX
Explanations
occurrences of the word "an" in various forms
Negative Logits
cheid     -0.18
essel     -0.18
appen     -0.16
Mellon    -0.14
éļľ       -0.14
enced     -0.14
{*        -0.14
Ñħови     -0.14
anders    -0.14
n         -0.14
Positive Logits
erk       0.25
fang      0.22
pass      0.21
ony       0.21
onym      0.19
ze        0.18
sat       0.17
hang      0.17
son       0.17
oni       0.17
Activations Density 0.007%