Explanations
the letter 'y' in various contexts
Negative Logits
o       -0.28
a       -0.26
u       -0.25
i       -0.23
r       -0.22
y       -0.21
t       -0.21
ay      -0.20
n       -0.20
an      -0.19
Positive Logits
achts   0.27
ea      0.21
oke     0.21
ester   0.21
anked   0.19
ean     0.19
سطس     0.18
oked    0.18
onder   0.18
olk     0.17
Activations Density: 0.018%