INDEX
Explanations
instances of the letter 'a' in various contexts
Negative Logits
est         -0.38
que         -0.25
pt          -0.24
bl          -0.24
ffects      -0.24
im          -0.24
esthetic    -0.24
ims         -0.23
äºĽ         -0.23
ffect       -0.23
Positive Logits
ustral      0.26
lic         0.22
ustr        0.21
then        0.19
riel        0.18
tras        0.18
eo          0.18
ustralian   0.18
ther        0.18
ero         0.18
Activations Density: 0.063%