INDEX
Explanations
instances of the letter "A" in various contexts
Negative Logits
le      -0.23
na      -0.22
lo      -0.20
mi      -0.20
ling    -0.20
la      -0.20
li      -0.19
ct      -0.19
lie     -0.18
g       -0.18
Positive Logits
eid     0.21
erif    0.17
equip   0.17
equ     0.17
eil     0.16
propos  0.16
eview   0.16
equal   0.16
ej      0.16
Crack   0.16
Activations Density: 0.166%