INDEX
Explanations
occurrences of the letter 's' in various contexts
Negative Logits
athers  -0.17
un      -0.16
еÑĢ     -0.16
chos    -0.15
ets     -0.15
al      -0.15
il      -0.15
it      -0.15
c       -0.14
at      -0.14
Positive Logits
iph     0.23
pon     0.22
yc      0.20
plot    0.20
ullen   0.20
ord     0.19
ulk     0.19
lob     0.19
ough    0.18
uss     0.18
Activations Density 0.012%