INDEX
Explanations
the presence of the letter 'a' in various contexts within the text
Negative Logits
f       -0.23
g       -0.22
v       -0.22
d       -0.22
r       -0.21
y       -0.20
c       -0.18
ver     -0.18
h       -0.18
vier    -0.18
Positive Logits
abb     0.30
,b      0.29
/b      0.25
+b      0.23
ustin   0.22
eron    0.21
href    0.21
-zA     0.20
>b      0.20
aVar    0.19
Activations Density 0.079%