INDEX
Explanations
instances of the letter 'z' in various contexts
Negative Logits
u       -0.21
z       -0.21
h       -0.20
ar      -0.20
w       -0.20
n       -0.20
v       -0.18
im      -0.18
b       -0.18
ap      -0.17
Positive Logits
odiac     0.28
ebra      0.22
onal      0.20
ircon     0.20
zz        0.18
oned      0.18
ipped     0.18
ipline    0.17
ucchini   0.17
witter    0.17
Activations Density: 0.011%