Explanations
instances of the letter "h" appearing in various contexts
Negative Logits
p        -0.22
d        -0.19
unga     -0.18
ip       -0.17
v        -0.17
z        -0.17
andre    -0.16
f        -0.16
op       -0.16
am       -0.15
Positive Logits
pyl       0.18
alf       0.17
azy       0.17
ards      0.17
istr      0.16
ares      0.15
.Guna     0.15
osi       0.15
ibern     0.15
ugging    0.15
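The two lists above rank output tokens by the feature's effect on their logits: the most negative values in one table, the most positive in the other. A minimal sketch of that ranking step follows, assuming the dashboard already has a per-token effect value (commonly obtained by projecting the feature's decoder direction through the model's unembedding, though that is an assumption here, not something this page states). The function name `top_logits` is hypothetical.

```python
def top_logits(effects: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, value) pairs.

    Hypothetical helper: `effects` maps each token to the feature's
    assumed effect on that token's logit.
    """
    # Ascending order: most negative effects come first.
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    # Descending order; sorted() is stable, so ties keep insertion order.
    positive = sorted(effects.items(), key=lambda kv: -kv[1])[:k]
    return negative, positive


# A few values copied from the tables above.
effects = {"p": -0.22, "d": -0.19, "unga": -0.18, "pyl": 0.18, "alf": 0.17, "ards": 0.17}
neg, pos = top_logits(effects, k=2)
print(neg)  # [('p', -0.22), ('d', -0.19)]
print(pos)  # [('pyl', 0.18), ('alf', 0.17)]
```

Tied values (such as the several 0.15 entries above) have no inherent order, so any dashboard must break ties somehow; this sketch simply keeps input order.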
Activation Density: 0.023%
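Activation density is the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold is an assumption; `activation_density` is a hypothetical helper). It also converts the reported 0.023% into a rough "one token in N" figure.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 0.25

# Interpreting the reported value: 0.023% as a plain fraction.
reported = 0.023 / 100
print(round(1 / reported))  # 4348, i.e. roughly one token in every ~4,350
```

A density this low indicates a sparse feature that stays silent on the vast majority of tokens.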