INDEX
Explanations
occurrences of the letter 'h' in various contexts
Negative Logits
Token   Logit
Cré     -0.73
gé      -0.71
eto     -0.70
convo   -0.68
aval    -0.68
appro   -0.67
acer    -0.67
selo    -0.64
enne    -0.64
etten   -0.64
Positive Logits
Token   Logit
h       1.41
H       1.28
h       1.19
H       1.17
setH    1.09
hhh     1.06
hh      1.04
rH      1.04
mh      1.02
rh      1.02
Activations Density: 0.190%