INDEX
Explanations
occurrences of the letter 'H', often as part of proper nouns or titles
Negative Logits
ello    -0.27
ex      -0.21
ere     -0.20
OST     -0.19
ER      -0.19
ave     -0.18
ERE     -0.18
er      -0.18
HF      -0.18
z       -0.18
Positive Logits
r           0.27
ruby        0.20
s           0.20
rab         0.20
MAS         0.20
sing        0.19
rst         0.19
rish        0.19
rad         0.19
ORIZONTAL   0.19
Activations Density 0.099%