INDEX
Explanations
occurrences of the letter 'b'
Negative Logits
Token    Logit
r        -0.28
u        -0.23
l        -0.23
et       -0.23
lk       -0.23
an       -0.21
uD       -0.21
j        -0.21
id       -0.20
ul       -0.20
Positive Logits
Token    Logit
ellow    0.20
oston    0.19
oulder   0.19
idders   0.19
obby     0.18
ounces   0.18
rowning  0.18
rian     0.18
itters   0.17
uster    0.17
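One way to relate the positive-logit tokens to the explanation "occurrences of the letter 'b'" is to check what each fragment spells when a "b" is placed in front of it. The sketch below does only that string operation. The token list is copied from the positive logits listed above, and the names POSITIVE_LOGIT_TOKENS and prefix_with are hypothetical helpers introduced for illustration, not part of any tool. It assumes the tokens are word fragments that follow a "b", which the page itself does not state.

```python
# Token fragments copied from the positive-logits list above.
POSITIVE_LOGIT_TOKENS = [
    "ellow", "oston", "oulder", "idders", "obby",
    "ounces", "rowning", "rian", "itters", "uster",
]


def prefix_with(letter, tokens):
    """Prepend `letter` to each token fragment (hypothetical helper)."""
    return [letter + token for token in tokens]


# Assumption: the feature fires on a "b" and promotes the fragment that
# would complete a word starting with that letter.
completed = prefix_with("b", POSITIVE_LOGIT_TOKENS)

for fragment, word in zip(POSITIVE_LOGIT_TOKENS, completed):
    print(f"{fragment:<8} -> {word}")
```

Whether the completed strings are words the model actually treats as "b" followed by that fragment depends on its tokenizer, so the output is a plausibility check rather than evidence about the feature.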
Activations Density: 0.017%