Explanations
occurrences of the letter 'b'
Negative Logits
Token    Logit
ern      -0.20
oles     -0.19
ooth     -0.17
oft      -0.17
erah     -0.17
oi       -0.17
oo       -0.17
iment    -0.17
l        -0.17
odyn     -0.17
Positive Logits
Token    Logit
æk       0.19
leshoot  0.19
em       0.18
bery     0.17
eg       0.17
ourn     0.17
otton    0.17
hai      0.17
ef       0.16
eyond    0.16
Activations Density: 0.061%