Explanations
occurrences of the letter 'B'
Negative Logits
Token    Logit
ullet    -0.20
ог       -0.20
TEGER    -0.18
rowser   -0.18
uy       -0.18
ourg     -0.17
ank      -0.17
rowse    -0.16
ulk      -0.16
lok      -0.16
Positive Logits
Token    Logit
ionic     0.19
em        0.18
sides     0.18
ix        0.17
antan     0.17
-side     0.17
itters    0.17
ong       0.16
IX        0.16
ef        0.16
Activation density: 0.044%
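Activation density is usually read as the share of token positions on which the feature fires. A density of 0.044% therefore means roughly 44 active positions per 100,000 tokens, or about one in every 2,300. The sketch below shows that calculation. It assumes "fires" means an activation strictly above a threshold of 0.0. The function name `activation_density` and the toy data are hypothetical and do not reflect the dashboard's actual implementation.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 token positions, 4 of which fire.
acts = [0.0] * 10_000
for i in (12, 345, 6789, 9999):
    acts[i] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.040%
```

With 44 active positions out of 100,000, the same function returns 0.044, which matches the figure reported above.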