INDEX
Explanations
instances of the letter "y"
Negative Logits
lemen     -0.17
leton     -0.17
eldorf    -0.17
leted     -0.16
ained     -0.16
ioned     -0.15
adece     -0.15
lete      -0.15
encion    -0.15
leston    -0.15
Positive Logits
achts     0.28
oke       0.24
acht      0.23
ếu        0.22
ea        0.21
onder     0.20
ields     0.20
IELD      0.20
ymm       0.20
uforia    0.19
Activations Density: 0.042%