Explanations
instances of the letter "W" in various contexts
Negative Logits
Token    Logit
idget    -0.19
allet    -0.17
arrow    -0.17
eb       -0.17
ave      -0.16
heel     -0.15
iki      -0.15
alls     -0.15
ork      -0.15
uges     -0.15
Positive Logits
Token    Logit
enh      0.15
WISE     0.15
анÑĮ     0.14
ictor    0.14
orgen    0.13
ho       0.13
iert     0.13
simul    0.13
anel     0.13
pros     0.13
Activations Density 0.039%