Explanations
occurrences of the letter 'w' in various contexts
Negative Logits
Token    Logit
andest   -0.15
jom      -0.15
wort     -0.15
quee     -0.15
arb      -0.14
ition    -0.14
Düz      -0.14
nomine   -0.14
acen     -0.13
Lane     -0.13
Positive Logits
Token    Logit
w         0.23
illo      0.15
gre       0.15
[w        0.15
gle       0.15
ingly     0.14
rans      0.14
ubern     0.14
ering     0.14
bell      0.14
Activation density: 0.022%