INDEX
Explanations
the letter "W" and its occurrences in various contexts
Negative Logits
allet     -0.21
hang      -0.17
ins       -0.17
all       -0.17
arrow     -0.16
widely    -0.16
atcher    -0.16
ie        -0.16
are       -0.16
as        -0.15
Positive Logits
etter     0.18
anj       0.18
istar     0.18
tower     0.18
bsite     0.17
tf        0.17
atan      0.17
è         0.16
yr        0.16
roc       0.16
Activations Density: 0.088%