INDEX
Explanations
occurrences of the word "step" and its variations in numerical contexts
Negative Logits
token    logit
lid      -0.17
Äįet     -0.16
ุà¸Ķ      -0.16
uges     -0.15
lig      -0.15
er       -0.15
stüt     -0.15
park     -0.15
olk      -0.15
estar    -0.14
Positive Logits
token    logit
wise     0.24
éª       0.24
-by      0.22
enson    0.22
han      0.21
pe       0.21
dad      0.20
pling    0.19
mother   0.19
hen      0.18
Activations Density 0.026%