Explanations
instances of the word "one" and its variations in various contexts
Negative Logits
abbo      -0.15
ithe      -0.15
ä¸ĺ       -0.14
à¸Ńà¸Ķ    -0.14
層        -0.14
usher     -0.14
laz       -0.14
OTE       -0.14
trand     -0.14
pite      -0.14
Positive Logits
else      0.25
Nobody    0.19
except    0.19
ever      0.19
except    0.18
/no       0.17
else      0.16
aten      0.15
ël        0.15
ELSE      0.15
Activations Density 0.025%