Explanations
instances of the word "two" or its numeric representation
NEGATIVE LOGITS
bast      -0.16
imer      -0.15
ones      -0.14
erm       -0.14
imers     -0.14
laus      -0.14
stå       -0.14
ertas     -0.13
hausen    -0.13
umer      -0.13
POSITIVE LOGITS
-thirds         0.29
-dimensional    0.25
dozen           0.24
gether          0.23
ième            0.22
/th             0.21
nd              0.20
-way            0.19
-fold           0.18
-sided          0.17
Activations Density 0.120%