Explanations
references to the concept of "one."
Negative Logits
ones     -0.25
one      -0.20
lant     -0.18
Ones     -0.18
rd       -0.18
mente    -0.17
land     -0.17
se       -0.17
nya      -0.17
th       -0.16
Positive Logits
-third          0.31
onta            0.29
-way            0.26
-half           0.26
-dimensional    0.25
-sided          0.25
/t              0.24
particular      0.23
-two            0.23
-stop           0.23
Activations Density 0.163%