Explanations
instances of the word "twenty" or variations referencing the number twenty
Negative Logits
Hundred     -0.18
hundred     -0.17
Å¡ÃŃ        -0.15
Ïĥει        -0.15
uzzer       -0.15
abi         -0.15
/apt        -0.15
Joy         -0.15
uw          -0.14
±Ð¾ÑĤ       -0.14
Positive Logits
-one        0.26
-two        0.26
-five       0.25
-first      0.24
-One        0.23
-three      0.23
-four       0.22
-nine       0.21
odd         0.21
-eight      0.20
Activations Density: 0.031%