INDEX
Explanations
instances of the word "number" and variations thereof
Negative Logits
Token      Logit
olik       -0.19
Bever      -0.17
Numbers    -0.17
numbers    -0.16
IED        -0.16
Numbers    -0.16
numbers    -0.15
_numbers   -0.15
rag        -0.14
alez       -0.14
Positive Logits
Token      Logit
-one       0.26
ones       0.23
Ones       0.21
-One       0.21
-two       0.21
ones       0.20
One        0.20
ONES       0.20
two        0.20
ONE        0.20
Activations Density: 0.017%