Explanations
the number "Two."
instances of the word "Two."
Negative Logits
Token       Logit
catalog     -0.74
reserves    -0.68
greatly     -0.64
actively    -0.63
often       -0.62
even        -0.62
ACT         -0.62
psy         -0.62
liber       -0.61
territory   -0.61
Positive Logits
Token       Logit
Two         3.10
Three       2.44
Four        2.31
Five        2.06
two         2.03
Eight       2.03
Two         2.02
Six         1.97
Seven       1.91
Nine        1.76
Activations Density: 0.019%