INDEX
Explanations
instances of the word "one" followed by a number (or by words representing numbers)
the repeated mention of the word "one" in various contexts
Negative Logits
osponsors    -0.78
inders       -0.74
ooks         -0.73
grounds      -0.73
ories        -0.71
lations      -0.68
sav          -0.66
idences      -0.66
emies        -0.64
esta         -0.64
Positive Logits
hundred      0.93
thing        0.89
Hundred      0.89
Shot         0.86
heck         0.79
thousand     0.78
overriding   0.75
overarching  0.74
sided        0.72
apiece       0.72
Activations Density: 0.119%