Explanations
instances of the word "one" associated with numerical values, establishing importance or comparison
Negative Logits
Token        Logit
ooks         -0.82
hips         -0.77
ories        -0.75
inders       -0.73
respective   -0.69
osponsors    -0.68
ourn         -0.63
emies        -0.62
folk         -0.62
actions      -0.61
Positive Logits
Token        Logit
hundred      1.19
Hundred      1.10
thousand     0.99
thing        0.92
sided        0.90
dimensional  0.87
month        0.83
Piece        0.83
Thousand     0.83
Shot         0.81
Activations Density: 1.813%