INDEX
Explanations
references to quantities or totals followed by numerical values
phrases that quantify or specify numerical totals
Negative Logits
tera      -0.70
pring     -0.63
nesses    -0.63
Presence  -0.62
ness      -0.56
pod       -0.56
era       -0.55
condem    -0.55
ity       -0.54
Males     -0.53
Positive Logits
eight     0.99
six       0.94
seven     0.93
thirteen  0.93
nine      0.91
THREE     0.89
fourteen  0.88
nineteen  0.86
four      0.86
eighteen  0.85
Activations Density: 0.046%
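
As a rough illustration of what a density figure like this measures, the sketch below computes the share of token positions on which a feature fires. It assumes that density means the fraction of positions with a nonzero activation, which this page does not state. The function name `activation_density` and the sample values are hypothetical and are not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 10,000 token positions:
# almost all zero, with five positions where the feature fires.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.046% would correspond to the feature being active on roughly 46 of every 100,000 token positions.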