INDEX
Explanations
dates and numerical values expressed in text
numerical references, specifically dates and significant numbers
Negative Logits
behavi     -0.65
theless    -0.63
ogyn       -0.62
targ       -0.61
alike      -0.60
abwe       -0.60
Rica       -0.59
derog      -0.58
iku        -0.58
warfare    -0.56
Positive Logits
th         1.33
teenth     1.12
venth      1.03
rd         0.99
nd         0.97
%-         0.92
%"         0.88
ieth       0.87
richest    0.79
ottest     0.79
Activations Density: 0.123%