Explanations
political or authoritative terms, often relating to power or control
occurrences of the word "reign" and its variations
Negative Logits
ertodd              -0.79
Quotes              -0.69
hammad              -0.68
FFER                -0.66
Friendly            -0.62
Humanity            -0.61
----------------    -0.60
contrace            -0.59
Spiel               -0.58
WER                 -0.57
Positive Logits
pin                 0.85
ited                0.84
ieved               0.78
unders              0.77
uin                 0.75
der                 0.74
s                   0.74
esses               0.74
oct                 0.73
iever               0.71
Activations Density 0.011%