Explanations
references to governance and authority figures
Negative Logits
delet       -0.77
oops        -0.69
aughtered   -0.69
rumors      -0.62
eatures     -0.60
rontal      -0.60
rollment    -0.60
rumor       -0.59
issan       -0.58
venient     -0.58
Positive Logits
ankind      0.81
ateurs      0.77
outweigh    0.75
whilst      0.73
cients      0.69
versus      0.68
paramount   0.66
ĸļ          0.66
ãĢĤ         0.65
amidst      0.63
Activations Density: 0.315%