INDEX
Explanations
references to authority and governance
Negative Logits
Token     Logit
majority  -0.21
erts      -0.15
YRO       -0.15
linked    -0.15
anne      -0.15
Cort      -0.14
acial     -0.14
hoff      -0.14
Peters    -0.14
mers      -0.14
Positive Logits
Token   Logit
said    0.18
said    0.18
é«      0.16
iyon    0.15
ÙħÙĨÛĮ  0.15
forth   0.14
âĸ³     0.14
kdir    0.14
genu    0.14
agar    0.14
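
Some of the positive-logit tokens above (é«, ÙħÙĨÛĮ, âĸ³) look like byte-level BPE display strings rather than readable text. The sketch below shows one way to map them back to text. It assumes the underlying model uses a GPT-2 style byte-to-character table, which this page does not state; the function names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the byte -> printable-character table of GPT-2 style byte-level BPE.

    Assumption: the tokens on this page come from a tokenizer that uses this table.
    """
    # Bytes that are shown as themselves.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in keep}
    # Every other byte is shifted to a code point starting at 256, in byte order.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a byte-level display string back into text.

    Raises KeyError for characters outside the table (for example a literal space).
    Incomplete UTF-8 sequences are rendered as U+FFFD instead of raising.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["said", "ÙħÙĨÛĮ", "âĸ³", "é«"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ÙħÙĨÛĮ decodes to the Persian-script string منی and âĸ³ to the triangle symbol △, while é« is only the first two bytes of a three-byte UTF-8 character and cannot be decoded on its own.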
Activations Density: 0.082%