INDEX
Explanations
references to historical empires and territorial control
Negative Logits
etta        -0.15
xbc         -0.14
attached    -0.14
jac         -0.14
AXB         -0.14
laus        -0.13
aina        -0.13
Independ    -0.13
jon         -0.13
ace         -0.13
Positive Logits
Emp         0.72
Emp         0.70
emp         0.68
Empire      0.65
empire      0.60
emp         0.56
Imper       0.55
.emp        0.54
imper       0.54
_emp        0.54
Activations Density: 0.157%