Explanations
references to royalty or imperial figures
references to emperors
Negative Logits
Token      Logit
ties       -0.73
crew       -0.72
ter        -0.71
ritch      -0.71
asket      -0.67
CAST       -0.67
solid      -0.67
ledge      -0.67
esville    -0.65
gew        -0.64
Positive Logits
Token      Logit
peror      1.10
Nero       1.09
perors     1.05
Claud      0.92
Napoleon   0.91
Emperor    0.85
emperor    0.85
Augustus   0.83
pengu      0.83
esses      0.82
Activations Density: 0.012%