INDEX
Explanations
names related to historical figures and places
references to historical figures, especially those with the name Caesar
Negative Logits
lag         -1.00
ledge       -0.83
rien        -0.83
ritch       -0.80
intosh      -0.80
rike        -0.77
tip         -0.77
essional    -0.76
wer         -0.75
lying       -0.72
Positive Logits
Caesar      1.29
esar        1.21
Augustus    1.11
Claud       1.10
olini       0.99
Nero        0.83
Brut        0.80
Julius      0.78
Clown       0.77
Gaul        0.72
Activations Density 0.016%