INDEX
Explanations
Words associated with historical figures, especially those bearing titles such as "Pope" or "Emperor."
Negative Logits
izable       -0.96
izational    -0.87
ished        -0.86
ding         -0.85
ized         -0.83
ishment      -0.78
izers        -0.76
agements     -0.76
ishing       -0.76
izations     -0.76
Positive Logits
Caesar       1.07
XM           0.92
XII          0.88
Malfoy       0.84
Pil          0.84
Severus      0.80
cles         0.80
aurus        0.79
hip          0.78
Claud        0.78
Activations Density 0.043%