INDEX
Explanations
names of royalty or historical figures
titles and ranks associated with nobility and historical figures
Negative Logits
paces       -0.96
roups       -0.82
layoffs     -0.79
groups      -0.79
Apps        -0.79
salads      -0.77
tones       -0.77
poons       -0.76
rooms       -0.76
clusters    -0.75
Positive Logits
ruler       1.59
emperor     1.56
Emperor     1.54
commander   1.47
dictator    1.45
Empress     1.38
tyrant      1.35
Ruler       1.34
Commander   1.30
governor    1.30
Activations Density: 0.341%