INDEX
Explanations
references to nobility and titles, specifically "Duke" and related terms
Negative Logits
Anand     -0.78
Hollis    -0.74
aéri      -0.71
aryen     -0.71
préfé     -0.69
Lydia     -0.69
Kenney    -0.69
Selen     -0.68
Kars      -0.67
Anand     -0.66
Positive Logits
Duke       1.60
Duke       1.47
DUKE       1.30
Dukes      1.29
duke       1.19
Duque      0.96
Durham     0.95
Ellington  0.90
Duchess    0.87
duke       0.84
Activations Density: 0.011%