INDEX
Explanations
terms and phrases indicating nobility or aristocracy
Negative Logits
icked    -0.18
acen     -0.17
agh      -0.15
Fritz    -0.15
eldon    -0.15
egis     -0.15
ocab     -0.15
cı       -0.15
apper    -0.15
igma     -0.14
Positive Logits
ility    0.27
les      0.27
lemen    0.26
odies    0.25
LES      0.21
bler     0.20
ilities  0.20
iliary   0.19
bery     0.18
ILITY    0.18
Activations Density 0.007%