INDEX
Explanations
words related to royal or prestigious titles
names and terms related to individuals and their roles
Negative Logits
Token      Logit
CF         -0.76
respir     -0.72
typh       -0.71
PLAY       -0.70
onz        -0.67
IL         -0.66
(blank)    -0.65
fou        -0.64
ticking    -0.64
Bomber     -0.63
Positive Logits
Token      Logit
arna       4.20
Kara       2.28
ivan       1.39
aja        1.32
Sara       1.25
Exc        0.99
Tara       0.96
edom       0.90
assment    0.90
Kira       0.89
Activations Density 0.046%
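
As a rough illustration of what a density figure like this means, the sketch below assumes that density is the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of density, not confirmed by the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 46 of 100,000 tokens.
sample = [1.5] * 46 + [0.0] * (100_000 - 46)
density = activation_density(sample)
print(f"{density:.3%}")  # 0.046%
```

Under this reading, a density of 0.046% would correspond to the feature firing on roughly 46 of every 100,000 tokens, which is consistent with a sparse feature.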