Explanations
references to royalty or royal titles
king and queen
Negative Logits
Token     Logit
bl        -0.33
…         -0.28
.*")]     -0.27
pate      -0.27
blij      -0.26
bl        -0.25
θεια      -0.25
,         -0.25
rep       -0.25
po        -0.24
Positive Logits
Token            Logit
Personendaten    0.82
king             0.75
queen            0.73
queen            0.73
Queen            0.73
QUEEN            0.72
kings            0.72
KING             0.71
Queen            0.70
KINGDOM          0.70
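
The page does not say how the logit scores listed above are produced. A minimal sketch of one common approach, assuming each score is the feature's output direction projected through the model's unembedding matrix. The names `logit_effects`, `top_logit_tokens`, `decoder_dir`, and `unembed`, and all toy values, are hypothetical and not taken from this page.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature direction onto every vocabulary token.

    decoder_dir: list of floats, one per hidden dimension (assumed input).
    unembed: list of rows, one per hidden dimension, each with one
             entry per vocabulary token (assumed layout).
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(vocab_size)
    ]


def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs."""
    effects = logit_effects(decoder_dir, unembed)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


# Toy data for illustration only; not the values from this feature.
vocab = ["king", "queen", "bl", "rep"]
unembed = [
    [1.0, 0.9, -0.4, -0.3],
    [0.0, 0.1, 0.2, 0.0],
]
decoder_dir = [0.8, 0.0]

negative, positive = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
for token, score in positive:
    print(f"{token:<10}{score:+.2f}")
```

Under this assumption, a token appears twice in a table when the vocabulary holds two distinct entries that render the same way, for example with and without a leading space.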
Activations Density 0.042%
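
The density figure is usually read as the fraction of sampled tokens on which the feature fires. A minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the toy activations are hypothetical and do not reproduce the number reported here.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: 10,000 tokens, the feature fires on 4 of them.
sample = [0.0] * 9996 + [1.2, 0.5, 3.0, 0.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

A low density like the one reported means the feature is sparse: it is silent on nearly all tokens and fires only on a small, specific set of them.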