INDEX
Explanations
references to royalty and their titles
Negative Logits
compri     -0.72
itſelf     -0.68
Lovato     -0.68
antaine    -0.66
Dars       -0.66
evan       -0.66
pertory    -0.65
eradish    -0.65
Prakash    -0.64
Travis     -0.64
Positive Logits
Kings      1.58
kings      1.49
KING       1.48
King       1.47
king       1.33
Kings      1.33
KINGS      1.26
King       1.24
👑         1.17
Queen      1.11
Activations Density 0.106%