Explanations
Phrases related to high-ranking individuals, particularly people holding official positions in government or intelligence agencies.
Negative Logits
igor       -0.98
apesh      -0.94
AUT        -0.92
tits       -0.88
REG        -0.88
Ĥİ         -0.87
AST        -0.85
roma       -0.85
iky        -0.84
thirsty    -0.83
Positive Logits
doms        1.21
iate        1.17
ity         1.15
citiz       1.11
itiz        1.10
eton        1.07
iating      1.05
most        1.03
iates       1.02
itor        0.95
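The two lists above rank tokens by how strongly this feature pushes their logits up or down. One common way such lists are produced, assumed here rather than confirmed by this page, is to project the feature's decoder direction onto each column of the model's unembedding matrix and keep the top and bottom k scores. The sketch below uses hypothetical names (`logit_effects`, `top_bottom`) and toy numbers, not this feature's real weights.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumed shapes (hypothetical):
      decoder_dir: list[float] of length d_model
      unembed:     d_model rows, unembed[i][t] is W_U[i, t]
      vocab:       list of token strings, one per unembedding column
    """
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t, tok in enumerate(vocab)
    }


def top_bottom(effects, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[-k:][::-1], ranked[:k]


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
effects = logit_effects([1.0, -0.5], [[0.2, -1.0, 0.5], [0.4, 0.0, -1.0]], ["a", "b", "c"])
positive, negative = top_bottom(effects, 1)
print(positive, negative)  # [('c', 1.0)] [('b', -1.0)]
```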
Activation Density: 0.973%
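Activation density is usually read as the share of token positions on which the feature is active. The sketch below assumes that definition and an activation threshold of zero. Neither is stated on this page, and `activation_density` is a hypothetical helper. It simply counts positions above the threshold and divides by the total.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`.

    Assumes `activations` is a flat sequence of per-token feature activations;
    the default threshold of 0.0 is an assumption, not taken from the source.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical example: 3 of 400 token positions are active.
acts = [0.0] * 397 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3%}")  # 0.750%
```

A density just under 1%, as reported here, would mean the feature fires on roughly one token in a hundred under this definition.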