INDEX
Explanations
references to senior roles or positions within various contexts
Negative Logits
Token      Logit
gi         -0.17
away       -0.16
ting       -0.15
pane       -0.15
tures      -0.15
efeller    -0.15
ionario    -0.15
γο         -0.15
aways      -0.14
entials    -0.14
Positive Logits
Token      Logit
ity        0.40
-most      0.36
-level     0.24
citizens   0.24
itis       0.23
citizen    0.23
most       0.21
ities      0.21
Citizens   0.21
vice       0.20
Activations Density: 0.019%