INDEX
Explanations
references to senior-level positions and titles in organizations
Negative Logits
Token       Logit
gi          -0.17
efeller     -0.16
pert        -0.15
ãĥ³ãĥĸ      -0.15
ting        -0.15
ionario     -0.14
γο          -0.14
enders      -0.14
away        -0.14
go          -0.14
Positive Logits
Token       Logit
ity         0.41
-most       0.36
citizens    0.25
-level      0.25
citizen     0.23
itis        0.23
Citizens    0.22
most        0.21
ities       0.21
level       0.19
Activations Density: 0.019%