Explanations
references to individuals in senior positions or roles
Negative Logits
pert        -0.16
away        -0.15
tures       -0.15
ALER        -0.15
gi          -0.15
ionario     -0.15
γο          -0.15
ting        -0.15
entials     -0.15
ewater      -0.14
Positive Logits
ity          0.40
-most        0.33
-level       0.24
citizens     0.23
itis         0.22
citizen      0.22
级           0.20
Citizens     0.20
most         0.20
ities        0.19
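Lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the most negative and most positive entries. The following is a minimal sketch that assumes that convention. The names `decoder_dir`, `W_U`, `vocab` and `top_bottom_logits` are hypothetical and are not taken from any specific dashboard or library.

```python
import numpy as np


def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, score) pairs.

    Assumes (hypothetically) that scores are decoder_dir @ W_U, where
    decoder_dir has shape (d_model,) and W_U has shape (d_model, vocab_size).
    """
    scores = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(scores)          # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.40, -0.16, 0.20],
                [0.00, 0.00, 0.00]])
vocab = ["ity", "pert", "most"]
positive, negative = top_bottom_logits(decoder_dir, W_U, vocab, k=1)
```

Sorting once with `np.argsort` and reading both ends keeps the positive and negative lists consistent with each other.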
Activation Density: 0.020%
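Activation density is usually read as the share of sampled tokens on which the feature fires, so 0.020% corresponds to roughly 2 tokens in every 10,000. A minimal sketch, assuming a token counts as "firing" when the feature's activation exceeds zero; `activation_density` is a hypothetical helper, not a documented dashboard function.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the threshold (assumed 0)."""
    fired = np.count_nonzero(activations > threshold)
    return 100.0 * fired / activations.size


# 10,000 sampled tokens, of which the feature fires on 2.
acts = np.zeros(10_000)
acts[:2] = 1.0
density = activation_density(acts)
```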