Explanations
references to political figures and government positions in various contexts
Negative Logits
colonel    -0.52
doctor     -0.50
king       -0.47
queen      -0.46
doctor     -0.46
mr         -0.45
先生       -0.44
prince     -0.44
miss       -0.42
sultan     -0.42
Positive Logits
Acting      1.20
Assistant   1.02
Acting      0.97
Deputy      0.96
Vice        0.86
Interim     0.84
Secretary   0.84
Associate   0.83
acting      0.81
Chief       0.80
Activations Density: 0.567%