INDEX
Explanations
countries and organizations related to international politics
statements related to the existence or status of entities and actions in various contexts
Negative Logits
IDENT       -0.64
ヴ          -0.60
ieves       -0.59
Salary      -0.58
iversary    -0.58
プ          -0.55
Pers        -0.55
IMAGES      -0.55
ipers       -0.55
Rew         -0.55
Positive Logits
likewise      1.51
meanwhile     1.36
similarly     1.34
also          1.14
unaffected    1.07
another       1.06
another       1.04
Another       0.93
additionally  0.93
also          0.90
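The dashboard does not say how these logit values are computed. A common convention for SAE feature dashboards is to project the feature's decoder direction straight through the model's unembedding matrix W_U, ignoring the final LayerNorm, and then list the highest and lowest entries. The sketch below assumes that convention; the function names, the toy vocabulary and the toy matrices are hypothetical and are not the dashboard's actual code or data.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    decoder_dir has length d_model; w_unembed is d_model x vocab_size.
    Returns one logit contribution per vocabulary token (decoder_dir @ W_U).
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy numbers; a real dashboard would use the model's W_U
    # and the SAE's decoder row for this feature.
    vocab = ["likewise", "also", "IDENT", "Salary"]
    decoder_dir = [1.0, -0.5]
    w_unembed = [
        [1.2, 0.8, -0.4, -0.3],
        [-0.6, -0.4, 0.5, 0.6],
    ]
    effects = logit_effects(decoder_dir, w_unembed)
    pos, neg = top_and_bottom(effects, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Under this reading, a positive value means that adding the feature's direction to the residual stream raises that token's logit, which fits the connective words ("likewise", "similarly", "also") at the top of the positive list.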
Activation density: 0.530%
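Activation density is usually read as the percentage of tokens in a sample corpus on which the feature is active, so 0.530% works out to roughly one token in 189. The sketch below assumes "active" means an activation strictly above zero; the `threshold` parameter and the sample data are hypothetical and not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds threshold."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 53 of 10,000 tokens.
    acts = [0.0] * 10_000
    for i in range(0, 5300, 100):
        acts[i] = 1.0
    print(f"{activation_density(acts):.3f}%")
```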