Explanations
phrases related to ongoing activities or changes
phrases related to actions of people in positions of authority
Negative Logits
Token               Logit
predecessor         -0.54
TION                -0.54
isSpecialOrderable  -0.52
Kills               -0.52
Brazilian           -0.51
presidency          -0.50
Nato                -0.49
Width               -0.48
ç«                  -0.48
was                 -0.48
Positive Logits
Token               Logit
themselves          1.26
selves              0.96
their               0.83
their               0.83
THEIR               0.82
theirs              0.81
yourselves          0.77
Their               0.70
joice               0.67
Their               0.67
Activations Density 1.159%