INDEX
Explanations
statements related to international relations and politics
Negative Logits
crappy      -0.75
nightly     -0.69
sucker      -0.67
Isles       -0.65
legend      -0.64
unlucky     -0.64
cousins     -0.63
thrill      -0.63
everyday    -0.62
fancy       -0.62
Positive Logits
Regarding       1.33
Lastly          1.32
Refer           1.29
Furthermore     1.29
Secondly        1.21
Finally         1.19
Moreover        1.19
Additionally    1.17
CONCLUS         1.16
Therefore       1.09
Activations Density 0.928%