Explanations
Statements related to political issues and international relations
Negative Logits
lifes          -0.84
nightly        -0.75
hero           -0.73
sucker         -0.71
stray          -0.70
bear           -0.69
instinct       -0.69
crush          -0.69
dolphin        -0.68
nomine         -0.67
Positive Logits
Finally         1.76
Regarding       1.72
Lastly          1.70
Furthermore     1.69
Moreover        1.69
However         1.68
Ultimately      1.66
Additionally    1.63
Nevertheless    1.61
Similarly       1.59
Activation Density: 0.476%