INDEX
Explanations
phrases related to political controversies and opinions
Negative Logits
emi        -0.74
Times      -0.71
zar        -0.69
ibble      -0.67
shift      -0.67
hart       -0.66
busters    -0.66
dog        -0.65
agon       -0.65
du         -0.65
Positive Logits
furthermore     1.37
consequently    1.34
moreover        1.26
therefore       1.21
secondly        1.16
hence           1.16
thus            1.15
thence          1.12
preferably      1.10
optionally      1.05
Activations Density 0.830%