Explanations
words related to authority, control, and power
phrases related to conflict, danger, and political issues
Negative Logits
ggles          -0.66
partName       -0.60
ratulations    -0.57
pires          -0.57
)\             -0.55
)!             -0.54
guyen          -0.54
>)             -0.53
yours          -0.51
bernatorial    -0.51
Positive Logits
because        1.11
because        0.96
"...           0.88
owing          0.87
".             0.86
whereas        0.86
.              0.84
despite        0.83
but            0.83
"…             0.83
Activations Density: 1.120%