INDEX
Explanations
words related to coercion or exerting pressure
instances of the word "force" and its variations
Negative Logits
ahu           -0.73
ership        -0.69
Story         -0.68
NOW           -0.68
umer          -0.67
mbuds         -0.67
ergy          -0.67
lus           -0.66
aghd          -0.65
vironment     -0.65
Positive Logits
overtime       0.81
otom           0.80
laborers       0.75
conversions    0.73
cible          0.71
conformity     0.70
concessions    0.67
maj            0.67
exerted        0.66
cooker         0.66
Activations Density 0.033%
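
The page does not say how the density figure is computed. A common reading is the fraction of sampled token positions on which the feature fires at all. The sketch below shows that calculation under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature
    is active (activation > threshold). The source does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.033% would mean the feature is active on roughly 33 of every 100,000 token positions.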