Explanations
instances where someone is being coerced or compelled to do something against their will
instances of coercion or being compelled to do something against one's will
Negative Logits
ight        -0.77
ership      -0.73
Excellence  -0.73
ouf         -0.72
ergy        -0.70
itect       -0.69
umer        -0.66
NOW         -0.65
thus        -0.64
itz         -0.64
Positive Logits
overtime    0.81
laborers    0.81
labou       0.80
pneum       0.80
forced      0.77
cible       0.75
ierrez      0.75
ãĥ¼ãĤ¯      0.72
choked      0.69
untarily    0.69
Activations Density 0.016%