INDEX
Explanations
Instances of refusal or resistance
Instances of the word "refuse" and its variants, indicating a focus on refusal or rejection
Negative Logits
Token       Logit
hetti       -0.76
ochond      -0.75
覚醒          -0.74
ICAN        -0.74
ammy        -0.73
APH         -0.71
Assembly    -0.71
estern      -0.70
groups      -0.70
bred        -0.68
Positive Logits
Token       Logit
geon        0.84
vehemently  0.76
quit        0.74
miser       0.73
afe         0.73
adm         0.72
refusal     0.71
admission   0.70
outright    0.69
refuse      0.68
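On SAE feature dashboards like this one, the positive and negative logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix, then reading off the highest- and lowest-scoring tokens. The source does not say how these lists were computed, so the sketch below assumes that procedure. The function names `logit_effects` and `top_k_logits`, and the 2-dimensional toy weights, are hypothetical; it uses plain Python lists instead of real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token against a feature's decoder direction.

    Assumption: `unembed` is a d_model x vocab_size matrix stored as a list
    of rows, so token j scores sum_i direction[i] * unembed[i][j].
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_direction, unembed))
        for j in range(vocab_size)
    ]


def top_k_logits(scores: Sequence[float], vocab: Sequence[str], k: int,
                 largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k (token, score) pairs with the highest (or lowest) scores."""
    pairs = list(zip(vocab, scores))
    pairs.sort(key=lambda p: p[1], reverse=largest)
    return pairs[:k]


# Hypothetical toy model: d_model = 2, four-token vocabulary.
vocab = ["refuse", "quit", "groups", "Assembly"]
direction = [1.0, -0.5]
unembed = [[0.6, 0.4, -0.2, -0.3],
           [-0.2, -0.4, 0.6, 0.5]]

scores = logit_effects(direction, unembed)
print([t for t, _ in top_k_logits(scores, vocab, 2)])                 # ['refuse', 'quit']
print([t for t, _ in top_k_logits(scores, vocab, 2, largest=False)])  # ['Assembly', 'groups']
```

With real weights the same two calls, using k = 10, would yield lists shaped like the Positive Logits and Negative Logits tables above.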
Activation Density: 0.021%
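Activation density is usually read as the percentage of tokens in an evaluation set on which the feature's activation is above zero. Under that reading, 0.021% means the feature fires on about 21 of every 100,000 tokens, or roughly one token in 4,760. The source does not state the dataset or the threshold, so the sketch below assumes a threshold of 0; `activation_density` is a hypothetical helper name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy check: 21 active tokens out of 100,000 reproduces a density of 0.021%.
acts = [0.0] * 100_000
for i in range(21):
    acts[i * 4_000] = 1.5  # arbitrary positive activation value
print(f"{activation_density(acts):.3f}%")  # prints 0.021%
```

A strict `>` comparison is used so that tokens with an activation of exactly zero (the usual "off" state after a ReLU) are not counted as active.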