INDEX
NEGATIVE LOGITS
Token           Logit
ets             -0.70
beaut           -0.69
paces           -0.68
rients          -0.67
»Ĵ              -0.67
safely          -0.66
tuned           -0.64
located         -0.64
pixel           -0.64
Featured        -0.64
POSITIVE LOGITS
Token           Logit
refusal         3.29
unwillingness   2.38
reluctance      2.21
insistence      2.19
inability       2.14
rejection       1.97
willingness     1.93
failure         1.78
denial          1.76
dismissal       1.64
ACTIVATIONS DENSITY: 0.033%