Explanations
negations and phrases indicating resistance or refusal
Negative Logits
ndon       -0.15
-avatar    -0.14
assis      -0.14
nowhere    -0.14
ักท        -0.14
599        -0.14
fend       -0.14
칙         -0.13
actually   -0.13
rios       -0.13
Positive Logits
sugar      0.24
waiver     0.24
settling   0.21
settle     0.20
Sugar      0.20
Sugar      0.20
gloss      0.19
rest       0.19
coast      0.19
succ       0.18
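The page does not say how these logit lists are produced. One common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the most negative and most positive tokens. A minimal pure-Python sketch; `decoder_dir`, `unembed`, the helper names, and the toy numbers are all hypothetical:

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct logit effect per token: dot(decoder direction, token's unembedding).

    Assumption: this is how the dashboard scores tokens; the page does not confirm it.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most positive, k most negative) tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]            # ascending order: most negative first
    positive = ranked[::-1][:k]      # descending order: most positive first
    return positive, negative


# Toy 2-dimensional example with made-up numbers (not this feature's real weights).
decoder_dir = [1.0, -0.5]
unembed = {
    "sugar": [0.2, -0.1],
    "settle": [0.1, -0.2],
    "nowhere": [-0.1, 0.1],
    "actually": [0.0, 0.2],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting the full score list once and slicing both ends keeps the positive and negative rankings consistent with each other.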
Activation Density: 0.248%
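Activation density is read here as the share of tokens in the sampled dataset on which the feature fires (activation above zero), reported as a percentage. The page gives only the figure, so the threshold and the dataset are assumptions. Under that reading, 0.248% corresponds to roughly 248 firing tokens per 100,000. A small sketch, where `activation_density` is a hypothetical helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with activation > 0,
    reported as a percentage; the source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 248 firing positions out of 100,000 tokens gives a density of 0.248%.
acts = [1.0] * 248 + [0.0] * (100_000 - 248)
print(activation_density(acts))
```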