Explanations
words related to rejection or refusal
terms related to rejection and disapproval
Negative Logits
isd           -0.71
isf           -0.71
vern          -0.71
ipel          -0.71
ortal         -0.71
ussen         -0.70
encyclopedia  -0.70
vity          -0.68
amen          -0.67
brance        -0.67
Positive Logits
outright       0.82
rejection      0.77
validation     0.69
kus            0.68
acceptance     0.67
âķIJâķIJ       0.67
excuses        0.67
temptation     0.66
Ĥª             0.66
straw          0.65
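The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). Below is a minimal sketch of how such lists can be produced. It assumes, as is common for sparse-autoencoder feature pages but not stated here, that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix `W_U`. The function names, toy vocabulary, dimensions, and weights are all hypothetical and are not taken from the model behind this page.

```python
# Sketch under an assumption: a feature's logit effect on token t is
# dot(decoder_dir, W_U[:, t]). All names and numbers here are hypothetical.

def logit_effects(decoder_dir, W_U):
    """Return one score per vocabulary token: sum_i decoder_dir[i] * W_U[i][t]."""
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
            for t in range(vocab_size)]

def top_and_bottom(scores, vocab, k):
    """Return the k highest-scoring and the k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]            # lowest scores, most negative first
    positive = ranked[::-1][:k]      # highest scores, most positive first
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical values).
vocab = ["rejection", "acceptance", "encyclopedia", "straw"]
decoder_dir = [0.5, -0.5]
W_U = [[1.0, 0.8, -1.0, 0.2],
       [-0.5, -0.4, 0.4, 0.0]]

scores = logit_effects(decoder_dir, W_U)
positive, negative = top_and_bottom(scores, vocab, 2)
print("positive:", positive)
print("negative:", negative)
```

Real pages may also fold in a final layer norm or other scaling before ranking; this sketch omits that and only shows the projection-and-sort step.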
Activation density: 0.032%
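Activation density is the share of token positions on which the feature fires, so 0.032% means the feature is active on roughly 32 of every 100,000 tokens. The sketch below assumes a feature counts as active when its activation is strictly greater than zero; the threshold, the function name `activation_density`, and the example data are illustrative assumptions, not this page's documented method.

```python
# Sketch under an assumption: "active" means activation > threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical data: the feature fires on 32 out of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(32):
    acts[i] = 1.0

print(f"{activation_density(acts):.3f}%")  # prints 0.032%
```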