INDEX
Explanations
phrases indicating dissatisfaction or disapproval
instances of the word "complain" and its variations
Negative Logits
assisted    -0.72
Manson      -0.69
ãĥ¯         -0.66
step        -0.65
ãĥĥãĤ¯      -0.64
poral       -0.64
bered       -0.62
lay         -0.61
STEP        -0.60
ccording    -0.60
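The entries ãĥ¯ and ãĥĥãĤ¯ in the list above look like byte-level BPE tokens shown in their printable stand-in form rather than as decoded text. The sketch below assumes the model uses a GPT-2-style byte-to-character table, which this page does not state; `bytes_to_unicode` reimplements that table from its commonly published definition, and `decode_token` is a hypothetical helper name introduced only for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable character table (assumed)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed token back to raw bytes and decode them as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ¯", "ãĥĥãĤ¯", "step"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, ãĥ¯ decodes to the katakana ワ and ãĥĥãĤ¯ to ック, while plain ASCII tokens such as step come back unchanged.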
Positive Logits
bitterly       1.01
loudly         0.94
complaints     0.84
complains      0.82
complaining    0.81
isance         0.78
naires         0.77
complain       0.75
complained     0.74
aloud          0.74
Activations Density 0.021%
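The page does not define how the density figure is computed. One common reading, assumed here, is the fraction of sampled token positions on which the feature has a nonzero activation; under that reading 0.021% would mean roughly 21 active tokens per 100,000. The function name `activation_density` and the toy input are hypothetical and are not data from this feature.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for v in values if v > threshold)
    return active / len(values)


if __name__ == "__main__":
    # Toy activations for illustration only: one active position out of four.
    toy = [0.0, 0.0, 1.5, 0.0]
    print(f"{activation_density(toy):.3%}")
```

Multiplying the returned fraction by 100 gives a percentage comparable to the figure shown above.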