INDEX
Explanations
negative phrases and expressions of refusal or rejection
Negative Logits
effects           -0.68
unknown           -0.65
rawled            -0.65
Must              -0.64
DragonMagazine    -0.64
landscapes        -0.63
anni              -0.63
redients          -0.62
VERTISEMENT       -0.62
Stories           -0.61
Positive Logits
bother            0.94
hesitate          0.93
condone           0.93
tolerate          0.90
underestimate     0.85
pretend           0.83
endorse           0.80
regret            0.80
quarrel           0.79
anymore           0.78
Activations Density: 0.069%