Explanations
phrases indicating denial or refusal
negations or phrases indicating denial or lack of action
Negative Logits
Token          Logit
LIFE           -0.72
Beaut          -0.71
ortality       -0.65
beautifully    -0.64
ngth           -0.64
ersed          -0.64
SPACE          -0.64
badass         -0.63
Survival       -0.63
Kinnikuman     -0.62
Positive Logits
Token          Logit
condone        1.06
prejud         0.90
comment        0.87
regret         0.86
prejudice      0.82
speculate      0.82
tolerate       0.82
commenting     0.79
jeopard        0.78
interfere      0.77
Activations Density: 0.237%
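
The density figure is not defined on this page. A minimal sketch, assuming it means the percentage of sampled tokens on which the feature fires (activation above zero), might look like the following. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature is
    active. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, 237 of which activate the feature.
sample = [1.5] * 237 + [0.0] * (100_000 - 237)
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.237% would correspond to roughly 237 active tokens per 100,000 sampled, which fits a sparse feature that fires only on a narrow class of phrases.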