INDEX
Explanations
actions or statements related to persuasion or coercion
phrases related to motivation and decision-making
Negative Logits
enery           -0.69
Paste           -0.63
Logo            -0.63
availability    -0.62
edia            -0.61
alde            -0.60
names           -0.59
assador         -0.59
etimes          -0.58
overed          -0.58
Positive Logits
risky           1.02
harder          0.96
undesirable     0.92
harsher         0.86
aggressive      0.84
certain         0.83
deeper          0.81
immoral         0.80
unfavorable     0.78
higher          0.77
Activations Density: 0.403%