INDEX
Explanations
terms related to intentions or goals
phrases expressing intentions or future actions
Negative Logits
roy          -0.81
worth        -0.73
shown        -0.71
workers      -0.68
eros         -0.64
mask         -0.62
owners       -0.61
bur          -0.60
trust        -0.59
voc          -0.59
Positive Logits
emulate      1.03
improve      0.92
avoid        0.90
broaden      0.90
resume       0.86
tighten      0.86
incorporate  0.85
maximize     0.84
eliminate    0.83
conserve     0.83
Activations Density: 0.054%