Explanations
phrases that emphasize the ability to take action and influence outcomes
Negative Logits
thro          -0.15
Controllers   -0.15
assen         -0.14
559           -0.14
aina          -0.14
stro          -0.14
oyn           -0.14
ncpy          -0.14
unas          -0.14
ader          -0.14
Positive Logits
tasks         0.23
acts          0.20
activities    0.20
deeds         0.20
unto          0.19
omba          0.19
Tasks         0.19
_activities   0.18
tasks         0.17
things        0.17
Activations Density 0.098%