Explanations
references to completing a task or reaching a goal
the word "finish" and its variants, indicating a focus on completion and endings
Negative Logits
Token       Logit
amount      -0.76
reported    -0.75
elled       -0.73
models      -0.73
orb         -0.72
eros        -0.71
edia        -0.71
ria         -0.68
add         -0.68
bour        -0.67
Positive Logits
Token       Logit
finish      1.14
Finish      1.09
finishes    0.96
finishing   0.89
smanship    0.76
unfinished  0.75
Finish      0.75
Finished    0.74
isans       0.72
elim        0.71
Activations Density 0.008%
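
To put the density figure in perspective, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of tokens on which the feature fires (activation above zero); the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens where the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (not from the dashboard): 100,000 tokens, 8 of which fire.
toy_activations = [0.0] * 99_992 + [1.3] * 8
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.008%
```

Under this reading, a density of 0.008% corresponds to roughly 8 activating tokens per 100,000, which would fit a feature that responds to a narrow family of tokens such as "finish" and its variants.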