INDEX
Explanations
phrases related to completing a task or reaching a goal
instances of achievement or completion
Negative Logits
ials         -0.64
Slate        -0.63
fluct        -0.62
Spread       -0.62
constantly   -0.61
Rue          -0.60
conom        -0.58
flat         -0.58
Scrib        -0.58
alone        -0.56
Positive Logits
relent       0.75
resil        0.74
atown        0.71
onso         0.70
iflower      0.69
aterasu      0.68
reckoning    0.68
ppo          0.67
vana         0.66
ileaks       0.65
Activations Density 0.155%
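
As a rough illustration of the density figure above, here is a minimal sketch that assumes "activations density" means the fraction of sampled token positions on which the feature fires (activation above zero). The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard. The counts are made up and chosen only so that the result lands on the same percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position; this is an
    illustrative definition, not necessarily the dashboard's own.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 20,000 token positions, 31 of which are active.
sample = [0.0] * 19_969 + [1.0] * 31
print(f"{activation_density(sample):.3%}")
```

A density this low would mean the feature is silent on the vast majority of tokens, which fits a feature tied to a narrow theme such as completing a task or reaching a goal.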