INDEX
Explanations
phrases related to achieving success or successfully completing a task
instances of the word "succeed" and its variations
Negative Logits
Token      Logit
othy       -0.65
pper       -0.64
pora       -0.64
towels     -0.62
phone      -0.62
view       -0.62
rug        -0.62
reserved   -0.61
metal      -0.61
seed       -0.61
Positive Logits
Token      Logit
TAIN       0.82
anke       0.77
miser      0.76
ceed       0.76
iage       0.75
awaru      0.73
academ     0.72
rupal      0.72
ential     0.72
uates      0.71
Activations Density: 0.019%