INDEX
Explanations
phrases related to taking action or making decisions
comparative phrases and similes
Negative Logits
Token      Logit
resy       -0.74
emi        -0.67
ennes      -0.67
aced       -0.66
ims        -0.66
ells       -0.65
iership    -0.65
oir        -0.64
rovers     -0.63
WI         -0.63
Positive Logits
Token      Logit
lihood      1.37
lier        0.84
removing    0.80
those       0.80
liest       0.79
skipping    0.79
deleting    0.79
altering    0.76
dropping    0.75
stealing    0.73
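The two logit tables above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common convention for sparse-autoencoder feature dashboards, assumed here rather than confirmed by this page, is to project the feature's decoder direction onto each column of the model's unembedding matrix W_U and report the extremes. The sketch below illustrates that assumed computation on toy data; `logit_effects` and `top_logits` are hypothetical helper names.

```python
from typing import List, Tuple


def logit_effects(direction: List[float], w_u: List[List[float]]) -> List[float]:
    """Dot the (assumed) decoder direction with every vocabulary column of W_U.

    w_u is laid out as d_model rows by vocab_size columns.
    """
    vocab_size = len(w_u[0])
    return [
        sum(direction[k] * w_u[k][t] for k in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_logits(
    direction: List[float], w_u: List[List[float]], tokens: List[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit effect) pairs."""
    effects = logit_effects(direction, w_u)
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
negative, positive = top_logits(
    direction=[1.0, -0.5],
    w_u=[[1.0, 0.0, -1.0], [0.0, 2.0, 1.0]],
    tokens=["a", "b", "c"],
    k=1,
)
print(negative, positive)
```

With real weights, `k=10` would produce two ten-row lists shaped like the tables above. The values on this page were not produced by this code.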
Activation Density: 0.086%
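Activation density is usually read as the percentage of sampled tokens on which the feature fires (activation above zero). That definition is assumed here, since the page does not state it. Under that reading, 0.086% means roughly 86 firing tokens per 100,000, which makes this a very sparse feature. A minimal sketch, with `activation_density` as a hypothetical helper:

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    fires = sum(1 for a in activations if a > threshold)
    return 100.0 * fires / len(activations)


# 86 firing positions out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(86):
    acts[i] = 1.0
print(activation_density(acts))
```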