Explanations
The word "pull" (strong activations)
Negative Logits
Token       Logit
merce       -0.74
ibel        -0.74
voy         -0.73
icol        -0.73
esa         -0.72
orld        -0.70
mint        -0.69
heid        -0.69
SPONSORED   -0.67
Scotia      -0.66
Positive Logits
Token       Logit
levers      0.99
aggro       0.88
weeds       0.85
punches     0.83
pull        0.82
away        0.82
awed        0.80
awa         0.79
off         0.78
chairs      0.78
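The two lists above rank vocabulary tokens by how strongly this feature pushes each output logit up or down. Below is a minimal sketch of how lists like these are commonly produced. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function name `top_logits`, the 2-dimensional vectors, and the four-token vocabulary are hypothetical toy values, not the real model's weights.

```python
from typing import List, Tuple

def top_logits(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: score(token) = decoder_dir . unembed[token], i.e. the
    feature's direct effect on each output logit.
    """
    scores = [
        (tok, sum(d * u for d, u in zip(decoder_dir, row)))
        for tok, row in zip(vocab, unembed)
    ]
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example with a 2-dimensional residual stream (hypothetical values).
vocab = ["pull", "levers", "Scotia", "mint"]
unembed = [[0.9, 0.1], [1.0, 0.0], [-0.7, 0.0], [-0.5, -0.4]]
decoder_dir = [1.0, -0.5]
neg, pos = top_logits(decoder_dir, unembed, vocab, k=2)
print("negative:", [t for t, _ in neg])  # ['Scotia', 'mint']
print("positive:", [t for t, _ in pos])  # ['levers', 'pull']
```

Sorting once and slicing from both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` or `heapq.nlargest` would avoid a full sort.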
Activation Density: 0.032%
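Activation density is usually read as the fraction of sampled tokens on which the feature fires. On that reading, 0.032% means roughly one token in 3,125. The sketch below assumes "fires" means an activation strictly above zero. The function name `activation_density` and the sample activations are hypothetical, not data from this feature.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# 10,000 hypothetical tokens, of which 3 have a positive activation.
acts = [0.0] * 9996 + [2.1, 0.0, 3.4, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.030%
```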