Explanations
enjoyment
This neuron activates on positive evaluative words (such as “loved,” “enjoyed,” and “amazing”) that indicate praise or enthusiasm.
Negative Logits
Token          Logit
quoted         -0.08
puppy          -0.07
rainy          -0.07
ươi            -0.06
Mov            -0.06
ease           -0.06
_assignment    -0.06
WARDS          -0.06
WAIT           -0.06
playoffs       -0.06
Positive Logits
Token          Logit
Funds          0.07
enjoyed        0.07
469            0.06
dislike        0.06
decentral      0.06
еди            0.06
(err           0.06
wand           0.06
atherine       0.06
243            0.06
Activations Density 0.014%
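
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of sampled token positions on which the neuron's activation exceeds a threshold. This is an assumption about the metric, not a statement of how this dashboard calculates it; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all; the dashboard's exact definition is not given.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation value")
    return active / total


# Hypothetical sample: 100,000 token positions, 14 of which fire.
sample = [0.0] * 99_986 + [1.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.014%"
```

Under this reading, a density of 0.014% means the neuron is active on roughly 14 of every 100,000 tokens, which is consistent with a sparse feature that fires only on a narrow class of words.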