Explanations
Based on its activations, this neuron appears to respond to mentions of unintended or additional outcomes of actions or events.
Mentions of "product" and its variations.
Negative Logits
Token      Logit
Hornets    -0.62
ITED       -0.62
Warriors   -0.61
Panther    -0.61
pup        -0.61
mosqu      -0.60
ARP        -0.60
©¶æ        -0.60
coil       -0.59
bee        -0.58
Positive Logits
Token      Logit
ively      1.19
iveness    1.18
ivity      1.16
ions       1.03
ivities    0.98
ivist      0.91
arian      0.89
ivism      0.88
urer       0.84
itute      0.81
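The two lists above rank the output tokens this neuron most suppresses and most promotes. The sketch below shows one common way such lists can be derived, under stated assumptions: the "logit" for a token is taken to be the dot product of the neuron's output weight vector with that token's unembedding column, and final LayerNorm scaling is ignored. The helper names (`logit_effects`, `top_logits`) and the 2-dimensional toy vectors are hypothetical illustrations, not necessarily how the values on this page were computed.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: direct effect of one neuron on each output token's logit,
# computed as the dot product of the neuron's output weight vector with that
# token's unembedding column. Final LayerNorm scaling is ignored (an assumption).
def logit_effects(
    neuron_out: List[float], unembed: Dict[str, List[float]]
) -> Dict[str, float]:
    return {
        token: sum(w * u for w, u in zip(neuron_out, column))
        for token, column in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k most positive, top-k most negative) tokens with their effects."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    most_negative = ranked[:k]        # ascending order: most negative first
    most_positive = ranked[::-1][:k]  # descending order: most positive first
    return most_positive, most_negative


# Toy 2-dimensional example; the vectors are made up for illustration only.
neuron_out = [1.0, -0.5]
unembed = {
    "ively": [1.0, 0.0],
    "ions": [0.5, 0.5],
    "bee": [-0.5, 0.5],
    "pup": [0.0, 1.0],
}
effects = logit_effects(neuron_out, unembed)
positive, negative = top_logits(effects, k=2)
print(positive)  # [('ively', 1.0), ('ions', 0.25)]
print(negative)  # [('bee', -0.75), ('pup', -0.5)]
```

In a real model the unembedding has one column per vocabulary entry, so the same ranking simply runs over the full vocabulary.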
Activation density: 0.024%
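Assuming activation density means the percentage of token positions on which the neuron's activation exceeds zero (the zero threshold is an assumption), a density of 0.024% corresponds to roughly 24 activating tokens per 100,000. A minimal sketch of that calculation follows; `activation_density` is a hypothetical helper and the sample activations are made up.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption for illustration.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Made-up activations over 8 token positions: 2 of them are above zero.
sample = [0.0, 1.5, -0.2, 0.0, 3.0, 0.0, 0.0, -1.0]
print(activation_density(sample))  # 25.0
```

A very low density such as the one reported here indicates a sparse neuron that fires on only a small fraction of tokens.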