Explanations
assassination
The neuron activates on words related to assassination (variants of “assassinate,” “assassination,” etc.).
Negative Logits
Token         Logit
Thor          -0.07
(RE           -0.07
behaved       -0.06
Time          -0.06
movement      -0.06
(col          -0.06
thought       -0.06
plu           -0.06
lore          -0.06
forcefully    -0.06
Positive Logits
Token            Logit
assassin         0.12
Assassin         0.10
assass           0.10
assassination    0.10
assin            0.10
Assass           0.10
าะห              0.08
刺               0.07
Attribute        0.07
axon             0.07
Activations Density 0.003%
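
As a rough illustration of how a density figure like this could be read, the sketch below assumes that "activations density" means the fraction of token positions on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active token positions out of 100,000.
sample = [0.0] * 99_997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.003%
```

Under that assumption, a density of 0.003% corresponds to roughly 3 active tokens per 100,000, which fits a neuron that responds only to a narrow family of words.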