Explanations
gressive
The neuron activates on the word “aggressive” (and its close morphological variants like “aggression”).
Negative Logits
Token         Logit
atalog        -0.07
analý         -0.07
intertwined   -0.06
Land          -0.06
Sala          -0.06
Wyn           -0.06
ThreadPool    -0.06
theoret       -0.06
Tan           -0.06
Land          -0.06
Positive Logits
Token         Logit
aggressive    0.15
aggressively  0.12
aggression    0.11
aggress       0.10
지고          0.08
meg           0.08
بش            0.07
angst         0.07
џџ            0.07
j             0.07
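As a rough illustration of how the two logit lists relate, the sketch below ranks a token-to-logit mapping into its most promoted and most suppressed entries. The helper name `top_logits` is hypothetical, and the reading of each value as the neuron's effect on that token's output logit is an assumption about this dashboard, not something the page states. The numbers are copied from the lists above.

```python
# Hypothetical helper (not part of any dashboard API): split a
# token -> logit mapping into the k largest and k smallest entries.
def top_logits(effects, k=3):
    # Sort descending for the most promoted tokens.
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Sort ascending for the most suppressed tokens.
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# A few values taken from the lists above.
effects = {
    "aggressive": 0.15,
    "aggressively": 0.12,
    "aggression": 0.11,
    "atalog": -0.07,
    "intertwined": -0.06,
}

pos, neg = top_logits(effects, k=2)
print("positive:", pos)
print("negative:", neg)
```

The positive side is dominated by variants of "aggressive", which matches the explanation given for this neuron; the negative side shows no comparable pattern and its magnitudes are about half as large.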
Activations Density: 0.005%
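For a sense of scale, the sketch below computes an activation density as the fraction of tokens on which a neuron fires. This definition, the function name `activation_density`, and the zero threshold are assumptions made for illustration; the page does not say how its figure was measured. The toy data is synthetic: one firing token in 20,000 corresponds to 0.005%.

```python
# Hypothetical helper: fraction of tokens whose activation exceeds a threshold.
# The threshold of 0.0 is an assumption, not a documented setting.
def activation_density(activations, threshold=0.0):
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic example: 19,999 silent tokens and a single firing token.
acts = [0.0] * 19_999 + [3.2]
density = activation_density(acts)
print(f"density = {density:.3%}")
```

A density this low means the neuron is silent on almost all text, which is consistent with it responding to a single word family.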