INDEX
Explanations
The neuron activates on the character sequence “rior,” i.e. it detects the “rior” suffix as in “Warrior.”
Negative Logits
Shows          -0.07
bö             -0.07
Luke           -0.07
iname          -0.06
53             -0.06
розрах         -0.06
input          -0.06
calendar       -0.06
spontaneous    -0.06
STOCK          -0.06
Positive Logits
Warrior          0.13
Warriors         0.11
warrior          0.11
warriors         0.10
disrespectful    0.08
successors       0.08
guerr            0.08
servi            0.07
عاش              0.07
addslashes       0.07
Activations Density 0.003%
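
For a sense of scale, a density of 0.003% corresponds to roughly 3 active tokens per 100,000. The sketch below assumes density means the fraction of sampled tokens on which the activation is above zero. That definition, the function name `activation_density`, and the sample data are illustrative assumptions, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 3 activate the neuron.
acts = [0.0] * 100_000
for i in (10, 5_000, 72_345):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.003%
```

A neuron this sparse fires on very few tokens, which fits an explanation tied to one specific character sequence.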