Explanations
This neuron activates on tokens containing “trophy” or its stem “troph” (e.g., trophy, trophies, atrophy).
References to trophies.
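The first explanation can be approximated as a simple substring rule. This sketch assumes the rule is a case-insensitive check for the stem “troph” (an assumption, since “trophies” does not contain the full string “trophy”); `STEM` and `matches_explanation` are hypothetical names used only for illustration.

```python
# Hypothetical approximation of the auto-interp explanation as a substring rule.
STEM = "troph"  # assumption: stem shared by trophy, trophies, atrophy


def matches_explanation(token: str) -> bool:
    """Return True if the token contains the 'troph' stem, ignoring case."""
    return STEM in token.lower()


examples = ["trophy", "Trophy", "trophies", "atrophy", "Troll", "pro"]
for tok in examples:
    # Print each example token next to whether the rule fires on it.
    print(f"{tok!r:12} {matches_explanation(tok)}")
```

A literal rule like this is only an approximation: a tokenizer may split a word so that no single token contains the full stem.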
Negative Logits
setters   -0.08
Scene     -0.08
Current   -0.07
.Emit     -0.07
028       -0.07
856       -0.07
232       -0.07
Verizon   -0.07
beams     -0.07
Binder    -0.07
Positive Logits
rophy     0.10
Trophy    0.10
trophies  0.09
trophy    0.09
roph      0.08
pro       0.07
лиш       0.07
Troll     0.07
생님      0.07
ΟΦ        0.06
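Logit lists like these are commonly produced by projecting a neuron's output weights onto the model's unembedding and ranking vocabulary tokens by the resulting score. The sketch below assumes that approach; the weight vector, unembedding columns, and four-token vocabulary are made-up toy values, not this neuron's real weights, and `logit_effects` and `top_k` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def logit_effects(w_out: List[float], w_u: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the neuron's output weights with each token's unembedding column."""
    return {tok: sum(a * b for a, b in zip(w_out, col)) for tok, col in w_u.items()}


def top_k(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy 3-dimensional example (hypothetical numbers).
w_out = [0.5, -0.2, 0.1]
w_u = {
    "trophy": [0.2, 0.0, 0.1],
    "Trophy": [0.18, 0.0, 0.05],
    "Scene": [-0.1, 0.1, 0.0],
    "beams": [0.0, 0.2, -0.1],
}
effects = logit_effects(w_out, w_u)
pos, neg = top_k(effects, 2)
print("positive:", pos)
print("negative:", neg)
```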
Activation density: 0.004%
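Activation density is usually read as the fraction of sampled tokens on which the unit fires. Assuming "fires" means an activation above zero, 0.004% corresponds to roughly 4 activating tokens per 100,000. A minimal sketch; `activation_density`, the threshold default, and the activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 100,000 tokens, 4 of which activate.
acts = [0.0] * 100_000
for i in (10, 2_000, 50_000, 99_999):
    acts[i] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # -> 0.004%
```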