INDEX
Explanations
This neuron activates primarily on placeholder “NAME_*” tokens (i.e. the anonymized entity name markers).
Negative Logits
orderly       -0.08
_USER         -0.07
.boolean      -0.06
_REFRESH      -0.06
caffeine      -0.06
arrivals      -0.06
WLAN          -0.06
_fre          -0.06
Mondays       -0.06
(animation    -0.06
Positive Logits
HTMLElement   0.07
香港          0.06
proj          0.06
nesení        0.06
Trio          0.06
userAgent     0.06
sonuç         0.06
Fuck          0.06
слово         0.06
izioni        0.06
Activations Density 0.115%
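The density figure can be read as the share of sampled token positions on which the neuron fires. The dashboard does not state its exact definition, so the sketch below assumes that "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the toy data are hypothetical and are not taken from the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position, with a strict
    greater-than comparison against the threshold.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 2000 token positions, 3 of which fire.
toy_activations = [0.0] * 2000
for position, value in [(10, 1.2), (500, 0.4), (1999, 2.7)]:
    toy_activations[position] = value

density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.150%
```

Under this reading, a density of 0.115% would mean the neuron fires on roughly 1 in every 870 tokens, which fits a neuron tied to a narrow token family such as the "NAME_*" placeholders.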