Explanations
This neuron activates on the token “soul.”
Negative Logits
_AdjustorThunk            -0.07
onders                    -0.07
.BackgroundImageLayout    -0.07
Araştır                   -0.06
kaufen                    -0.06
Caucasian                 -0.06
pec                       -0.06
방법                      -0.06
ucus                      -0.06
abolition                 -0.06
Positive Logits
маг                       0.07
resend                    0.07
/max                      0.06
hack                      0.06
Comple                    0.06
maint                     0.06
schl                      0.06
Cancellation              0.06
Disconnect                0.06
Km                        0.06
Activations Density 0.002%