INDEX
Explanations
inability to access
This neuron activates on words expressing inability or lack of access (e.g., “cannot,” “can’t,” “unable,” “access,” “view,” “see,” “edit”).
NEGATIVE LOGITS
PERFORMANCE   -0.06
ected         -0.06
048           -0.06
aldığı        -0.06
房间          -0.06
RCS           -0.06
linguistic    -0.05
懂            -0.05
glorious      -0.05
Rig           -0.05
POSITIVE LOGITS
ToSelector    0.08
lessen        0.07
MouseEvent    0.07
ata           0.07
ープ          0.07
AMERA         0.07
pthread       0.07
UMMY          0.07
uppy          0.07
の人          0.07
Activations Density 0.023%