INDEX
Explanations
The neuron activates selectively on placeholder-style entity tokens (e.g. NAME_, AUFIDIUS) rather than on ordinary words.
Negative Logits
raid                  -0.07
]")↵                  -0.07
;">↵                  -0.07
려요                  -0.07
}"↵                   -0.07
ーの                  -0.07
];↵                   -0.07
pronto                -0.06
II                    -0.06
golf                  -0.06
Positive Logits
水平                  0.07
=G                    0.07
thalm                 0.06
[selected             0.06
edy                   0.06
Marco                 0.06
-master               0.06
сильно                0.06
_sur                  0.06
onCreateViewHolder    0.06
Activations Density: 0.008%