Explanations
User input requests
The neuron fires on explicit mentions of “emoji”/“emojis” in the text, or on requests to include them.
Negative Logits
Token       Logit
cil         -0.06
milion      -0.06
STATES      -0.06
genocide    -0.06
-Version    -0.06
δό          -0.06
(equal      -0.06
776         -0.06
UP          -0.06
ีนาคม       -0.06
Positive Logits
Token       Logit
Cork        0.07
gebru       0.07
known       0.07
// ↵ ↵      0.06
ypsy        0.06
608         0.06
_BEGIN      0.06
textbook    0.06
_cycle      0.06
.bean       0.06
Activations Density: 0.026%