INDEX
Explanations
timing of events
The neuron chiefly detects negative or contrastive markers (e.g. “not,” “did,” “though,” “become” in negating contexts).
Negative Logits
_EXISTS    -0.07
ール    -0.06
ocode    -0.06
---------------------------------------------------------------------------↵    -0.06
oftware    -0.06
سته    -0.06
�    -0.06
켓    -0.06
polít    -0.06
ymi    -0.06
Positive Logits
(-    0.07
outsider    0.06
electrons    0.06
VOID    0.06
Jest    0.06
(icon    0.06
/task    0.06
+/    0.06
barcelona    0.06
(=    0.06
Activations Density 0.070%