Explanations
The neuron activates on formal-logic terminology, especially references to "definability" (implicit or explicit) and "first-order" (as in "first-order logic" or "first-order theory").
Negative Logits
()],              -0.07
�                 -0.06
zp                -0.06
dělen             -0.06
faces             -0.06
badges            -0.06
LIN               -0.06
Id                -0.06
_proxy            -0.06
уст               -0.06
Positive Logits
EINA              0.07
continual         0.07
(Stream           0.07
>window           0.06
zeitig            0.06
.createTextNode   0.06
_decl             0.06
roads             0.06
_RESOLUTION       0.06
oooooooo          0.06
Activation density: 0.006% (the neuron is active on roughly 6 of every 100,000 tokens)