INDEX
    Explanations

    The neuron activates on occurrences of formal logical terminology—especially references to “definability” (implicit/explicit) and “first-order” (as in “first-order logic/theory”).

    Negative Logits
    Token               Logit
    "()],"              -0.07
                        -0.06
    "zp"                -0.06
    "dělen"             -0.06
    " faces"            -0.06
    " badges"           -0.06
    " LIN"              -0.06
    " Id"               -0.06
    "_proxy"            -0.06
    " уст"              -0.06
    Positive Logits
    Token               Logit
    " EINA"             0.07
    " continual"        0.07
    "(Stream"           0.07
    ">window"           0.06
    "zeitig"            0.06
    ".createTextNode"   0.06
    "_decl"             0.06
    "roads"             0.06
    "_RESOLUTION"       0.06
    "oooooooo"          0.06
    Activation Density: 0.006%

    No Known Activations