    Explanations

    The neuron detects in-text citation markers (bracketed reference labels).

    Negative Logits
    Token             Logit
    " unintended"     -0.07
    "Thousands"       -0.07
                      -0.06
    " موجب"           -0.06
    " خی"             -0.06
    "-Ass"            -0.06
    "-million"        -0.06
    "_tuples"         -0.06
    " stri"           -0.06
    " етап"           -0.06
    Positive Logits
    Token             Logit
    " personalize"    0.07
    "NULL"            0.07
    "--)"             0.07
    " medidas"        0.06
    " znění"          0.06
    " Approval"       0.06
    "(class"          0.06
    "(step"           0.06
    "=back"           0.06
    " nr"             0.06
    Activation Density: 0.021%

    No Known Activations