    Explanations

    The neuron activates on explanations or definitions of “token” (and related terms like subword, unit of text, tokenization).

    Negative Logits
     Pols         -0.08
    чис           -0.07
     hypoth       -0.06
    _ter          -0.06
    Hel           -0.06
     सत           -0.06
    anax          -0.06
    वत            -0.06
     XY           -0.06
    ův            -0.06
    Positive Logits
     selector     0.07
                  0.06
    [(            0.06
     перемен      0.06
     indemn       0.06
     <!           0.06
    .gsub         0.06
    PerPixel      0.06
     WideString   0.06
    配置            0.06
    Activation Density: 0.007%
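
As a rough illustration of what the density figure means, here is a minimal Python sketch. It assumes that density is the fraction of token positions on which the neuron's activation is above zero; the page itself does not define the term. The function name `activation_density` and the activation values are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires; the source page does not spell out its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 token positions, 7 of them with a nonzero activation.
activations = [0.0] * 100_000
for position in (10, 2_500, 31_337, 40_000, 55_555, 70_001, 99_999):
    activations[position] = 1.5

density = activation_density(activations)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.007% corresponds to about 7 firing tokens per 100,000.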

    No Known Activations