    Explanations

    disambiguation

    The neuron fires on parenthesized Wikipedia‐style disambiguation or type labels (e.g. “(disambiguation)”, “(surname)”, “(given name)”).

    Negative Logits
        Token          Logit
        (not shown)    -0.07
        " ноября"      -0.07
        "862"          -0.07
        (not shown)    -0.06
        " Fant"        -0.06
        "169"          -0.06
        " Mich"        -0.06
        " Stamina"     -0.06
        " Ferr"        -0.06
        (not shown)    -0.06
    Positive Logits
        Token          Logit
        "DOWN"         0.06
        " bli"         0.06
        "��드"         0.06
        "원이"         0.06
        "coordinate"   0.06
        "_GT"          0.06
        "-coordinate"  0.06
        " lie"         0.06
        "Fix"          0.05
        "charm"        0.05
    Activation Density: 0.001%

    No Known Activations