INDEX
    Explanations

    distribution

    The neuron fires on occurrences of the word “distribution” (as found in license‐header comment blocks).

    Negative Logits
    Token           Logit
    "。他"          -0.07
    " dar"          -0.06
    " Lazar"        -0.06
    " iki"          -0.06
    "acağız"        -0.06
    " lái"          -0.06
    " Aerospace"    -0.06
    " tasar"        -0.06
    " ceny"         -0.06
    " Gordon"       -0.06
    Positive Logits
    Token            Logit
    "arpa"           0.07
    "rib"            0.07
    " currentTime"   0.07
    " 댓글"          0.07
    " 日本"          0.06
    "ιθ"             0.06
    "mpl"            0.06
    "/in"            0.06
    "_preview"       0.06
    " хозяй"         0.06
    Activation Density: 0.001%

    No Known Activations