    Explanations

    The neuron fires on any occurrence of the substring “diff” (in any context or casing).
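
    As a rough illustration of the pattern described above, the sketch below assumes that "any casing" means a case-insensitive substring match. The helper name `matches_diff` is hypothetical, and the token lists are copied from the logit lists on this page. It only shows how the stated pattern relates to those token strings; it is not how the explanation was produced.

```python
def matches_diff(token: str) -> bool:
    """Return True if the token contains 'diff' in any casing (assumed reading)."""
    return "diff" in token.lower()


# Token strings copied from the positive and negative logit lists on this page.
# Leading spaces are part of the tokens and are kept as-is.
top_positive = [" diff", "diff", " DIFF", "_diff", " Diff",
                "DIFF", "Diff", "(diff", " diffs", " diffuse"]
top_negative = [" Kral", "onal", " Monument", "utut", " Pon",
                " вну", "man", " зан", "ju", " Banner"]

positive_hits = [t for t in top_positive if matches_diff(t)]
negative_hits = [t for t in top_negative if matches_diff(t)]

print(f"{len(positive_hits)}/{len(top_positive)} positive tokens match")
print(f"{len(negative_hits)}/{len(top_negative)} negative tokens match")
```

    Under this reading, every listed positive-logit token contains the substring and none of the negative-logit tokens do.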

    Negative Logits
    Token          Logit
    " Kral"        -0.07
    "onal"         -0.07
    " Monument"    -0.07
    "utut"         -0.07
    " Pon"         -0.07
    " вну"         -0.07
    "man"          -0.07
    " зан"         -0.07
    "ju"           -0.07
    " Banner"      -0.07
    Positive Logits
    Token          Logit
    " diff"        0.13
    "diff"         0.12
    " DIFF"        0.11
    "_diff"        0.10
    " Diff"        0.10
    "DIFF"         0.09
    "Diff"         0.09
    "(diff"        0.09
    " diffs"       0.08
    " diffuse"     0.08
    Activation Density: 0.009%

    No Known Activations