INDEX
    Explanations

    The neuron detects occurrences of the token “worm” (in both singular and plural forms).

    Negative Logits
    " fluffy"      -0.08
    " intuit"      -0.08
    "Senate"       -0.07
    "、高"          -0.07
    "etal"         -0.06
    "Joe"          -0.06
    " insanity"    -0.06
    " кле"         -0.06
    "_safe"        -0.06
    " Symphony"    -0.06
    Positive Logits
    " worms"    0.13
    " worm"     0.12
    "worm"      0.11
    " Worm"     0.10
                0.09
    "-vesm"     0.09
    "m"         0.08
    "(cm"       0.08
    "term"      0.07
    "(tm"       0.07
    Activation Density: 0.002%

    No Known Activations