INDEX
    Explanations

    From the activations shown, this neuron appears to fire on words ending in "-um".

    occurrences of the token "um."

    Negative Logits
     cutoff                   -0.72
     strawberries             -0.69
    âĸĪâĸĪâĸĪâĸĪâĸĪâĸĪâĸĪâĸĪ  -0.67
     Aires                    -0.65
     elves                    -0.65
    Ń·                        -0.64
     Shades                   -0.63
     blackout                 -0.62
    jri                       -0.62
     Morales                  -0.62

    Positive Logits
    osity                      1.09
    mers                       1.09
    ming                       1.06
    essage                     0.98
    etric                      0.98
    atism                      0.97
    mit                        0.97
    um                         0.97
    pty                        0.96
    antic                      0.96

    Act Density 0.015%

    No Known Activations