INDEX
    Explanations

    the word "Show" with varying activation strengths

    instances of the word "Show"

    Negative Logits
    OAD          -0.72
    ADE          -0.67
    adem         -0.61
    bsite        -0.59
     Seeking     -0.59
     Bere        -0.58
    ngth         -0.58
    auri         -0.57
    eco          -0.57
     Âł          -0.56
    Positive Logits
     Thumbnails   1.03
    alter         0.74
    cases         0.70
    case          0.68
     me           0.64
     nested       0.63
    boat          0.63
    kat           0.61
    downs         0.60
    boats         0.59
    Act Density 0.028%

    No Known Activations