INDEX
    Explanations

    expressions of surprise or approval

    Negative Logits
    hd          -0.16
    umbled      -0.15
    akt         -0.15
    ataloader   -0.15
    ograd       -0.15
     seekers    -0.14
    ollapse     -0.14
    ilder       -0.14
    eed         -0.14
    apat        -0.14
    Positive Logits
    oop         0.19
    Inline      0.17
    anden       0.15
    ubi         0.15
    ãĥ¼ãĥĵ      0.14
    UGIN        0.14
    ELY         0.14
     Conc       0.14
    inea        0.14
    IEL         0.14
    Act Density 0.151%

    No Known Activations