    Explanations

    expressions of fondness or affection

    Negative Logits
    Token        Logit
    "orgot"      -0.16
    "angelo"     -0.16
    "undo"       -0.15
    "eer"        -0.15
    "een"        -0.15
    "uels"       -0.15
    "andler"     -0.15
    "uze"        -0.15
    "iko"        -0.15
    "eyer"       -0.14

    Positive Logits
    Token          Logit
    "amental"      0.30
    "ue"           0.24
    "amentals"     0.22
    "ness"         0.18
    "ksiyon"       0.16
    " memories"    0.16
    "ぼ"           0.15
    "akk"          0.15
    "hin"          0.15
    "scal"         0.15
    Activation Density: 0.005%

    No Known Activations