INDEX
    Explanations

    various forms of the prefix "dis" or words related to negative or undesirable outcomes

    Negative Logits
    hind        -0.18
    iley        -0.17
    ong         -0.15
    kad         -0.15
    scriber     -0.15
    jet         -0.15
    kap         -0.14
    het         -0.14
    uples       -0.14
    azon        -0.14
    Positive Logits
    ellaneous    0.19
    rael         0.19
    naire        0.17
    ¼            0.16
    gow          0.15
    emean        0.15
    ment         0.14
    ettes        0.14
    /dis         0.14
    keit         0.14
    Activation Density: 0.083%

    No Known Activations