INDEX
    Explanations

    variations of the word "original."

    Negative Logits
    Token       Logit
    " mere"     -0.17
    "lej"       -0.16
    "esh"       -0.15
    "ocs"       -0.15
    "ings"      -0.15
    "owel"      -0.15
    "usal"      -0.15
    "mere"      -0.14
    "ãģĬãĤĬ"    -0.14
    "Ïĥί"       -0.14
    Positive Logits
    Token          Logit
    "ity"          0.37
    "/original"    0.28
    "mente"        0.26
    "ITY"          0.21
    "ised"         0.20
    "ities"        0.18
    "atively"      0.18
    "-language"    0.17
    "isation"      0.17
    "y"            0.17
    Act Density 0.025%

    No Known Activations