INDEX
    Explanations

    words indicating suspicion or doubt

    Negative Logits
    iry        -0.17
    lean       -0.15
    .mods      -0.15
    OTA        -0.15
    mage       -0.14
    ird        -0.14
    otas       -0.14
    ancy       -0.14
    .motion    -0.14
    veau       -0.14
    Positive Logits
     Sharp     0.16
    itra       0.16
    iale       0.15
    éĹ         0.15
    ovich      0.15
    ITTE       0.15
     Cole      0.14
     Engel     0.14
    ãĤ¥        0.14
    enko       0.14
    Act Density 0.002%

    No Known Activations