    Explanations

    themes related to deception, crime, and subversion

    Negative Logits
    Token        Logit
    edBy         -0.20
    itics        -0.18
    اÙĪØ±ÛĮ       -0.16
    iatrics      -0.16
    iation       -0.16
    lessness     -0.16
    isation      -0.15
    itu          -0.15
    izons        -0.15
    iliation     -0.14
    Positive Logits
    Token        Logit
    ulous        0.25
    eous         0.24
    ous          0.23
    ive          0.23
    orous        0.22
    ful          0.20
    ulent        0.20
    inous        0.20
    ocratic      0.20
    itious       0.20
    Activation Density: 0.140%

    No Known Activations