INDEX
    Explanations

    references to the concept of deception or lies

    Negative Logits
    oki            -0.17
    stown          -0.15
    olian          -0.15
    oded           -0.15
    idan           -0.14
    OME            -0.14
    odable         -0.14
    iei            -0.14
    zyst           -0.14
    ãĥIJãĥ¼         -0.14

    Positive Logits
    uten            0.26
    utenant         0.23
    berman          0.23
    chten           0.20
    ê´Ģ             0.19
    urance          0.16
    eg              0.16
    gth             0.16
    apis            0.15
    istrovstvÃŃ     0.15

    Activation Density: 0.015%

    No Known Activations