INDEX
    Explanations

    themes related to hypocrisy and contradictions in beliefs versus actions

    Negative Logits
    YLES
    -0.17
    usted
    -0.16
    ovie
    -0.15
     seedu
    -0.15
    irit
    -0.15
    pla
    -0.15
     fond
    -0.14
     strip
    -0.14
    isque
    -0.14
    exampleModal
    -0.14
    Positive Logits
     something
    0.20
    something
    0.20
    Something
    0.17
     Something
    0.17
    (thing
    0.15
     něco
    0.15
     excellence
    0.15
    omething
    0.15
    .Iter
    0.14
     şeyi
    0.14
    Activation Density: 0.229%

    No Known Activations