INDEX
    Explanations

    instances of acknowledgment or confession

    NEGATIVE LOGITS
    olding            -0.16
    /Gate             -0.15
    blo               -0.15
    lei               -0.15
    olds              -0.15
    abouts            -0.14
    ettle             -0.14
    ìĶ                -0.14
    Configurer        -0.14
    ¶Į                -0.13
    POSITIVE LOGITS
     defeat           0.29
     freely           0.20
     responsibility   0.19
     defeats          0.19
    ting              0.19
     feeling          0.18
     guilt            0.17
     defeated         0.17
    ration            0.16
     wrongdoing       0.16
    Activation Density: 0.030%

    No Known Activations