    Explanations

    phrases indicating agreement and compliance with rules or terms

    Negative Logits
        Token            Logit
        "ardy"           -0.19
        "cape"           -0.16
        "ife"            -0.15
        "äm"             -0.15
        ".IntPtr"        -0.15
        "addin"          -0.15
        " manners"       -0.15
        "maid"           -0.14
        ".sponge"        -0.14
        "åİļ"            -0.13

    Positive Logits
        Token            Logit
        "åļ"              0.14
        "åĴ"              0.14
        "PTS"             0.14
        "eya"             0.14
        "istogram"        0.14
        "riot"            0.14
        " statement"      0.14
        "elage"           0.13
        "conds"           0.13
        "entionPolicy"    0.13
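The card does not say how these logit values are computed, only that each list holds the ten most extreme tokens. The sketch below illustrates just the ranking step. It assumes the per-token logit effects are already available as a plain dictionary, and `top_and_bottom_k` is a hypothetical helper, not part of any tool that produced this card. The example input is a few rows copied from the tables.

```python
import heapq
from operator import itemgetter
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_and_bottom_k(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, logit) pairs.

    `effects` is a hypothetical mapping from token string to logit effect.
    """
    by_value = itemgetter(1)
    positive = heapq.nlargest(k, effects.items(), key=by_value)   # largest value first
    negative = heapq.nsmallest(k, effects.items(), key=by_value)  # most negative value first
    return positive, negative


# Toy input: a handful of rows from the card (note the leading spaces in some tokens).
example = {"ardy": -0.19, "cape": -0.16, " manners": -0.15,
           "riot": 0.14, " statement": 0.14, "elage": 0.13}
pos, neg = top_and_bottom_k(example, k=2)
print(pos)
print(neg)
```

`heapq.nlargest` and `heapq.nsmallest` avoid a full sort when only a few extremes are needed from a large vocabulary.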
    Activation Density: 0.156%
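Assuming "activation density" means the fraction of token positions on which the feature fires (the card does not define it), 0.156% corresponds to roughly 156 firing positions per 100,000 tokens. The sketch below computes that fraction. `activation_density` is a hypothetical helper, and the `threshold` of 0.0 is an assumption rather than a value taken from the card.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    The threshold of 0.0 is an assumed definition of "firing".
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 156 firing positions out of 100,000 tokens gives a density of 0.156%.
acts = [0.0] * 99_844 + [1.0] * 156
print(f"{activation_density(acts) * 100:.3f}%")
```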

    No Known Activations