    Explanations

    negative responses or phrases indicating disapproval

    Negative Logits
    Token          Logit
    "lined"        -0.15
    " æ©"          -0.15
    "soever"       -0.14
    "ató"          -0.14
    "ported"       -0.14
    "itzer"        -0.14
    "/***/"        -0.14
    "ycz"          -0.14
    ".Atomic"      -0.14
    "olik"         -0.14
    Positive Logits
    Token          Logit
    "strand"        0.21
    "emi"           0.18
    "sey"           0.18
    "igroup"        0.18
    "thern"         0.18
    "xious"         0.18
    " Holds"        0.17
    "isy"           0.17
    "Limits"        0.17
    "ises"          0.17
    Activation Density: 0.045%

    No Known Activations