INDEX
    Explanations

    negations or words indicating the absence or the inverse of something

    Negative Logits
    Token        Logit
    "487"        -0.15
    " REM"       -0.15
    " rozs"      -0.14
    "cn"         -0.14
    "hen"        -0.14
    "cin"        -0.13
    " fuss"      -0.13
    "inth"       -0.13
    "curity"     -0.13
    "484"        -0.13
    Positive Logits
    Token        Logit
    "ël"         0.17
    "zell"       0.16
    "Hallo"      0.15
    "част"       0.14
    "こん"       0.14
    "pps"        0.14
    "Away"       0.14
    "Minor"      0.14
    "ISR"        0.14
    " Weston"    0.14
    Act Density 0.037%

    No Known Activations