INDEX
    Explanations

    phrases indicating negation or refusal

    finds 'not' followed by verbs

    Negative Logits
    Logit   Token
    -0.52   "tanleria"
    -0.47   "endregion"
    -0.47   " <<<<<<<<<<<<<<"
    -0.44   "ніципа"
    -0.40   "KommentareTeilen"
    -0.40   "sizeCache"
    -0.40   "プーン"
    -0.40   " Grundlage"
    -0.39   "purl"
    -0.39   " powierzchni"
    Positive Logits
    Logit   Token
     0.57   " للمعارف"
     0.52   " Cæsar"
     0.52   " ſein"
     0.49   " itſelf"
     0.48   "jsxFileName"
     0.48   " springfox"
     0.47   " ſont"
     0.47   " ſei"
     0.47   " ſich"
     0.46   " esti"
    Activation Density: 0.062%

    No Known Activations
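
For reference, a minimal sketch of how an activation density figure like the one above could be computed. It assumes density means the fraction of token positions where the feature's activation exceeds zero. That definition, the function name `activation_density`, and the sample data are illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 6 of which activate.
sample = [0.0] * 9994 + [1.3, 0.2, 4.1, 0.7, 2.5, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.062% would correspond to the feature firing on roughly 6 of every 10,000 tokens.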