INDEX
    Explanations

    negative contractions, particularly "not" and its variations

    Negative Logits
    "adem"            -0.17
    "گر"              -0.15
    "мага"            -0.14
    "hop"             -0.14
    "unga"            -0.14
    "uit"             -0.14
    "\Mapping"        -0.13
    "shouldBe"        -0.13
    "alth"            -0.13
    "äl"              -0.13

    Positive Logits
    " know"            0.24
    " care"            0.23
    " necessarily"     0.22
    " have"            0.19
    " Know"            0.19
    "know"             0.19
    " even"            0.19
    " mind"            0.18
    "Know"             0.18
    " quite"           0.17
    Act Density 0.195%

    No Known Activations