    Explanations

    words associated with negation and absence

    Negative Logits
    "hani"     -0.15
    " Prem"    -0.15
    "ify"      -0.15
    "man"      -0.14
    "-font"    -0.14
    "erral"    -0.14
    "orman"    -0.14
    "zman"     -0.14
    " сир"     -0.14
    "ăn"       -0.14
    Positive Logits
    "PE"       0.17
    "peg"      0.17
    "orsi"     0.16
    " Kat"     0.16
    "pe"       0.16
    "peak"     0.16
    " PE"      0.16
    "pees"     0.16
    "adder"    0.16
    "_PE"      0.16
    Act Density 0.033%

    No Known Activations