INDEX
    Explanations

    words expressing a lack or absence, often associated with negativity or superfluousness

    Negative Logits
    "ìĦľ"        -0.20
    "zelf"       -0.18
    "ization"    -0.18
    "ity"        -0.17
    "ÑģÑĮ"       -0.17
    "_UNUSED"    -0.16
    "ãĥ¼"        -0.16
    "ISED"       -0.16
    "avir"       -0.16
    "ever"       -0.15

    Positive Logits
    "ness"        0.42
    "nes"         0.38
    "NESS"        0.30
    "lessly"      0.24
    "/un"         0.23
    "ened"        0.23
    "ening"       0.22
    "ingly"       0.21
    "es"          0.21
    " wonder"     0.21
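
Several tokens in the negative-logit list (ìĦľ, ÑģÑĮ, ãĥ¼) look like text in the byte-level BPE display alphabet rather than readable characters. The sketch below assumes the model's tokenizer uses the GPT-2-style byte-to-character table, which this page does not state. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE.

    Assumption: the tokenizer behind this page uses this same table.
    """
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in keep}
    # Every remaining byte is shifted to an unused code point above 255.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a displayed token string back to readable text.

    Tokens containing characters outside the table (for example a literal
    space that the page has already decoded) are returned unchanged.
    """
    try:
        raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    except KeyError:
        return token
    # A single token may hold only part of a UTF-8 sequence, so replace
    # undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")
```

If that assumption holds, `decode_token` maps ìĦľ to the Hangul syllable 서, ÑģÑĮ to the Cyrillic ending сь, and ãĥ¼ to the Japanese long-vowel mark ー, while plain ASCII tokens such as ness come back unchanged.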
    Activation Density: 0.048%
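
To put the 0.048% density figure in perspective, here is a minimal sketch that assumes density means the fraction of token positions where the feature's activation exceeds zero. That definition is an assumption, as the page does not define the term. The function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation is above `threshold`.

    Assumption: "density" counts any strictly positive activation as firing.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 5 of which fire.
sample = [0.0] * 10_000
for position, value in [(17, 1.2), (4203, 0.4), (5000, 3.1), (7777, 0.9), (9998, 2.2)]:
    sample[position] = value

sample_density = activation_density(sample)  # 5 / 10,000

# Reading the page's figure the other way round: at 0.048% density,
# the feature would fire about once every 1 / 0.00048 tokens.
tokens_per_firing = 1 / 0.00048
```

Under this reading, a density of 0.048% corresponds to roughly one firing per 2,083 tokens.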

    No Known Activations