INDEX
    Explanations

    the prefix "un-" indicating negation or reversal of meaning

    Negative Logits
    Token          Logit
    "rome"         -0.16
    "à¤Ĭ"          -0.16
    "à¤Ĥपर"        -0.15
    "692"          -0.15
    "éry"          -0.14
    "falls"        -0.14
    "unfinished"   -0.14
    "rypted"       -0.14
    "dech"         -0.14
    "elho"         -0.14

    Positive Logits
    Token          Logit
    " Wind"         0.19
    "Wind"          0.19
    " wind"         0.19
    "mask"          0.18
    " winding"      0.18
    "ear"           0.18
    "wind"          0.17
    "æīİ"           0.17
    "hook"          0.17
    "HOOK"          0.16
    Activation Density: 0.022%

    No Known Activations