INDEX
    Explanations

    negations or expressions of opposition

    NEGATIVE LOGITS
    "amoto"         -0.17
    "ipel"          -0.16
    "ertz"          -0.16
    "ecute"         -0.16
    "amps"          -0.15
    "educt"         -0.15
    "irus"          -0.15
    " addObject"    -0.14
    "htag"          -0.14
    "ernen"         -0.14
    POSITIVE LOGITS
    "ebek"          0.16
    " porr"         0.15
    "ëĿ"            0.15
    " ones"         0.15
    " mine"         0.15
    "Ñĩив"          0.14
    "Ú©ÛĮ"          0.14
    "----------------------------------------------------------------------↵"  0.14
    "ovit"          0.14
    " necessarily"  0.13
    Activation Density: 0.052%

    No Known Activations