    NEGATIVE LOGITS
    Token      Value
    "0"        0.64
    " ;"       0.60
    " :"       0.58
    "ک"        0.58
    " ]"       0.57
    (blank)    0.57
    "7"        0.57
    "ID"       0.55
    " be"      0.55
    " _."      0.54
    POSITIVE LOGITS
    Token      Value
    "é"        0.63
    "on"       0.61
    "е"        0.61
    "anın"     0.60
    "oned"     0.59
    "as"       0.58
    "charged"  0.58
    "ar"       0.57
    "timed"    0.57
    "en"       0.56
    Activation Density: 0.001%

    No Known Activations