INDEX
    Explanations
    Negative Logits
    Token             Logit
    plx               -0.07
     turret           -0.06
    (/^               -0.06
    ْف                -0.06
     autobiography    -0.06
    _MANAGER          -0.06
     tqdm             -0.06
    _lowercase        -0.06
    forEach           -0.06
     exhibitions      -0.06
    Positive Logits
    Token             Logit
    >'↵               0.07
     mek              0.07
    .Expressions      0.07
    мест              0.06
     перс             0.06
    530               0.06
    MDB               0.06
    Sig               0.06
    струмент          0.06
    .spark            0.06
    Activation Density: 0.006%

    No Known Activations