    Negative Logits
    Token          Logit
    "unused"       -0.08
    " conveyed"    -0.08
    " مربع"        -0.07
    "Unused"       -0.07
    "ும்"           -0.07
    " GEL"         -0.07
    "ks"           -0.07
    " świat"       -0.07
    "Akt"          -0.07
    "mol"          -0.07
    Positive Logits
    Token             Logit
    " n't"            0.09
                      0.08
    ".COMP"           0.08
    " wajen"          0.08
    "вания"           0.08
    " Hib"            0.07
    " качестве"       0.07
    " TEN"            0.07
    " conjunction"    0.07
    "Maps"            0.07
    Activation Density: 0.051%

    No Known Activations