INDEX
    Negative Logits
    " ſind"               -0.97
    " Verſ"               -0.96
    " autorytatywna"      -0.91
    "ſelves"              -0.89
    " يتيمه"              -0.88
    " Anſ"                -0.86
    " Monfieur"           -0.84
    " iſt"                -0.83
    " disambiguazione"    -0.82
    "AutoScaleMode"       -0.80
    Positive Logits
    "id"                  1.05
    " id"                 0.81
    "Id"                  0.71
    "ID"                  0.68
    " ID"                 0.63
    " Id"                 0.59
    "s"                   0.59
    ","                   0.55
                          0.54
    "u"                   0.54
    Activation Density: 0.010%

    No Known Activations