    NEGATIVE LOGITS
    Token                   Logit
    " ۹"                    0.21
    " når"                  0.20
    " těch"                 0.20
    " eğer"                 0.20
    (token not rendered)    0.19
    " आज"                   0.19
    " избежать"             0.19
    "esseur"                0.19
    (token not rendered)    0.19
    " někter"               0.19
    POSITIVE LOGITS
    Token                   Logit
    " separate"             0.27
    " distinct"             0.25
    " different"            0.24
    "+"                     0.23
    "(!)"                   0.21
    " consecutive"          0.21
    " (!)"                  0.20
    " contrasting"          0.19
    " competing"            0.19
    " standout"             0.19
    Activation Density: 0.237%

    No Known Activations