INDEX
    Explanations
    Negative Logits
    (token not shown)    2.30
    "ло"                 1.91
    "е"                  1.82
    "ול"                 1.79
    "ле"                 1.65
    " invariance"        1.58
    "آ"                  1.54
    " isotype"           1.53
    "то"                 1.52
    "ак"                 1.52
    Positive Logits
    "ları"               1.86
    "此同时"              1.84
    "Greet"              1.84
    "pada"               1.77
    "Kalau"              1.77
    (token not shown)    1.77
    " Agustus"           1.73
    "Artikel"            1.73
    "pere"               1.73
    "larını"             1.71
    Act Density 0.172%

    No Known Activations