INDEX
    Explanations

    concept followed by specification

    NEGATIVE LOGITS
    "ع"        1.90
    "ح"        1.82
               1.58
    " ומ"      1.55
    "ق"        1.43
    "的国家"    1.39
    "ছে"       1.38
    "ج"        1.33
    "мо"       1.31
               1.30
    POSITIVE LOGITS
    "el"        1.69
    "ional"     1.66
    "们的"       1.55
    " sonucu"   1.52
    "জনক"       1.43
    "යෙන්"       1.43
    "okhlov"    1.42
    "wie"       1.39
    "्हे"        1.39
    "ibility"   1.38
    Activation Density: 0.007%

    No Known Activations