INDEX
    Explanations

    initially steadily increased

    Negative Logits
     misses  0.72
    从来  0.71
    ผม  0.71
    ەکە  0.71
     MFC  0.70
     نفسها  0.69
     differs  0.69
    ارج  0.68
     لوبوي  0.68
     الوقت  0.68
    Positive Logits
    garten  0.85
    0.79
    ɧ  0.77
    arono  0.75
    0.75
     chiam  0.75
    льта  0.74
    ர்  0.73
    гә  0.72
    ting  0.71
    Act Density 0.001%

    No Known Activations