    Negative Logits
    י               0.92
                    0.89
    verständlich    0.89
    يناير           0.86
    Те              0.86
    yards           0.85
    Κα              0.85
    ي               0.85
    İN              0.84
    Вы              0.83
    Positive Logits
    er              1.15
    ل               0.92
    р               0.91
    н               0.89
    ка              0.87
    ر               0.86
                    0.82
    х               0.82
    ,               0.82
    artige          0.82
    Activation Density: 0.037%

    No Known Activations