    Negative Logits
    Token             Logit
    "ي"               1.73
    "el"              1.70
    "ת"               1.65
    (blank)           1.61
    "an"              1.55
    "ing"             1.54
    "י"               1.51
    "on"              1.48
    "ı"               1.48
    "ا"               1.44

    Positive Logits
    Token             Logit
    " buffs"          1.04
    "?"               0.99
    " ی"              0.98
    " radiographs"    0.96
    " своїх"          0.92
    ";"               0.92
    " дня"            0.90
    " αν"             0.88
    " κι"             0.88
    " protégé"        0.88
    Activation Density: 0.030%

    No Known Activations