    NEGATIVE LOGITS
    "ные"    1.66
    "ный"    1.52
    "ا"      1.34
    "ни"     1.27
    "ки"     1.20
    "ня"     1.20
    "ون"     1.18
    "ة"      1.15
    "وس"     1.12
    "ı"      1.11
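    POSITIVE LOGITS
    (Tokens are quoted; a leading space inside the quotes is part of the token.)
    " net"      1.44
    " Net"      1.09
    "ने"        1.00
    " nets"     0.99
    "š"         0.98
    " NET"      0.97
    "Net"       0.96
    "するため"   0.95
    " at"       0.94
    "NET"       0.92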
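    Dashboards of this kind commonly build the two lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that method, which the page itself does not state. `top_logits`, `decoder_dir`, `unembed`, and the toy numbers are hypothetical illustrations, not data from this feature. Here scores keep their sign, so the negative list comes out below zero, whereas the dashboard lists its negative logits without a minus sign.

```python
from typing import List, Sequence, Tuple

TokenScore = Tuple[str, float]


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return (positive, negative) lists of the k highest/lowest scoring tokens.

    Assumption (hypothetical method): the score of token t is the dot product
    of the feature's decoder direction with column t of the unembedding
    matrix, where unembed has shape [d_model][vocab_size].
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: most negative first, most positive last.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
decoder_dir = [1.0, -0.5]
unembed = [
    [2.0, 0.0, -1.0, 0.5],
    [0.0, 2.0, 1.0, -1.0],
]
vocab = [" net", "ные", "ا", " at"]

positive, negative = top_logits(decoder_dir, unembed, vocab, k=2)
print("positive:", positive)  # [(' net', 2.0), (' at', 1.0)]
print("negative:", negative)  # [('ا', -1.5), ('ные', -1.0)]
```

    At real vocabulary sizes the same projection would normally be a single matrix-vector product in a numerical library; the pure-Python loops here only keep the sketch dependency-free.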
    Activation density: 0.025%

    No known activations.