    NEGATIVE LOGITS
    "л"                 1.13
    "ל"                 1.03
    (whitespace only)   1.01
    "1"                 0.96
    "z"                 0.93
    "ل"                 0.89
    " Arjun"            0.89
    "`:"                0.88
    "ש"                 0.88
    "ین"                0.84

    POSITIVE LOGITS
    " Bay"              1.03
    " bay"              1.03
    (blank)             1.03
    "Bay"               0.99
    "การ"               0.95
    " bays"             0.92
    "ला"                0.91
    "ról"               0.91
    (blank)             0.91
    "ले"                0.87

    Activation Density: 0.002%

    No Known Activations