INDEX
    Explanations
    No Explanations Found
    Negative Logits
        " paljon"      1.06
        " kao"         1.02
        " maksimum"    1.00
        " sitten"      0.98
        "theless"      0.93
        " saja"        0.93
        (not shown)    0.91
        (not shown)    0.89
        " asam"        0.88
        "ipped"        0.87
    Positive Logits
        "ar"           0.94
        "s"            0.86
        "ش"            0.80
        "ق"            0.79
        "ار"           0.77
        "ра"           0.77
        "ah"           0.77
        "un"           0.75
        "sberg"        0.71
        "та"           0.69
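A minimal sketch of how top positive and negative logit lists like these are often produced for sparse-autoencoder features. It assumes (this page does not confirm it) that each value is the feature's decoder direction projected onto the model's unembedding matrix. `w_dec_row`, `w_u`, `vocab`, and `top_logit_effects` are hypothetical names. The sketch returns signed values, while the negative list above appears to show magnitudes only.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Return the k most positive and k most negative tokens for one feature.

    w_dec_row: (d_model,) decoder direction of the feature (hypothetical input)
    w_u:       (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab:     list of token strings, length vocab_size
    """
    # Direct effect of the feature direction on every output logit.
    logits = w_dec_row @ w_u          # shape (vocab_size,)
    order = np.argsort(logits)        # indices sorted ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative
```

Tokens keep their leading spaces (for example `" paljon"` versus `"theless"`), which is why the lists above quote each token.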
    Activation Density: 0.001%
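A density of 0.001% means the feature fires on roughly 1 token in 100,000. A minimal sketch of the calculation, assuming density is the percentage of sampled tokens whose activation exceeds a threshold. The function name and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density_percent(activations, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold) * 100.0)

# One firing in 100,000 tokens gives a density of about 0.001%.
example = np.zeros(100_000)
example[42] = 3.5  # the single token where the feature fires
```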

    No Known Activations