INDEX
    Explanations
    No Explanations Found
    NEGATIVE LOGITS
    ى  1.81
    вання  1.74
    1.74
    ливо  1.61
    1.60
    ран  1.57
    ئة  1.57
    кона  1.55
    ка  1.52
    кала  1.52
    POSITIVE LOGITS
    g  1.92
    en  1.78
    ن  1.73
    an  1.63
    n  1.63
    ang  1.59
     Cp  1.57
    ana  1.55
     Obj  1.53
     zetten  1.53
    Act Density 0.072%

    No Known Activations