    Negative Logits
     Fakat  1.80
    y  1.77
    nými  1.72
    s  1.59
     وع  1.59
    ्ञ  1.57
    ่า  1.50
    the  1.48
    g  1.48
    1.47
    Positive Logits
    ت  1.80
    1.77
    ע  1.74
    ри  1.73
    :  1.73
    কে  1.72
    iv  1.71
    ва  1.71
    ра  1.69
    בת  1.66
    Act Density 0.001%

    No Known Activations