INDEX
    Explanations

    pronouns and their actions

    Negative Logits
    (token not rendered)  0.17
    "↵↵"                  0.15
    "يا"                  0.14
    " বলল"                0.14
    "पा"                  0.14
    (token not rendered)  0.14
    "ал"                  0.14
    "िस"                  0.13
    "قي"                  0.13
    " เชื่อ"               0.13
    Positive Logits
    " essentially"        0.17
    " certainly"          0.17
    " typically"          0.17
    " sort"               0.15
    " operate"            0.15
    " often"              0.14
    "pherds"              0.14
    " themselves"         0.14
    "eding"               0.14
    " generally"          0.14
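    A minimal sketch of how top positive and negative logit lists like these are commonly produced, assuming (the page itself does not say so) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function name `top_logit_effects`, its arguments, and the toy two-dimensional vectors are hypothetical; a real model would supply its own decoder and unembedding weights. In this sketch the negative list keeps its minus sign, whereas the values listed under Negative Logits on this page carry no sign.

```python
from typing import Dict, List, Tuple

Score = Tuple[str, float]


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Score], List[Score]]:
    """Hypothetical helper: rank tokens by how much a feature shifts their logits.

    Assumes the score for a token is dot(decoder_dir, unembedding column of token).
    Returns (positive, negative): the k tokens the feature boosts most,
    most positive first, and the k it suppresses most, most negative first.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy usage with made-up 2-d vectors (not taken from any real model).
toy_unembed = {
    " essentially": [0.17, 0.3],
    " often": [0.14, 0.0],
    "يا": [-0.15, 0.5],
}
pos, neg = top_logit_effects([1.0, 0.0], toy_unembed, k=1)
print(pos)
print(neg)
```

    Sorting once and slicing from both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nlargest`) would avoid the full sort.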
    Activation Density 0.099%
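    Activation density is usually read as the share of token positions on which the feature is active across the sampled dataset. The page does not state the dataset or the activation threshold, so the sketch below assumes "active" means an activation strictly above zero; the function name `activation_density` is hypothetical. At 0.099%, that works out to roughly 99 active positions per 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions where the feature fires.

    Assumes a position counts as active when its activation exceeds `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 100,000 token positions, of which 99 have a positive activation.
acts = [0.0] * 100_000
for i in range(99):
    acts[i * 1000] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.099%
```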

    No Known Activations