    Explanations

    verbs followed by objects

    Negative Logits
    0.54
    0.54
     ތ
    0.53
     бушлай
    0.53
    0.53
    0.52
    0.52
     ສຳ
    0.51
     ބ
    0.51
     ພວກເຮົາ
    0.50
    Positive Logits
    0.86
    ar
    0.81
    o
    0.70
    و
    0.52
    u
    0.52
    an
    0.49
    et
    0.48
    ம்
    0.48
    ו
    0.48
    ↵↵
    0.47
    Act Density 3.455%

    No Known Activations