INDEX
    Explanations

    action verbs followed by an object

    NEGATIVE LOGITS
     éta
    0.31
     étaient
    0.29
     is
    0.29
    0.29
     إذا
    0.29
     hvis
    0.29
     pouvaient
    0.28
     আমি
    0.28
     and
    0.28
     της
    0.28
    POSITIVE LOGITS
    u
    0.42
    w
    0.42
    f
    0.40
    t
    0.39
    ت
    0.38
    ל
    0.35
    ar
    0.34
    0.34
    ר
    0.34
    l
    0.34
    Act Density 0.502%

    No Known Activations