INDEX
    Explanations

    past-tense verbs about actions

    Negative Logits
    Ι         0.39
    U         0.39
     (        0.38
     ayatan   0.38
     ôm       0.38
    重要的    0.37
              0.37
    мян       0.36
     Ι        0.36
    HN        0.36
    Positive Logits
    ت         0.71
    w         0.64
    i         0.63
    r         0.62
    ed        0.61
    in        0.59
    et        0.58
    k         0.58
    n         0.56
    d         0.53
    Activation Density: 0.122%

    No Known Activations