INDEX
    Explanations

    instances of the word "align" and its variations, indicating a focus on concepts of alignment and agreement

    Negative Logits
    elyn
    -0.19
    خانه
    -0.17
    zk
    -0.17
     lại
    -0.17
    à
    -0.16
    ánh
    -0.16
    els
    -0.15
    stown
    -0.14
    stral
    -0.14
     Dữ
    -0.14
    Positive Logits
    arity
    0.21
    amenti
    0.20
    ments
    0.20
     perfectly
    0.18
    ird
    0.16
    MENT
    0.16
    edly
    0.16
    ingly
    0.16
    EMENT
    0.16
    imenti
    0.15
    Act Density 0.019%

    No Known Activations