INDEX
    Explanations

    pronoun followed by auxiliary verb

    Negative Logits
    [token not rendered]      0.23
    νος                       0.23
    offsetting                0.21
    advancing                 0.20
    游戏中 ("in the game")     0.20
    indexing                  0.20
    [token not rendered]      0.20
    𝗬                         0.20
    แล้ว ("already")           0.20
    مراحل ("stages")           0.19

    Positive Logits
    can                       0.34
    had                       0.28
    is                        0.28
    are                       0.26
    have                      0.26
    [token not rendered]      0.26
    was                       0.25
    will                      0.24
    cannot                    0.24
    zijn ("are", Dutch)       0.23
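    The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way to produce such lists is to project the feature's decoder direction through the model's unembedding matrix and take the top and bottom k entries. This is an assumption, not something this page confirms for its own numbers. The NumPy sketch below uses hypothetical inputs `decoder_dir`, `W_U` and `vocab`, and it ignores any final layer norm the real model applies.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Sketch (assumed method): rank tokens by the feature's direct logit effect.

    decoder_dir: (d_model,) decoder vector of the feature (hypothetical input)
    W_U:         (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab:       list of vocab_size token strings
    Returns (positive, negative) lists of (token, signed_effect) pairs.
    """
    # Direct effect of the feature direction on every token's logit.
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative
```

    The sketch returns signed values, so entries in the negative list come out below zero.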
    Activation density: 0.787%
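    Activation density is the share of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The actual threshold and dataset behind the 0.787% figure are not given on this page.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not the dashboard's documented rule.
    """
    acts = np.asarray(feature_acts, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    # Count active positions, then express them as a percentage of all positions.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

    For example, 8 active positions out of 1,000 gives a density of 0.8%.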

    No Known Activations