INDEX
    Explanations

    pronoun then preposition

    Negative Logits
    Token        Value
    "os"         0.30
    "itin"       0.29
    "img"        0.29
    "-"          0.29
    "um"         0.28
    "od"         0.28
    "iation"     0.27
    "water"      0.27
    " ="         0.26
    "y"          0.26
    Positive Logits
    Token        Value
    " in"        0.34
    (no token)   0.27
    "们的"       0.25
    " as"        0.25
    " के"        0.25
    "ſelf"       0.25
    " در"        0.25
    " at"        0.23
    " into"      0.23
    (no token)   0.23
    Act Density 0.679%

    No Known Activations