INDEX
    Explanations

    references to temporal concepts and the likelihood of events or conditions occurring

    Negative Logits
    him
    -0.23
     нього
    -0.21
    them
    -0.21
     него
    -0.20
     THEM
    -0.19
     eux
    -0.19
     lui
    -0.18
     них
    -0.18
     him
    -0.18
     нему
    -0.18
    Positive Logits
     they
    0.47
     we
    0.38
     there
    0.35
     someone
    0.34
     it
    0.34
     that
    0.31
     things
    0.31
     somebody
    0.29
     she
    0.29
     something
    0.29
    Activation Density: 0.200%

    No Known Activations