INDEX
    Explanations

    instances of punctuation or symbols

    Negative Logits
    " eux"          -0.18
    " lui"          -0.17
    "them"          -0.17
    " THEM"         -0.16
    " нього"        -0.15
    " них"          -0.15
    " him"          -0.14
    " Otherwise"    -0.14
    " него"         -0.13
    "とも"          -0.13
    Positive Logits
    " there"        0.52
    " it"           0.49
    "there"         0.35
    " we"           0.35
    " they"         0.30
    " you"          0.30
    " many"         0.28
    " nothing"      0.28
    "it"            0.26
    " this"         0.26
    Activation Density: 0.526%

    No Known Activations