INDEX
    Explanations

    locations and spatial relationships in the text

    NEGATIVE LOGITS
     именно
    -0.15
    *****/↵
    -0.15
    上了
    -0.14
    uality
    -0.14
    ех
    -0.14
    plib
    -0.14
    탕
    -0.14
    boro
    -0.14
    anch
    -0.13
     dışı
    -0.13
    POSITIVE LOGITS
    neath
    0.21
    /out
    0.21
    wards
    0.21
     them
    0.19
     них
    0.19
    words
    0.18
     him
    0.18
     него
    0.18
    /left
    0.18
    ward
    0.17
    Activation Density: 0.130%

    No Known Activations