    Explanations

    temporal references related to past events

    Negative Logits
        Token        Logit
        "_BEFORE"    -0.19
        " before"    -0.18
        "_before"    -0.17
        " قبل"       -0.16   (Arabic, "before")
        "before"     -0.16
        " they"      -0.16
        " antes"     -0.16   (Spanish/Portuguese, "before")
        " Before"    -0.16
        "然"         -0.15
        "Before"     -0.15
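    Raw dashboards often show some of these tokens garbled, for example "çĦ¶" for 然 or "ниÑħ" for the Russian " них". That pattern matches GPT-2-style byte-level BPE, where every UTF-8 byte is stored as a printable stand-in character. The page does not name its tokenizer, so the sketch below assumes the GPT-2 byte-to-character table (`bytes_to_unicode` mirrors the table in GPT-2's encoder); `encode_token` and `decode_token` are hypothetical helper names used only for illustration.

```python
# Sketch: reverse the GPT-2-style byte-level BPE byte <-> character table.
# Assumption: the model's tokenizer uses this table (not stated on the page).

def bytes_to_unicode() -> dict[int, str]:
    """Map each byte 0..255 to a printable character, as GPT-2 does."""
    # Printable Latin-1 bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (controls, space, 0x7F-0xA0, 0xAD) are shifted to 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {ch: b for b, ch in BYTE_TO_CHAR.items()}


def encode_token(text: str) -> str:
    """Readable text -> raw byte-level token string (hypothetical helper)."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def decode_token(raw: str) -> str:
    """Raw byte-level token string -> readable text (hypothetical helper)."""
    data = bytes(CHAR_TO_BYTE[ch] for ch in raw)
    return data.decode("utf-8", errors="replace")


print(decode_token("çĦ¶"))      # 然
print(encode_token(" них"))     # ĠÐ½Ð¸Ñħ
print(decode_token("ĠÙĤØ¨ÙĦ"))  # decodes to " قبل"
```

    A leading space is stored as "Ġ", which is why tokens such as " before" and "before" are distinct entries in the lists.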
    Positive Logits
        Token        Logit
        "words"       0.24
        "ward"        0.21
        "wards"       0.20
        "hand"        0.19
        "word"        0.19
        " wards"      0.18
        " них"        0.18   (Russian, "them")
        "//{{"        0.17
        "neath"       0.17
        " them"       0.16
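    A common way dashboards produce lists like these is to project the feature's decoder direction through the model's unembedding matrix W_U and keep the k largest and k smallest entries (k = 10 here, judging by the list lengths). The page does not say how its logits were computed, so treat this as an assumption. The sketch uses a made-up 2-dimensional model with a 4-token vocabulary; `feature_logit_weights`, `top_and_bottom`, and all numbers are hypothetical.

```python
# Sketch: rank tokens by how strongly a feature's decoder direction pushes
# their logits (decoder_dir @ W_U). All values below are invented.


def feature_logit_weights(decoder_dir: list[float], w_u: list[list[float]]) -> list[float]:
    """Return decoder_dir @ W_U, where w_u has shape (d_model, vocab_size)."""
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(weights: list[float], vocab: list[str], k: int):
    """Return the k most positive and the k most negative (token, weight) pairs."""
    order = sorted(range(len(weights)), key=lambda t: weights[t])  # ascending
    positive = [(vocab[t], weights[t]) for t in reversed(order[-k:])]
    negative = [(vocab[t], weights[t]) for t in order[:k]]
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_u = [
    [0.2, -0.1, 0.0, 0.3],
    [0.4, 0.2, -0.2, 0.0],
]
decoder_dir = [1.0, -0.5]

weights = feature_logit_weights(decoder_dir, w_u)
positive, negative = top_and_bottom(weights, vocab, k=2)
print("positive:", [tok for tok, _ in positive])  # ['tok_d', 'tok_c']
print("negative:", [tok for tok, _ in negative])  # ['tok_b', 'tok_a']
```

    On a real model the same ranking would run over the full vocabulary; this toy version only shows the shape of the computation.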
    Activation density: 0.083%

    No Known Activations
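    Activation density is usually reported as the share of token positions on which the feature fires, that is, has an activation above zero. Assuming that definition (the page does not spell it out), 0.083% corresponds to roughly 83 firing positions per 100,000 tokens. A minimal sketch follows; `activation_density`, its `threshold` parameter, and the sample activations are hypothetical.

```python
# Sketch: activation density = percentage of token positions where the
# feature's activation exceeds a threshold (assumed to be 0 by default).


def activation_density(activations: list[list[float]], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = sum(len(seq) for seq in activations)
    active = sum(1 for seq in activations for a in seq if a > threshold)
    return 100.0 * active / total


# Hypothetical activations for two short sequences (8 positions, 2 active).
acts = [[0.0, 0.0, 1.2, 0.0], [0.0, 0.5, 0.0, 0.0]]
print(activation_density(acts))  # 25.0
```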