    Explanations

    references to time-related phrases or markers

    Negative Logits
    Token       Logit
    "away"      -0.17
    "ари"       -0.15
    "chal"      -0.14
    "tl"        -0.14
    "sWith"     -0.14
    "te"        -0.14
    "nga"       -0.14
    "ych"       -0.14
    "ennes"     -0.14
    "yn"        -0.14
    Positive Logits
    Token       Logit
    "-than"     0.33
    "_than"     0.31
    " than"     0.29
    "Than"      0.26
    " THAN"     0.24
    "than"      0.24
    "_THAN"     0.22
    " Than"     0.21
    " niż"      0.18
    " всего"    0.18
    Activation Density: 0.014%

    No Known Activations