    Explanations

    comparative adjectives that indicate size or degree of something

    Negative Logits
    "y"         -0.23
    "son"       -0.19
    "screen"    -0.17
    "ert"       -0.17
    "eln"       -0.17
    "ertype"    -0.17
    "sing"      -0.17
    "set"       -0.17
    "ikel"      -0.17
    "sc"        -0.16
    Positive Logits
    "-than"     0.57
    " than"     0.44
    "than"      0.43
    "_than"     0.42
    " THAN"     0.34
    "Than"      0.30
    " Than"     0.30
    " než"      0.26
    " всего"    0.25
    " niż"      0.24
    Act Density 0.160%

    No Known Activations
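
    A minimal sketch of how an activation density figure like the one above could be computed, assuming (the page does not say so) that density means the fraction of token positions on which the feature's activation is above zero. The function name, the threshold parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature is active.
    """
    activations = list(activations)
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature is active on 2 of 1250 token positions.
sample = [0.0] * 1248 + [0.7, 1.3]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

    Under this reading, a density of 0.160% corresponds to roughly 16 active tokens per 10,000.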