INDEX
    Explanations

    comparative phrases, especially those indicating superiority or preference

    Negative Logits
    outu      -0.19
    rou       -0.15
    axy       -0.15
    sworth    -0.15
    irim      -0.15
    sw        -0.14
    aly       -0.14
    ählen     -0.14
    Forge     -0.14
     gó       -0.14
    Positive Logits
    ige       0.17
     ever     0.16
    á»ķ       0.16
    olet      0.15
     dozen    0.15
    oler      0.14
    urret     0.14
     usual    0.14
    FD        0.14
    ovies     0.14
    Activation Density: 0.057%

    No Known Activations