    Explanations

    comparisons indicating preference or superiority

    comparative phrases indicating "more than" relationships

    Negative Logits
    Token          Logit
    " Juda"        -0.73
    "Ire"          -0.70
    "ModLoader"    -0.68
    "ilic"         -0.65
    "Contract"     -0.63
    "Winged"       -0.62
    " stead"       -0.62
    " derog"       -0.61
    "aird"         -0.59
    " veter"       -0.58

    Positive Logits
    Token          Logit
    "atos"         1.14
    "lihood"       0.86
    "pload"        0.81
    "xual"         0.80
    "ply"          0.78
    "tz"           0.77
    "assis"        0.75
    "gs"           0.74
    "lio"          0.74
    "gins"         0.69
    Activation Density: 0.026%

    No Known Activations