    Explanations

    words related to favor or preference

    Negative Logits
    "es"           -0.81
    "Де"           -0.69
    "ات"           -0.69
    "thique"       -0.68
    " dalamnya"    -0.67
                   -0.65
    "ی"            -0.65
    " Sald"        -0.63
    " Де"          -0.62
    " ng"          -0.61
    Positive Logits
    " Favor"       1.34
    "Favor"        1.20
    " Fav"         1.19
    "favor"        1.10
    " favors"      1.10
    " favoring"    1.09
    " שוליים"      1.09
    " favour"      1.05
    " favours"     1.05
    "Fav"          1.01
    Activation Density 0.013%

    No Known Activations