INDEX
    Explanations

    possessive + action or quality

    Negative Logits
    æĪĴ          -0.09
    akk          -0.09
    afil         -0.08
     Smy         -0.08
     goodwill    -0.08
     blame       -0.08
     inher       -0.08
     Fir         -0.08
     sincere     -0.08
    svm          -0.08
    Positive Logits
     efforts          0.24
     contribution     0.22
     role             0.20
     contributions    0.19
     sake             0.19
     effort           0.17
     actions          0.17
     part             0.16
     help             0.15
     work             0.15
    Activation Density: 0.033%

    No Known Activations