INDEX
    Explanations

    phrases that emphasize the concept of helping or serving others

    Negative Logits
    "asse"           -0.16
    "WebResponse"    -0.15
    "slu"            -0.15
    "شد"             -0.14
    "via"            -0.14
    "pcl"            -0.14
    "ĨĴ"             -0.14
    "PEC"            -0.14
    "orna"           -0.14
    "Stuff"          -0.13
    Positive Logits
    " favors"    0.42
    " fav"       0.36
    " favor"     0.34
    " favour"    0.32
    " Favor"     0.31
    " justice"   0.29
    " harm"      0.26
    " Fav"       0.23
    "fav"        0.23
    "favor"      0.23
    Activation Density: 0.026%

    No Known Activations