    Explanations

    expressions of willingness and openness to help or engage with others

    Negative Logits
    ercul
    -0.18
    .scalablytyped
    -0.17
    omaly
    -0.16
    atten
    -0.16
    atters
    -0.15
    erson
    -0.15
    cctor
    -0.15
     succesfully
    -0.14
    uart
    -0.14
    МО
    -0.14
    Positive Logits
     sacrifice
    0.23
     accepting
    0.20
     challenge
    0.20
     accept
    0.19
     accepts
    0.19
     sacr
    0.19
     sacrifices
    0.19
    接受
    0.18
     Sacr
    0.18
     Challenge
    0.18
    Activation Density 0.093%

    No Known Activations