INDEX
    Explanations

    mentions of beneficence or beneficial actions

    NEGATIVE LOGITS
    ghi        -0.16
    hop        -0.15
    ÏĢη        -0.15
    otope      -0.15
    aseline    -0.15
    ieri       -0.15
    orc        -0.14
    locks      -0.14
    -shadow    -0.14
    ork        -0.14
    POSITIVE LOGITS
    volent      0.33
    vol         0.25
    iciary      0.21
    ath         0.20
    icial       0.20
    icia        0.20
    itting      0.19
    ific        0.19
    ift         0.18
    fits        0.17
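    The logit lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). A minimal sketch of producing such top-k lists, assuming each token's logit effect is already available as a single number in a dictionary; how the dashboard actually derives those numbers is not stated here, and the function name top_logit_effects is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logit_effects(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k most negative and k most positive (token, value) pairs."""
    # Sort ascending by logit effect so the most negative values come first.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    # Keep only strictly negative / strictly positive values so the two lists never overlap.
    negative = [pair for pair in ranked if pair[1] < 0][:k]
    positive = [pair for pair in reversed(ranked) if pair[1] > 0][:k]
    return negative, positive


# A small subset of the values listed above, used purely for illustration.
effects = {"volent": 0.33, "vol": 0.25, "iciary": 0.21, "ghi": -0.16, "hop": -0.15, "orc": -0.14}
neg, pos = top_logit_effects(effects, k=2)
print(neg)  # [('ghi', -0.16), ('hop', -0.15)]
print(pos)  # [('volent', 0.33), ('vol', 0.25)]
```

    Filtering on the sign before truncating keeps a token from appearing in both lists when the vocabulary subset is small.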
    Activation Density 0.009%

    No Known Activations