INDEX
    Explanations

    instances of altruistic actions or good deeds

    Negative Logits
    Token              Logit
    "redients"         -0.78
    " rough"           -0.64
    " preliminary"     -0.63
    " basics"          -0.61
    "ellow"            -0.57
    " Rough"           -0.56
    "ications"         -0.55
    " Extras"          -0.55
    " Upton"           -0.55
    "ils"              -0.53
    Positive Logits
    Token              Logit
    "never"             2.09
    " never"            2.06
    " NEVER"            1.96
    " Never"            1.87
    "Never"             1.79
    " ALWAYS"           1.69
    " always"           1.68
    "always"            1.65
    " ever"             1.58
    " Always"           1.50
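Top and bottom logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below assumes exactly that convention (effect = decoder row @ W_U, with no layer norm or bias folded in); the function name `logit_effects`, the toy matrix sizes, and the random weights are hypothetical and are not taken from this dashboard.

```python
import numpy as np

def logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption: effect = w_dec_row @ w_unembed, with no layer norm or
    bias folded in. Real dashboards may apply extra normalization.
    """
    effects = w_dec_row @ w_unembed               # shape: (vocab_size,)
    order = np.argsort(effects)                   # ascending order of effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy setup: d_model = 4 and a six-token vocabulary.
rng = np.random.default_rng(0)
vocab = ["never", " never", " always", " rough", "ils", " basics"]
w_dec_row = rng.normal(size=4)
w_unembed = rng.normal(size=(4, len(vocab)))

neg, pos = logit_effects(w_dec_row, w_unembed, vocab, k=3)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary.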
    Activation Density: 0.208%
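Activation density is usually read as the fraction of token positions on which the feature fires. A density of 0.208% therefore means roughly 2 active tokens per 1,000. The sketch below assumes "fires" means an activation strictly greater than zero. The helper name `activation_density` and the synthetic activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > 0.
    Some tools use a different threshold.
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Hypothetical data: 208 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:208] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.208%
```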

    No Known Activations