INDEX
    Explanations

    positive sentiments associated with the concept of "good."

    Negative Logits
    Token             Logit
    "ean"             -0.17
    "attern"          -0.16
    "/Dk"             -0.15
    "laz"             -0.14
    "chers"           -0.14
    "ĵåIJį"            -0.14
    "ynchronously"    -0.14
    "ltk"             -0.14
    "atoire"          -0.14
    " Equality"       -0.14

    Positive Logits
    Token             Logit
    " intentions"      0.27
    " deeds"           0.24
    " intention"       0.24
    " fortune"         0.24
    "intent"           0.23
    " Intent"          0.23
    " Samar"           0.22
    " works"           0.21
    " citizenship"     0.21
    "reads"            0.20
    Activation Density: 0.055%
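
    As a rough illustration of how a density figure like the one above might be derived, the sketch below assumes that density means the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature
    fires at all. The dashboard's exact definition is not stated here.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 11 of 20,000 tokens.
sample = [0.0] * 19989 + [1.3] * 11
print(f"{activation_density(sample):.3%}")
```

    Under that assumption, a density of 0.055% would correspond to the feature firing on roughly 11 of every 20,000 tokens.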

    No Known Activations