INDEX
    Explanations

    expressions of kindness and generosity

    Negative Logits
     createState
    -0.73
    -0.73
    endpush
    -0.65
     verhe
    -0.58
     nere
    -0.56
     superiori
    -0.56
     réve
    -0.56
     bodem
    -0.56
     Temptation
    -0.55
     modernization
    -0.54
    Positive Logits
     kindness
    1.31
    kindness
    1.14
     compassionate
    1.09
     Kindness
    1.07
     compassion
    1.05
    Compassion
    1.02
     generosity
    0.95
     warm
    0.95
     charitable
    0.92
     altru
    0.92
    Activation Density: 0.234%

    No Known Activations