INDEX
    Explanations

    expressions of shame or feelings of guilt

    Negative Logits
    Token               Logit
    " Yorker"           -0.45
    "Dix"               -0.44
    " UCB"              -0.42
    "relationship"      -0.40
    " във"              -0.40
    " Dodo"             -0.40
    " Dory"             -0.39
    " Yorkers"          -0.39
    "politico"          -0.39
    " ftu"              -0.38

    Positive Logits
    Token               Logit
    " Shame"            1.19
    "Shame"             1.09
    "shame"             1.05
    " shame"            1.00
    " honte"            0.82
    " shameful"         0.81
    " shaming"          0.79
    " ashamed"          0.73
    "hame"              0.65
    " autorytatywna"    0.60
    Activation Density: 0.006%

    No Known Activations