INDEX
    Explanations

    phrases related to making demands or complaints

    expressions of humor and demands in discourse

    Negative Logits
    "ingham"     -0.86
    "por"        -0.82
    "acion"      -0.74
    "ability"    -0.73
    "lat"        -0.72
    "abet"       -0.72
    "amen"       -0.71
    "backs"      -0.70
    "erning"     -0.69
    "able"       -0.68
    Positive Logits
    "ĸļ"             0.99
    "nesday"         0.92
    "uled"           0.83
    " aloud"         0.79
    " Parenthood"    0.75
    " teased"        0.73
    " showc"         0.73
    " repeatedly"    0.73
    " sarcast"       0.71
    "EStream"        0.69
    Activation Density: 0.077%

    No Known Activations