INDEX
    Explanations

    phrases indicating comparison or difficulty in achieving tasks

    Head Attr Weights
    0:0.11
    1:0.03
    2:0.01
    3:0.16
    4:0.13
    5:0.04
    6:0.06
    7:0.05
    8:0.23
    9:0.05
    10:0.03
    11:0.05
    Negative Logits
    weeney        -2.29
     actionGroup  -2.22
    eson          -2.12
    ureen         -2.10
    bol           -2.09
    entin         -2.07
    letters       -2.04
    nesday        -2.03
    milo          -1.95
    Dispatch      -1.87
    Positive Logits
     unheard      2.01
     forgiven     1.92
    thanks        1.89
     appreciated  1.87
    .",           1.81
     profitable   1.81
    !".           1.81
     certs        1.77
    !",           1.73
     true         1.72
    Act Density 0.001%

    No Known Activations