INDEX
    Explanations

    positively or negatively charged adjectives or phrases indicating approval or disapproval

    expressions related to the concepts of good and better, as well as phrases indicating a moral evaluation

    Negative Logits
    Token         Logit
    "ned"         -0.57
    "gall"        -0.56
    " Glac"       -0.56
    " Notting"    -0.55
    " Clancy"     -0.53
    "pat"         -0.53
    "egu"         -0.52
    " Guam"       -0.52
    " Rolls"      -0.52
    " Tornado"    -0.51
    Positive Logits
    Token         Logit
    " sake"       1.62
    " purposes"   1.35
    " reasons"    1.21
    "ummies"      1.20
    " reason"     0.94
    " Reasons"    0.90
    "aughs"       0.87
    " purpose"    0.87
    " ages"       0.84
    "instance"    0.80
    Activation Density: 0.187%

    No Known Activations