INDEX
    Explanations

    adjectives describing qualities of behavior or actions

    assertions about societal issues related to morality and exploitation

    Negative Logits
    Token         Logit
    "Spec"        -0.86
    "ITNESS"      -0.82
    "spec"        -0.76
    " specs"      -0.73
    "redits"      -0.70
    "entials"     -0.68
    "pletion"     -0.67
    "pleted"      -0.66
    "uilt"        -0.66
    "maps"        -0.66
    Positive Logits
    Token               Logit
    " insidious"        1.32
    " tactic"           1.22
    " hypocrisy"        1.21
    " hypocritical"     1.16
    " endemic"          1.15
    " coward"           1.11
    " despicable"       1.10
    " rampant"          1.09
    " tactics"          1.09
    " pervasive"        1.08
    Act Density 0.554%

    No Known Activations