INDEX
    Explanations

    words related to politics and leadership

    expressions of hope and admiration

    NEGATIVE LOGITS
    "etc"            -0.75
    " nude"          -0.68
    " nudity"        -0.65
    " Weird"         -0.64
    " Originally"    -0.64
    "entary"         -0.61
    "oteric"         -0.60
    "aceae"          -0.60
    "ORPG"           -0.60
    " vaguely"       -0.60
    POSITIVE LOGITS
    "leaders"        0.89
    " coward"        0.89
    " prag"          0.87
    " courage"       0.87
    " humility"      0.80
    " embold"        0.79
    "Failure"        0.78
    " betrayal"      0.78
    " cynicism"      0.77
    " trust"         0.74
    Activation Density: 0.768%

    No Known Activations