    Explanations

    words related to specific professions or specific scenarios involving those professions

    elements associated with authority figures and societal structures

    Negative Logits
    "urations"        -0.74
    " Dover"          -0.69
    "sequ"            -0.67
    "KC"              -0.64
    " respectively"   -0.59
    " Codex"          -0.57
    "ophon"           -0.55
    " Ply"            -0.55
    " Simpl"          -0.54
    "ollow"           -0.54
    Positive Logits
    " knows"          0.92
    " dies"           0.89
    " thinks"         0.88
    " who"            0.88
    "who"             0.85
    " decides"        0.84
    " masturb"        0.84
    " whom"           0.83
    " wears"          0.81
    " wants"          0.80
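Dashboards like this one commonly obtain a feature's logit lists by projecting its decoder direction through the model's unembedding matrix and ranking the resulting per-token effects. The page does not state its method, so the following is only a minimal sketch under that assumption. The vocabulary, decoder direction, and unembedding values are hypothetical toy numbers, and `logit_effects` and `top_and_bottom` are illustrative helpers, not part of any library.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], w_u: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Direct effect of one feature on every token's logit.

    decoder_dir has length d_model; w_u is the unembedding matrix laid out as
    d_model rows by len(vocab) columns. Each token's effect is the dot product
    of decoder_dir with that token's column of w_u.
    """
    return {
        tok: sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative logit effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Hypothetical toy model: d_model = 2, four-token vocabulary.
vocab = ["who", " knows", " Dover", "sequ"]
decoder_dir = [1.0, 0.5]
w_u = [
    [0.6, 0.4, -0.5, -0.2],
    [0.2, 0.8, -0.4, -0.3],
]

effects = logit_effects(decoder_dir, w_u, vocab)
positive, negative = top_and_bottom(effects, k=2)
print("Positive:", [tok for tok, _ in positive])  # " knows", then "who"
print("Negative:", [tok for tok, _ in negative])  # " Dover", then "sequ"
```

Sorting once and slicing both ends keeps the sketch short; over a full vocabulary, `heapq.nlargest` and `heapq.nsmallest` would avoid sorting every token.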
    Activation Density: 0.766%
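Assuming the density figure is the percentage of token positions on which the feature's activation is above zero (the dashboard does not define it), 0.766% means the feature fires on roughly 766 of every 100,000 tokens. A minimal sketch of that calculation follows; `activation_density` and its `threshold` parameter are hypothetical, with the threshold assumed to default to zero.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds threshold.

    Assumes one activation value per token position; the feature counts as
    active on a token only if its activation is strictly above threshold.
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 766 active positions out of 100,000 tokens.
sample = [0.0] * 99_234 + [1.2] * 766
print(f"Activation density: {activation_density(sample):.3f}%")  # 0.766%
```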

    No Known Activations