    Explanations

    mentions of specific actions or descriptors that evoke a strong response

    Negative Logits
        " myſelf"          -0.98
        " itſelf"          -0.85
        " Majefty"         -0.82
        " Monfieur"        -0.81
        " ſeveral"         -0.79
        " AttributeSet"    -0.76
        "الحياه"           -0.76
        " himſelf"         -0.74
        " ་་"              -0.74
        " fubject"         -0.74
    Positive Logits
        " me"              0.65
        " us"              0.59
        " our"             0.45
        "."                0.44
        " hanem"           0.43
        " Me"              0.40
        "/"                0.39
        " saites"          0.39
        " ."               0.37
        " di"              0.36
    Activation Density: 0.385%
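    Activation density is the percentage of sampled token positions where the feature is active. The page states neither the sample nor the threshold. The sketch below therefore assumes "active" means an activation strictly above 0, which is typical for ReLU-based sparse autoencoders. `activation_density` is a hypothetical helper, and the toy data is invented purely to illustrate the arithmetic.

```python
# Sketch (assumption): density = 100 * (# positions with activation > threshold) / (# positions).
# The dashboard's actual sampling and threshold are not stated on the page.


def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 20,000 positions, 77 of which carry a positive activation.
toy = [0.0] * 20_000
for pos in range(0, 77 * 200, 200):
    toy[pos] = 1.3
print(f"Activation density: {activation_density(toy):.3f}%")
```

    A density well under 1%, as here, means the feature fires on only a small fraction of tokens.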

    No Known Activations