INDEX
    Explanations

    the word "supreme" or related terms, as well as phrases related to authority and power

    references to authority and supremacy

    Negative Logits
    "OUT"         -0.90
    "ppo"         -0.83
    "TPS"         -0.74
    "okemon"      -0.72
    "uffy"        -0.70
    "ugg"         -0.69
    "FORE"        -0.69
    "ople"        -0.67
    "zl"          -0.66
    "kson"        -0.65
    Positive Logits
    "ly"           0.85
    "rament"       0.80
    "most"         0.76
    "essential"    0.72
    " secrecy"     0.70
    " vigilance"   0.70
    "ITY"          0.70
    "iour"         0.69
    "doms"         0.69
    "reme"         0.68
    Activation Density: 0.013%

    No Known Activations