INDEX
    Explanations

    strong and impactful words

    terms associated with manipulation, control, and centralization of power or policies

    Negative Logits
    Token             Logit
    "OTOS"            -0.67
    " Annotations"    -0.58
    "bsite"           -0.57
    " scanned"        -0.55
    " guy"            -0.53
    "cyclopedia"      -0.52
    " nudity"         -0.51
    "HOME"            -0.50
    " Bucks"          -0.50
    " photograp"      -0.50
    Positive Logits
    Token             Logit
    "ibly"            0.74
    "polit"           0.73
    "otent"           0.72
    "emonic"          0.71
    "itatively"       0.70
    "arious"          0.70
    "iously"          0.70
    "iable"           0.70
    "ulative"         0.68
    "iably"           0.66
    Activation Density: 0.489%

    No Known Activations