    Explanations

    political and authoritarian-themed terms and actions

    Negative Logits
    Token              Logit
    "éĹĺ"              -0.73
    " kHz"             -0.69
    "hower"            -0.67
    "terday"           -0.66
    "ignty"            -0.66
    " apprehension"    -0.62
    " cob"             -0.61
    "kHz"              -0.60
    " Ae"              -0.60
    " Siem"            -0.59
    Positive Logits
    Token              Logit
    "gers"             1.39
    "glers"            1.31
    "rett"             1.21
    "mented"           1.13
    "rant"             1.12
    "ging"             1.11
    "gy"               1.08
    "mentation"        1.08
    "hett"             1.05
    "herer"            1.04
    Activation Density: 4.489%

    No Known Activations