    Explanations

    references to authority figures, specifically in socio-political contexts

    Negative Logits
        " Z"          -0.17
        " Arts"       -0.15
        "30"          -0.15
        "381"         -0.15
        " pies"       -0.14
        "-n"          -0.14
        " colon"      -0.14
        "ertest"      -0.14
        "-Z"          -0.14
        "erea"        -0.14

    Positive Logits
        "lems"         0.17
        "oS"           0.15
        "ec"           0.14
        "ùy"           0.14
        ".utf"         0.14
        "ека"          0.14
        "ведите"       0.14
        " Phoenix"     0.14
        " Rak"         0.13
        "SI"           0.13
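
    The two lists rank output tokens by the feature's direct effect on the logits. The card does not state how these values are computed; a common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest entries. The sketch below uses a hypothetical 3-dimensional decoder vector, a toy unembedding, and made-up token names, in plain Python.

```python
# Minimal sketch of ranking tokens by a feature's direct logit effect.
# Assumption: effect[t] = sum_i decoder_dir[i] * W_U[i][t]; all numbers are toy data.

def logit_effects(decoder_dir, W_U, vocab):
    """Project a decoder direction through an unembedding W_U of shape (d_model x vocab)."""
    return {
        tok: sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }

def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

if __name__ == "__main__":
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]  # hypothetical tokens
    decoder_dir = [0.5, -1.0, 0.25]               # hypothetical decoder direction
    W_U = [                                       # hypothetical 3 x 4 unembedding
        [1.0, 0.0, -1.0, 0.2],
        [0.0, 1.0, 0.25, -0.2],
        [0.4, 0.0, 0.0, 0.8],
    ]
    neg, pos = top_and_bottom(logit_effects(decoder_dir, W_U, vocab), k=2)
    print("negative:", [t for t, _ in neg])
    print("positive:", [t for t, _ in pos])
```

    With these toy values the negative list is tok_b, tok_c and the positive list is tok_a, tok_d. A real dashboard would run the same projection over the full vocabulary.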

    Activation Density 0.038%
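
    Activation density is usually read as the percentage of evaluated tokens on which the feature fires. The card gives neither its firing threshold nor its token sample, so the sketch below assumes "fires" means an activation strictly above 0. At 0.038%, the feature would fire on roughly 38 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

if __name__ == "__main__":
    # Hypothetical sample: 100,000 tokens, of which 38 have a positive activation.
    acts = [0.0] * 100_000
    for i in range(38):
        acts[i * 2_500] = 1.0
    print(f"{activation_density(acts):.3f}%")  # prints 0.038%
```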

    No Known Activations