INDEX
    Explanations

    references to surveillance and control by authority figures

    Negative Logits
    Token          Logit
    "IDEOS"        -0.16
    "onaut"        -0.16
    "ona"          -0.15
    "ÑĢой"         -0.15
    "pedo"         -0.14
    "adj"          -0.14
    "canonical"    -0.14
    " LGBTQ"       -0.14
    "iffer"        -0.14
    "RI"           -0.14
    Positive Logits
    Token          Logit
    "AZY"          0.17
    "symbol"       0.17
    " Symbol"      0.17
    " symbol"      0.16
    "(symbol"      0.15
    " blinded"     0.15
    "-symbol"      0.15
    "ymbol"        0.15
    " sublic"      0.14
    ".Symbol"      0.14
    Activation Density: 0.021%

    No Known Activations