    Explanations
    Negative Logits
    Token            Logit
    " uname"         -0.09
    " patriotic"     -0.09
    " Basin"         -0.08
    "_VOL"           -0.08
    " sest"          -0.08
    " präsentiert"   -0.08
    " Montenegro"    -0.08
    "prises"         -0.08
    " volunteered"   -0.08
    " brochures"     -0.08

    Positive Logits
    Token            Logit
    " enforcement"    0.12
    " enforcing"      0.12
    " enforced"       0.11
    " forbid"         0.11
    " enforce"        0.10
    " Enforcement"    0.10
    " rules"          0.10
    "禁止"            0.10
    " prohib"         0.10
    " prohibit"       0.09

    Activation Density: 0.001%

    No Known Activations