INDEX
    Explanations
    Negative Logits
    Token          Logit
    " unlawful"    -0.10
    " Ernst"       -0.10
    "inati"        -0.10
    " Pra"         -0.10
    " ReadOnly"    -0.09
    "ilio"         -0.09
    "mani"         -0.09
    "ussy"         -0.09
    "igon"         -0.09
    " wh"          -0.08
    Positive Logits
    Token          Logit
    " allowed"     0.19
    "allowed"      0.17
    "Allowed"      0.14
    " permitted"   0.14
    " Allowed"     0.13
    " before"      0.13
    " maximum"     0.12
    " tolerated"   0.11
    " Maximum"     0.11
    "允"           0.11
    Activation Density: 0.112%

    No Known Activations