    Explanations

    references to censorship and banned media

    Negative Logits
    " exact"     -0.17
    "exact"      -0.15
    "orr"        -0.14
    "éŁ¿"        -0.14
    "urse"       -0.14
    "باÙĨ"       -0.14
    "uales"      -0.14
    " curse"     -0.14
    " chests"    -0.14
    "yd"         -0.13
    Positive Logits
    "ë§ŀ"        0.16
    "abic"       0.16
    "onus"       0.15
    "avery"      0.15
    "ĨĴ"         0.15
    "ома"        0.15
    "beeld"      0.15
    "-LAST"      0.15
    "iosper"     0.15
    "extField"   0.14
    Act Density 0.245%

    No Known Activations