INDEX
    Negative Logits
    Logit   Token
    -0.07   "ak"
    -0.07   "ルド"
    -0.07   " bookstore"
    -0.07   "niž"
    -0.06   "\tWrite"
    -0.06   " cuckold"
    -0.06   " ----------------------------------------------------------------------------------------------------------------"
    -0.06   "ZW"
    -0.06   " Separ"
    -0.06   "braco"
    Positive Logits
    Logit   Token
    0.07    " anti"
    0.06    "τερα"
    0.06    ".pictureBox"
    0.06    " anything"
    0.06    "_except"
    0.06    " antibodies"
    0.06    "Anywhere"
    0.06    "Smart"
    0.06
    0.06    "'I"
    Activation Density: 0.006%

    No Known Activations