INDEX
    Explanations

    references to explicit and adult-themed content

    Negative Logits
    Token         Logit
    "anse"        -0.16
    "aukee"       -0.15
    "ivre"        -0.15
    "ÑĪин"        -0.15
    ".dist"       -0.14
    "dge"         -0.14
    " alta"       -0.14
    "masked"      -0.14
    " Kahn"       -0.14
    "atomic"      -0.14

    Positive Logits
    Token         Logit
    "Ĥ"           0.15
    "-hole"       0.15
    "-boy"        0.15
    "/rem"        0.14
    "CRET"        0.14
    " naughty"    0.14
    "chten"       0.14
    "ยà¸ĩ"         0.14
    " coup"       0.14
    "cream"       0.13
    Act Density 0.019%

    No Known Activations