INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "elper"   -0.19
    "emme"    -0.15
    "--"      -0.14
    "--;"     -0.14
    ")--"     -0.14
    " (`"     -0.14
    "-&"      -0.14
    "--["     -0.14
    "ostel"   -0.14
    " '["     -0.14
    Positive Logits
    " fucking"   0.23
    " fuck"      0.17
    " Fucking"   0.17
    "Fuck"       0.17
    " FUCK"      0.17
    " bullshit"  0.17
    " Fuck"      0.16
    " shit"      0.16
    " fucked"    0.16
    " Wonder"    0.16
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.