INDEX
    Explanations

    AI-generated content

    The neuron activates on phrases describing a language model's ability to generate text that is human-like or indistinguishable from human writing.

    Negative Logits
     ADVISED        -0.06
     mužů           -0.06
     Wes            -0.06
     Tus            -0.06
     modules        -0.06
     false          -0.06
    ted             -0.06
     mapa           -0.06
    /Button         -0.06
     dày            -0.06
    Positive Logits
     확실            0.07
     giveaways       0.07
     undeniable      0.06
    े,               0.06
    ']).             0.06
    OCUMENT          0.06
    .SERVER          0.06
                     0.06
    /admin           0.06
     strtok          0.06
    Act Density 0.032%

    No Known Activations