INDEX
    Explanations

    unstructured data

    This neuron responds to tokens that name or indicate content-safety categories (e.g. “sexual,” “violence,” “self-harm,” “narcotics”).

    Negative Logits
    Token               Logit
    (token not shown)   -0.06
    _users              -0.06
     FOR                -0.06
    (token not shown)   -0.06
    'D                  -0.06
     kutje              -0.06
     Bilg               -0.06
     Ign                -0.06
    ACCOUNT             -0.05
    회사                -0.05

    Positive Logits
    Token               Logit
     Mezi               0.07
     jednoho            0.06
    inely               0.06
    she                 0.06
    .total              0.06
    angi                0.06
    _destroy            0.06
    _EXTENDED           0.06
     ruce               0.06
    rious               0.06
    Activation Density: 0.010%

    No Known Activations