    Explanations

    The neuron detects occurrences of the word “euphemisms,” i.e. euphemistic language.

    Negative Logits
     PB              -0.07
     ціл             -0.07
     ))↵             -0.06
     문의            -0.06
     thước           -0.06
    .Email           -0.06
    igits            -0.06
     tbl             -0.06
    -match           -0.06
    .proto           -0.06
    Positive Logits
     workplace       0.07
     premium         0.07
     trading         0.07
     sanction        0.07
     transformers    0.07
     ngủ             0.06
     Ernest          0.06
    υκ               0.06
     preparations    0.06
     appropriations  0.06
    Activation Density: 0.001%

    No Known Activations