    Explanations

    The neuron specifically activates on the word “Detection,” most prominently in the phrase “AI Detection.”

    Negative Logits
    'ew'           -0.07
    'efa'          -0.06
    'wik'          -0.06
    ' mnist'       -0.06
    'جام'          -0.06
    ' Scots'       -0.06
    ' św'          -0.06
    '.Bold'        -0.06
    'word'         -0.06
    'ert'          -0.06
    Positive Logits
    'ніш'           0.07
    '	Init'         0.07
    ' democratic'   0.06
    '.imageView'    0.06
    '":"","'        0.06
    '€↵'            0.06
    'álně'          0.06
    '하시'          0.06
    ' messageType'  0.06
    '-Trump'        0.06
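    The logit lists rank vocabulary tokens by how much this neuron's output raises or lowers their logits. The page does not say how these scores are computed. A common approach is to multiply the neuron's output weight vector by the model's unembedding matrix, and the sketch below assumes that. It is a minimal pure-Python version, not the dashboard's actual code. The function names, toy tokens, and weights are hypothetical.

```python
# Hypothetical sketch: rank tokens by a neuron's direct effect on the logits.
# Assumption: `direction` is the neuron's output weight vector (length d_model)
# and `unembed` is the unembedding matrix as d_model rows of vocab-length lists.

def direct_logit_effects(direction, unembed):
    """Return direction @ unembed: one score per vocabulary token."""
    vocab_size = len(unembed[0])
    scores = [0.0] * vocab_size
    for weight, row in zip(direction, unembed):
        for j, u in enumerate(row):
            scores[j] += weight * u
    return scores


def top_and_bottom(scores, vocab, k):
    """Return (k most negative, k most positive) lists of (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


if __name__ == "__main__":
    # Toy data with d_model = 2 and a 4-token vocabulary (purely illustrative).
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
    direction = [1.0, -0.5]
    unembed = [
        [-0.05, -0.04, 0.03, 0.02],
        [0.04, 0.04, -0.08, -0.08],
    ]
    scores = direct_logit_effects(direction, unembed)
    negative, positive = top_and_bottom(scores, vocab, k=2)
    print("negative:", negative)
    print("positive:", positive)
```

    Sorting the full vocabulary keeps the sketch short. For a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` would avoid a full sort.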
    Activation Density: 0.002%
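    Suppose the activation density is the share of token positions where the neuron's activation exceeds zero. Both that definition and the zero threshold are assumptions. Under them, 0.002% means about 2 active positions in every 100,000 tokens, or roughly 1 in 50,000. A minimal sketch of that calculation, with a hypothetical function name:

```python
# Hypothetical sketch: activation density as the percentage of token positions
# whose activation exceeds a threshold (assumed here to default to 0.0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 100,000 token positions, of which only two are active.
    acts = [0.0] * 99_998 + [1.2, 0.4]
    print(f"{activation_density(acts):.3f}%")
```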

    No Known Activations