INDEX
    Explanations

    Repetition or duplication

    The neuron responds to occurrences of the word “same.”

    Negative Logits
     supremacy  -0.07
     ranges     -0.07
     ultimate   -0.07
    なる        -0.07
     влад       -0.07
     snaps      -0.06
     haha       -0.06
     kh         -0.06
     سنة        -0.06
    Damn        -0.06
    Positive Logits
                        0.07
    _DEL                0.07
                        0.06
    listener            0.06
    isz                 0.06
    HttpServletRequest  0.06
    aac                 0.06
    /dd                 0.06
    γχ                  0.06
    otty                0.06
    Activation Density: 0.006%

    No Known Activations