    Explanations

    This neuron mainly detects the word “like” in text, whether it is used as a filler or as a comparative.

    Negative Logits
    ulnerable
    -0.07
     ضو
    -0.06
    zym
    -0.06
    .activity
    -0.06
     koruy
    -0.06
     sebuah
    -0.06
     sclerosis
    -0.06
    	info
    -0.06
     festivals
    -0.06
    -0.06
    Positive Logits
     Like
    0.08
     Shard
    0.08
    .console
    0.07
     لي
    0.07
    。”↵↵
    0.06
    
    0.06
    Like
    0.06
     ارد
    0.06
     habit
    0.06
    请选择
    0.06
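    The page does not say how these logit lists are produced. A common approach, assumed here, projects the neuron's output weight vector through the model's unembedding matrix. Each vocabulary token is then ranked by the resulting score. The sketch below uses a hypothetical three-token vocabulary and made-up weights purely for illustration. The names `logit_effects` and `top_logits` are hypothetical, not part of any real tool.

    ```python
    def logit_effects(w_out, W_U, vocab):
        """Score each token by projecting the neuron's output direction through the unembedding.

        w_out: list of floats, length d_model (the neuron's output weights; assumed).
        W_U:   list of d_model rows, each a list of len(vocab) floats (unembedding; assumed).
        """
        effects = []
        for j, tok in enumerate(vocab):
            score = sum(w_out[i] * W_U[i][j] for i in range(len(w_out)))
            effects.append((tok, score))
        return effects

    def top_logits(effects, k):
        """Return the k most positive and k most negative (token, score) pairs."""
        ranked = sorted(effects, key=lambda pair: pair[1])
        positive = ranked[-k:][::-1]  # largest scores first
        negative = ranked[:k]         # smallest scores first
        return positive, negative

    # Hypothetical toy model: d_model = 2, vocabulary of three tokens.
    vocab = ["like", "habit", "zym"]
    w_out = [1.0, 0.5]
    W_U = [[0.2, -0.1, 0.0],
           [0.0, 0.0, -0.3]]

    pos, neg = top_logits(logit_effects(w_out, W_U, vocab), k=1)
    print(pos, neg)
    ```

    With a real model, the vocabulary holds tens of thousands of tokens. Only the top and bottom few are shown, as in the lists above.
    
    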
    Activation Density: 0.015%
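    Activation density is, presumably, the fraction of token positions on which the neuron fires. A density of 0.015% works out to roughly 15 firing positions per 100,000 tokens. A minimal sketch of the calculation follows. It assumes that "fires" means an activation above 0; the page does not state its threshold. The function name and the data are hypothetical.

    ```python
    def activation_density(activations, threshold=0.0):
        """Fraction of positions whose activation exceeds the threshold (threshold is an assumption)."""
        if not activations:
            raise ValueError("need at least one activation")
        firing = sum(1 for a in activations if a > threshold)
        return firing / len(activations)

    # Hypothetical data: the neuron fires at 3 of 20,000 token positions.
    acts = [0.0] * 20_000
    for i in (10, 500, 19_999):
        acts[i] = 1.2

    print(f"{activation_density(acts):.3%}")  # 0.015%
    ```
    
    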

    No Known Activations