INDEX
    Explanations

    This neuron activates on the definite article “the.”

    Negative Logits
    " lập"       -0.07
    "zier"       -0.07
    "_UNIT"      -0.07
    "queries"    -0.06
    "-AA"        -0.06
    "ندية"       -0.06
    "arna"       -0.06
    ".f"         -0.06
    " aging"     -0.06
                 -0.06
    Positive Logits
    "wave"          0.06
    " unsett"       0.06
    " transient"    0.06
    "(){↵↵"         0.06
    " सरक"          0.06
    " insanely"     0.06
    "acerb"         0.06
    " Prompt"       0.06
    "osti"          0.06
    " راه"          0.05
    Activation Density: 0.003%

    No Known Activations