    Explanations

    low rank or status

    The neuron activates on the word “commoner,” i.e. references to lower-class/common-status individuals.

    Negative Logits
        Token             Logit
        " unchecked"      -0.07
        " walnut"         -0.07
        " debounce"       -0.07
        "erosis"          -0.07
        "Berlin"          -0.07
        (token not shown) -0.07
        "-bedroom"        -0.07
        "oplevel"         -0.06
        "Pago"            -0.06
        "सन"              -0.06
    Positive Logits
        Token             Logit
        " getattr"        0.06
        " underestimated" 0.06
        " appropriate"    0.06
        "/apps"           0.06
        " сразу"          0.06
        "configured"      0.06
        "pic"             0.06
        "食べ"            0.06
        " assass"         0.06
        " renewed"        0.06
    Activation Density: 0.065%

    No Known Activations