INDEX
    Explanations

    The neuron specifically fires on the standalone word “Our,” especially when it appears as the first token of a segment.

    Negative Logits
    Token                 Logit
    "tgl"                 -0.07
    "exampleInputEmail"   -0.06
    " locale"             -0.06
    " двор"               -0.06
    ".raise"              -0.06
    (blank in source)     -0.06
    "usercontent"         -0.06
    (blank in source)     -0.06
    ".argmax"             -0.06
    ".Parcelable"         -0.06
    Positive Logits
    Token                 Logit
    " Our"                0.08
    "Our"                 0.07
    " inconsistent"       0.06
    "µ"                   0.06
    " mother"             0.06
    "\tnames"             0.06
    "My"                  0.06
    (blank in source)     0.06
    " sponsor"            0.06
    "아서"                0.06
    Activation Density: 0.007%

    No Known Activations