    Explanations

    The neuron selectively activates for the preposition “against,” flagging instances of that specific word.

    Negative Logits
    '290'      -0.08
    '660'      -0.07
    '110'      -0.07
    '395'      -0.07
    ' do'      -0.07
    '988'      -0.07
    ' kode'    -0.07
    ' lul'     -0.07
    ' ho'      -0.06
    '099'      -0.06
    Positive Logits
    ' against'    0.17
    'against'     0.15
    ' Against'    0.14
    'Against'     0.13
    ' против'     0.08
    'UNG'         0.08
    ':";↵'        0.08
    ' приг'       0.07
    'apanese'     0.07
    'zens'        0.07
    Act Density 0.028%
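
The page does not define "Act Density". A common reading, assumed here, is the fraction of sampled tokens on which the neuron's activation is above zero. Under that assumption, 0.028% corresponds to roughly 28 active tokens per 100,000. The sketch below illustrates the arithmetic; the function name `activation_density`, the threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density counts tokens with activation strictly above the
    threshold, divided by the total number of tokens sampled.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up example: 100,000 tokens, 28 of which activate the neuron.
acts = [0.0] * 100_000
for i in range(28):
    acts[i * 3000] = 1.5  # arbitrary nonzero activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low would be consistent with a neuron tied to a single word, though the page itself lists no recorded activations to confirm that.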

    No Known Activations