INDEX
    Explanations

    The neuron strongly activates on occurrences of the token “NATO” (and related references to the alliance).

    Negative Logits
    "());↵↵"     -0.07
    "')["        -0.07
    "DataRow"    -0.06
    " wła"       -0.06
    "iquer"      -0.06
    " arson"     -0.06
    "/values"    -0.06
    "]):↵"       -0.06
    "esses"      -0.06
    " قم"        -0.06
    Positive Logits
    " NATO"                     0.12
    "placeholder"               0.07
    " navigationController"     0.07
    "antt"                      0.07
    " Katie"                    0.07
    ".navigationController"     0.06
    " Bravo"                    0.06
    " Conservative"             0.06
    " Caesar"                   0.06
    "*pi"                       0.06
    Activation Density: 0.001%

    No Known Activations