INDEX
    Explanations

    This neuron activates on occurrences of the nationality label “American” in the article’s category listings.

    Negative Logits
     Abe
    -0.07
     PARTIC
    -0.07
     valid
    -0.06
     elucid
    -0.06
     erf
    -0.06
     tan
    -0.06
    With
    -0.06
     Trump
    -0.06
     Patri
    -0.06
     Lal
    -0.06
    Positive Logits
    tık
    0.07
     именно
    0.07
     знач
    0.07
    行動
    0.06
     Александ
    0.06
    0.06
    kiego
    0.06
    assertSame
    0.06
    子は
    0.06
     delight
    0.06
    Act Density 0.004%

    No Known Activations