INDEX
    Explanations

    Wikipedia articles

    specific geographic locations or infrastructure terms.

    The neuron selectively activates on the first word of each article, i.e. the document's title at the very start.

    Negative Logits
        "avadoc"       -0.06
        "Array"        -0.06
        " abortion"    -0.06
        "flush"        -0.06
        ".tabs"        -0.06
        "kola"         -0.06
        "ereotype"     -0.06
        "anges"        -0.06
        "ifiers"       -0.06
        "=-=-"         -0.06
    Positive Logits
        "ınızı"              0.07
        (token not rendered) 0.07
        "\tHAL"              0.07
        " Gemini"            0.06
        " BOOT"              0.06
        "'m"                 0.06
        "(percent"           0.06
        " Vertical"          0.06
        " 출장"              0.06
        " výstav"            0.06
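    The source does not say how these logit lists are produced. One common approach is to project the neuron's output direction onto the model's unembedding matrix. The result is one score per vocabulary token, and the lowest and highest scores become the negative and positive lists. The sketch below is a minimal version of that idea and assumes this method. The function name, the `direction` vector, the `unembed` matrix (stored as d_model rows of vocabulary-sized lists) and `vocab` are all hypothetical stand-ins, not this page's actual pipeline.

    ```python
    def top_logit_effects(direction, unembed, vocab, k=10):
        """Hypothetical sketch: rank tokens by direction @ unembed.

        direction: list of d_model floats (assumed neuron output direction)
        unembed:   d_model rows, each a list of len(vocab) floats (assumed W_U)
        vocab:     list of token strings
        Returns (most negative, most positive) lists of (token, score).
        """
        scores = [
            sum(direction[i] * unembed[i][j] for i in range(len(direction)))
            for j in range(len(vocab))
        ]
        # Sort token indices from the lowest score to the highest.
        ranked = sorted(range(len(vocab)), key=lambda j: scores[j])
        negative = [(vocab[j], scores[j]) for j in ranked[:k]]
        # Highest scores first for the positive list.
        positive = [(vocab[j], scores[j]) for j in reversed(ranked[-k:])]
        return negative, positive


    neg, pos = top_logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
    print(neg, pos)
    ```

    In this toy call, only the first row of `unembed` contributes, because the second component of `direction` is zero. Token "b" therefore ranks most negative and "a" ranks most positive.
    
    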
    Activation Density: 0.010%
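    Activation density is read here as the percentage of tokens on which the neuron fires. A value of 0.010% then means roughly one token in 10,000. The sketch below computes that percentage. It assumes "fires" means an activation strictly above a threshold of 0.0. Both that threshold and the `activation_density` helper are hypothetical choices, not taken from this page.

    ```python
    def activation_density(activations, threshold=0.0):
        """Hypothetical helper: percentage of activations above `threshold`."""
        if not activations:
            raise ValueError("need at least one activation")
        active = sum(1 for a in activations if a > threshold)
        return 100.0 * active / len(activations)


    # One firing token out of 10,000 gives a density of 0.01 (percent).
    print(activation_density([1.0] + [0.0] * 9999))
    ```
    
    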

    No Known Activations