INDEX
    Explanations

    The neuron fires most strongly on technical, topic‐introducing vocabulary—especially when new concepts or specialized terms (e.g. “privacy,” “ontology,” “quantification,” “contribution”) are first presented.

    New Auto-Interp
    Negative Logits
    (Layout
    -0.07
    ğu
    -0.07
     Roses
    -0.06
     раді
    -0.06
    lod
    -0.06
    riage
    -0.06
    Kelly
    -0.06
     privilege
    -0.06
    wipe
    -0.06
    انية
    -0.06
    POSITIVE LOGITS
     المر
    0.07
     invoke
    0.07
    zure
    0.06
     یکی
    0.06
    0.06
    0.06
     regulate
    0.06
    0.06
     ItemStack
    0.06
    Means
    0.06
    Act Density 0.028%

    No Known Activations