INDEX
    Explanations

    Informational content related to historical or cultural documents.

    This neuron activates strongly on anonymized placeholder tokens (like “NAME_1”, “NAME_2”, etc.), i.e., redacted named-entity markers.

    Negative Logits
    "axes"       -0.07
    "avi"        -0.06
    " Waves"     -0.06
    " Flowers"   -0.06
    "<Player"    -0.06
    " escal"     -0.06
    "]]."        -0.06
    "Ok"         -0.06
    " Jobs"      -0.06
    "flowers"    -0.06
    Positive Logits
    "TagName"    0.07
    " consoles"  0.06
    ".Root"      0.06
    ".uint"      0.06
    "četně"      0.06
    " बच"        0.06
    " dapat"     0.06
    ".Metro"     0.06
    "'util"      0.06
    " unmanned"  0.06
    Activation Density: 0.012%

    No Known Activations