INDEX
    Explanations

    The neuron selectively activates on fragments of personal names, especially uncommon or foreign-sounding surnames.

    Negative Logits
    Token       Logit
    """↵        -0.07
     verw       -0.06
     mus        -0.06
    (part       -0.06
    *z          -0.06
     obra       -0.05
     उम         -0.05
    hart        -0.05
    Sus         -0.05
     δεδο       -0.05
    Positive Logits
    Token       Logit
    ropping     0.08
     Οικο       0.07
     Κο         0.07
    öh          0.07
    connected   0.07
                0.06
    .met        0.06
    Driver      0.06
    отв         0.06
    trinsic     0.06
    Act Density 0.076%
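
The page does not say how the activation density is computed. A common reading, assumed here, is the fraction of token positions on which the neuron's activation is above zero. The sketch below illustrates that reading with made-up data; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 76 active positions out of 100,000 tokens.
sample = [1.0] * 76 + [0.0] * (100_000 - 76)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.076%
```

Under this reading, a density of 0.076% corresponds to the neuron firing on roughly 76 of every 100,000 tokens.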

    No Known Activations