    Explanations

    punctuation

    The neuron fires on tokens that praise someone’s character or professional virtues, e.g. words such as “responsible,” “reliable,” “integrity,” and “compassion,” along with similar descriptors of trustworthiness and professionalism.

    Negative Logits
    otine
    -0.07
     grandchildren
    -0.06
    optimizer
    -0.06
    -0.06
    оч
    -0.06
     hen
    -0.06
    shouldReceive
    -0.06
    -0.06
    <article
    -0.06
    isOpen
    -0.06
    Positive Logits
    boa
    0.07
    .General
    0.07
     Only
    0.07
     الى
    0.06
    /ca
    0.06
     cerr
    0.06
     hodiny
    0.06
     conveying
    0.06
    หาก
    0.06
     dims
    0.06
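    The negative and positive logit lists rank vocabulary tokens by how strongly this neuron pushes their logits down or up. The page does not say exactly how these values are computed. A common approach, assumed here, is direct logit attribution. It multiplies the neuron's output weight row (`w_out_row`, shape `[d_model]`) by the unembedding matrix (`W_U`, shape `[d_model, d_vocab]`) and ignores the final LayerNorm. The function name `top_logit_effects` and its parameters are hypothetical and illustrative only.

```python
import numpy as np

def top_logit_effects(w_out_row, W_U, vocab, k=10):
    """Hypothetical sketch: rank tokens by one neuron's direct effect on the logits.

    Assumes the logit effect is w_out_row @ W_U, ignoring the final LayerNorm.
    Returns (most negative, most positive) lists of (token, effect) pairs.
    """
    effects = w_out_row @ W_U          # one effect value per vocabulary token
    order = np.argsort(effects)        # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: identity unembedding, so each token's effect equals one weight.
neg, pos = top_logit_effects(np.array([0.5, -1.0, 2.0]), np.eye(3), ["x", "y", "z"], k=2)
print(neg)  # [('y', -1.0), ('x', 0.5)]
print(pos)  # [('z', 2.0), ('x', 0.5)]
```

    Sorting once with `argsort` yields both tails of the distribution. This is why a dashboard can show the bottom-k and top-k tokens side by side from a single vector of effects.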
    Act Density 0.089%
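    Activation density is the share of token positions on which the neuron is active. A density of 0.089% means roughly 89 active positions per 100,000 tokens. The sketch below assumes that "active" means an activation strictly above a threshold of 0. The dashboard's actual threshold and token sample are not stated, and `activation_density` is a hypothetical helper name.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical sketch: percentage of positions whose activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 89 active positions out of 100,000 tokens gives a density of about 0.089%.
acts = np.zeros(100_000)
acts[:89] = 1.0
print(activation_density(acts))
```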

    No Known Activations