    Explanations

    This neuron activates on mentions of adults (i.e. the token “adult”/“adults”).

    Negative Logits
    Token        Logit
    " 아이"      -0.07
    "mites"      -0.07
    " Stop"      -0.06
    "addafi"     -0.06
    "IDS"        -0.06
    " tup"       -0.06
    " porter"    -0.06
    " Casa"      -0.06
    "arter"      -0.06
    "Wis"        -0.06
    Positive Logits
    Token            Logit
    " disclosing"    0.07
    (not rendered)   0.07
    " проп"          0.07
    " همیشه"         0.06
    ".script"        0.06
    "setq"           0.06
    "のが"           0.06
    (not rendered)   0.06
    " theme"         0.06
    "��제"           0.06
    Act Density 0.012%
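
    As a rough illustration of what the density figure means, the sketch below assumes that "Act Density" is the fraction of token positions on which the neuron's activation exceeds a threshold. That definition, the function name `activation_density`, and the toy data are assumptions made for illustration; they are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is a simple count ratio over all positions seen.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 12 active positions out of 100,000 tokens.
toy_activations = [1.0] * 12 + [0.0] * 99_988
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

    Under that assumption, a density of 0.012% corresponds to roughly 12 active positions per 100,000 tokens.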

    No Known Activations