INDEX
    Explanations

    The neuron activates on words that signal formal rulings or approvals (e.g. “approved,” “accepted,” “rejected,” “decided”).

    Negative Logits
    guna            -0.06
    _scale          -0.06
     Erdoğan        -0.06
     cánh           -0.06
    creds           -0.06
     pancakes       -0.06
     Drinks         -0.06
     Jiří           -0.06
     худож          -0.06
     rodin          -0.06
    Positive Logits
     convention     0.07
     GENERAL        0.07
    》(              0.06
     داخلی          0.06
     department     0.06
     labeled        0.06
    .StylePriority  0.06
     professional   0.06
     usable         0.06
    <IEnumerable    0.06
    Activation Density: 0.025%

    No Known Activations