    Explanations

    The neuron activates on apologies, specifically when the model says “sorry.”

    Negative Logits
    Token         Logit
    "Fcn"         -0.07
    "_auc"        -0.06
    ".Of"         -0.06
    "енню"        -0.06
    " DWORD"      -0.06
    " ovšem"      -0.06
    " enqueue"    -0.06
    "ambil"       -0.06
    " mijn"       -0.06
    " hugs"       -0.06
    Positive Logits
    Token         Logit
    " Yaş"        0.07
    " الجن"       0.06
    " veil"       0.06
    "(category"   0.06
    " شبکه"       0.06
    "nnen"        0.06
    "CGRect"      0.06
    "tics"        0.06
    "Accuracy"    0.06
    "_SOUND"      0.06
    Activation Density: 0.011%

    No Known Activations