Explanations
The neuron activates on apologies, specifically when the model says “sorry.”
Negative Logits
Fcn        -0.07
_auc       -0.06
.Of        -0.06
енню       -0.06
DWORD      -0.06
ovšem      -0.06
enqueue    -0.06
ambil      -0.06
mijn       -0.06
hugs       -0.06
Positive Logits
Yaş        0.07
الجن       0.06
veil       0.06
(category  0.06
شبکه       0.06
nnen       0.06
CGRect     0.06
tics       0.06
Accuracy   0.06
_SOUND     0.06
Activations Density: 0.011%
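
The page does not define the density figure. As a rough illustration, the sketch below assumes it means the fraction of sampled token positions on which the neuron's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens where the neuron fires at all.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 11 active positions out of 100,000 tokens.
sample = [0.0] * 99_989 + [1.5] * 11
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.011%
```

Under that assumption, a density of 0.011% corresponds to roughly 11 active tokens per 100,000, which would fit a neuron tied to a narrow trigger such as the word "sorry."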