INDEX
Explanations
The neuron activates on phrases expressing apology or regret (e.g. “I apologize,” “sorry,” “not helpful”).
Negative Logits
σε          -0.06
-vars       -0.06
_walk       -0.06
finds       -0.06
-pres       -0.06
amodel      -0.06
Pharmacy    -0.06
)section    -0.06
ольно       -0.06
sends       -0.05
Positive Logits
ститут      0.07
OrUpdate    0.06
привы       0.06
iento       0.06
hữu         0.06
863         0.06
ुओ          0.06
تقو         0.06
230         0.06
[$          0.06
Activations Density: 0.004%