Explanations
dialogue snippets
The neuron detects polite forms of address and apology phrases (e.g. “Madam,” “I’m sorry”).
Negative Logits
Token             Logit
(token missing)   -0.07
pow               -0.07
.inverse          -0.07
find              -0.07
ний               -0.06
iera              -0.06
jew               -0.06
ween              -0.06
например          -0.06
culated           -0.06
Positive Logits
Token             Logit
(/^\              0.07
_FLUSH            0.07
вступ             0.06
BASH              0.06
testim            0.06
едак              0.06
capture           0.06
.DELETE           0.06
uncont            0.06
WaitForSeconds    0.06
Activations Density: 0.011%
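
As a rough illustration of how a density figure like this could be read, the sketch below assumes that activation density means the fraction of token positions on which the neuron's activation exceeds a threshold (zero here). That definition, the function name `activation_density`, and the toy data are assumptions made for illustration; the page itself does not state how the value is computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; not taken from the source page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the neuron fires on 11 of 100,000 tokens.
acts = [0.0] * 99_989 + [1.5] * 11
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.011% would mean the neuron is active on roughly one token in every nine thousand, which fits a neuron that responds only to a narrow class of phrases.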