INDEX
Explanations
disciplinary
This neuron detects mentions of internal disciplinary processes or misconduct (e.g., words like “disciplinary,” “misconduct,” “internal actions”).
Negative Logits
went        -0.06
Photon      -0.06
nové        -0.06
Systems     -0.06
Your        -0.06
ownik       -0.06
creo        -0.06
peut        -0.06
NewLabel    -0.06
Rox         -0.06
Positive Logits
misconduct  0.07
语          0.07
druž        0.07
ModelState  0.07
业          0.06
Мих         0.06
escol       0.06
Inquiry     0.06
виконав     0.06
ы           0.06
Activations Density 0.003%