Explanations
The neuron activates on normative requirement language: phrases stating what "must" or "should" be done, or what "at least" needs to be present.
Negative Logits
Best        -0.07
employees   -0.07
आई          -0.07
User        -0.07
внимание    -0.06
-plane      -0.06
owning      -0.06
Hide        -0.06
Employees   -0.06
Pi          -0.06
Positive Logits
чи          0.07
ING         0.07
disparate   0.07
инг         0.07
extravag    0.07
صند         0.06
rico        0.06
cer         0.06
territorial 0.06
제          0.06
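Logit lists like the two above are commonly produced by projecting a feature's output (decoder) direction onto the model's unembedding matrix and keeping the k highest and k lowest scores per vocabulary token. The sketch below assumes that procedure; it is not confirmed to be how this page computed its values. The function names, the row/column layout of `unembedding`, and the toy vocabulary and numbers are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_direction: Sequence[float],
                  unembedding: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with each vocabulary column.

    Assumed layout: unembedding has d_model rows and vocab_size columns.
    """
    vocab_size = len(unembedding[0])
    return [
        sum(decoder_direction[i] * unembedding[i][t]
            for i in range(len(decoder_direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom_logits(scores: Sequence[float], vocab: Sequence[str],
                          k: int) -> Tuple[List[Tuple[str, float]],
                                           List[Tuple[str, float]]]:
    """Return (k most promoted tokens, k most suppressed tokens)."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]                  # most negative first
    top = list(reversed(ranked[-k:]))    # most positive first
    return top, bottom


# Hypothetical toy model: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["must", "should", "maybe", "cat"]
decoder_direction = [1.0, 2.0]
unembedding = [
    [0.5, 0.25, -0.5, 0.0],
    [0.25, 0.5, 0.0, -0.5],
]
scores = logit_effects(decoder_direction, unembedding)
top, bottom = top_and_bottom_logits(scores, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

Sorting the full vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` avoids the full sort.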
Activation Density: 0.016%
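Activation density is usually read as the share of tokens in the evaluation data on which the feature fires (activation above zero). Assuming that definition, 0.016% corresponds to about 16 firing tokens per 100,000. The sketch below computes the figure from a list of per-token activations; `activation_density` and its `threshold` parameter are hypothetical names, and the toy activations are invented to reproduce the 0.016% value.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds threshold."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in values if a > threshold)
    return 100.0 * firing / len(values)


# Toy data: 16 firing tokens spread across 100,000 tokens.
acts = [0.0] * 100_000
for i in range(16):
    acts[i * 6_000] = 1.5
density = activation_density(acts)
print(f"{density:.3f}%")
```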