INDEX
Explanations
This neuron fires on self-referential promotional language: mentions of a program's or organization's own name, “we/our,” and direct calls to action like “contact us” or “let us know.”
Negative Logits
_SET
-0.07
Generator
-0.07
PARSE
-0.07
ppard
-0.06
epsilon
-0.06
istani
-0.06
LIKELY
-0.06
IsRequired
-0.06
Shared
-0.06
Sinh
-0.06
Positive Logits
{↵↵↵↵
0.08
збір
0.07
{↵↵↵
0.07
=>{↵
0.07
; ↵ ↵ ↵
0.06
); ↵ ↵ ↵
0.06
// ↵ ↵
0.06
(coll
0.06
toHave
0.06
"}}>↵
0.06
Activations Density 0.061%