Explanations
The neuron detects language that signals long-standing states or enduring problems, especially phrases such as “long been” or “suffered from.”
Negative Logits
unstoppable    -0.07
функци         -0.07
Pil            -0.06
maximum        -0.06
appraisal      -0.06
Broadway       -0.06
případ         -0.06
тка            -0.06
/dd            -0.06
exclusive      -0.06
Positive Logits
longtime        0.09
longstanding    0.08
long            0.08
:self           0.07
hovered         0.07
msg             0.07
Smoke           0.07
_STOP           0.07
коп             0.07
身              0.07
Activations Density: 0.020%
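
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "activations density" means the share of sampled tokens on which the neuron's activation is above zero; the source does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (active tokens / all sampled tokens) * 100.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 active tokens out of 10,000.
sample = [0.0] * 9998 + [1.3, 0.4]
print(f"{activation_density(sample):.3f}%")
```

Under that assumption, a density of 0.020% corresponds to the neuron firing on roughly 2 of every 10,000 tokens, which fits a neuron tied to a narrow set of phrases.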