INDEX
Explanations
The neuron detects factual, reporting-style content: tokens that signal concrete facts, dates, numbers, or other informationally salient words in news-like passages.
Negative Logits
directive
-0.07
Knot
-0.07
Yes
-0.07
.rm
-0.07
Fransız
-0.06
_cs
-0.06
qp
-0.06
衛
-0.06
Twe
-0.06
ErrorCode
-0.06
Positive Logits
�
0.06
BOOST
0.06
__),
0.06
taking
0.06
.Services
0.06
request
0.06
0.06
-pass
0.06
Taking
0.06
พร
0.06
Activations Density 0.077%