Explanations
believes
This neuron detects hedging or epistemic-modal language—words that signal opinion, belief, expectation, or uncertainty.
Negative Logits
already      -0.07
ся           -0.07
�            -0.07
Less         -0.06
Once         -0.06
altına       -0.06
Also         -0.06
Meanwhile    -0.06
Meanwhile    -0.06
not          -0.06
Positive Logits
إلا          0.07
.sc          0.07
_SLEEP       0.06
GEN          0.06
as           0.06
TEMP         0.06
[T           0.06
-gen         0.06
Alejandro    0.06
Tan          0.06
Activations Density 0.068%