Explanations
This neuron detects hedging or probability expressions: words like “likely,” “expected,” or “probably” that signal uncertainty or anticipation.
Negative Logits
’ve         -0.07
Momentum    -0.06
dhcp        -0.06
/i          -0.06
Mock        -0.06
Rub         -0.06
OW          -0.06
rectangle   -0.06
ky          -0.06
jane        -0.06
Positive Logits
URED        0.08
ести        0.07
allied      0.07
ces         0.06
IA          0.06
us          0.06
deport      0.06
。”↵↵       0.06
stains      0.06
oints       0.06
Activations Density 0.052%