Explanations
The neuron detects hedging or uncertainty cues: words and phrases that mark language as experimental, tentative, or “we don’t know” in tone.
Negative Logits
tone          -0.07
oats          -0.07
cap           -0.07
posites       -0.06
挂            -0.06
Machine       -0.06
ManyToOne     -0.06
expectancy    -0.06
.string       -0.06
งส            -0.06
Positive Logits
农业          0.07
uncate        0.06
体育          0.06
.GetSize      0.06
coun          0.06
_ARGUMENT     0.06
approximately 0.06
дина          0.06
olucion       0.06
orph          0.06
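The two lists above rank vocabulary tokens by how strongly this neuron pushes their logits down or up. A common way to obtain such lists, assumed here rather than confirmed by this page, is to project the neuron's output direction through the model's unembedding matrix and keep the k lowest and k highest entries. The minimal pure-Python sketch below follows that assumption; the helpers `logit_effects` and `top_and_bottom`, plus the toy vocabulary and weights, are hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a d_model-sized direction through an unembedding matrix of
    shape (d_model, vocab_size): effect[v] = sum_i direction[i] * unembed[i][v].
    Assumption: the neuron's output direction is what gets projected."""
    if len(direction) != len(unembed):
        raise ValueError("direction length must equal unembed row count (d_model)")
    vocab_size = len(unembed[0])
    return [sum(d * row[v] for d, row in zip(direction, unembed))
            for v in range(vocab_size)]

def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]            # lowest effects first
    positive = ranked[::-1][:k]      # highest effects first
    return negative, positive

if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    vocab = ["agri", "tone", "maybe", "cap"]
    unembed = [[0.07, -0.03, 0.02, -0.01],
               [0.00, 0.08, -0.04, 0.10]]
    effects = logit_effects([1.0, -0.5], unembed)
    neg, pos = top_and_bottom(effects, vocab, 2)
    print("negative:", [t for t, _ in neg])  # negative: ['tone', 'cap']
    print("positive:", [t for t, _ in pos])  # positive: ['agri', 'maybe']
```

Sorting once and slicing from both ends keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid a full sort.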
Activation Density 0.102%
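Activation density is usually read as the share of tokens on which the neuron fires. The sketch below assumes that "fires" means an activation strictly above zero, measured over some evaluation set of tokens. The threshold, the dataset, and the helper name `activation_density` are assumptions, not details given on this page.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.
    Assumption: density = active tokens / all tokens, with threshold 0."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

if __name__ == "__main__":
    # Hypothetical data: 1,020 firing tokens out of 1,000,000.
    acts = (0.5 if i < 1_020 else 0.0 for i in range(1_000_000))
    print(f"{activation_density(acts):.3f}%")  # 0.102%
```

At this density the neuron fires on roughly 1 token in 1,000, which fits a narrow lexical cue such as hedging language rather than a broad syntactic feature.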