Explanations
The neuron fires on modal/hedging words that express possibility or uncertainty (e.g. “possible,” “could,” “may,” “might”).
Negative Logits
clos        -0.07
However     -0.06
kw          -0.06
bump        -0.06
Manage      -0.06
However     -0.06
sense       -0.06
Ln          -0.06
_km         -0.06
tourism     -0.06
Positive Logits
zih         0.07
Their       0.07
/archive    0.07
الوطني      0.07
Що          0.07
석          0.07
咨          0.07
?s          0.07
ظٹط         0.06
профилакти  0.06
Activations Density: 0.026%