Explanations
The neuron activates on words and phrases that indicate inclusion or location—e.g. “in,” “included,” “incorporated.”
Negative Logits
Token            Logit
Temple           -0.08
standpoint       -0.07
Township         -0.06
Sanctuary        -0.06
彩票             -0.06
MetroFramework   -0.06
尖               -0.06
Desktop          -0.06
conquered        -0.06
Trap             -0.06

Positive Logits
Token            Logit
kesin            0.09
неболь           0.08
included         0.08
incred           0.08
inform           0.07
количе           0.07
arrang           0.07
reck             0.07
enlight          0.07
includes         0.07
Activations Density 0.019%
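
The page does not say how the density figure is computed. As a rough sketch, assuming it is the fraction of tokens on which the neuron's activation exceeds zero, a value of 0.019% would mean the neuron fires on roughly 1 in 5,000 tokens. The function name `activation_density`, the `threshold` parameter, and the toy data below are all hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 tokens, 2 of which activate the neuron.
toy = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.020%"
```

The toy example is only meant to show the scale of such a figure: two active tokens in ten thousand already give a density close to the 0.019% reported here.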