Explanations
The neuron activates on the term “Direct,” particularly when it appears as the leading word in technical headings or titles.
Negative Logits
Baum          -0.07
lun           -0.06
anan          -0.06
Wow           -0.06
�             -0.06
ampaign       -0.06
enthusiasm    -0.06
стоя          -0.06
ảm            -0.06
Yao           -0.06
Positive Logits
Direct        0.12
Direct        0.12
direct        0.11
direct        0.09
secret        0.08
přím          0.08
-direct       0.08
RT            0.08
Quick         0.07
duct          0.07
Activation Density: 0.013%