INDEX
Explanations
The neuron fires on adverbs indicating a negative statistical correlation (e.g. “negatively” in “negatively correlated”).
Negative Logits
aine        -0.08
Neighbor    -0.07
사항        -0.07
Owens       -0.07
para        -0.07
sammen      -0.07
Також       -0.07
ooke        -0.07
函数        -0.07
Ban         -0.07
Positive Logits
removeAll   0.06
dial        0.06
_nav        0.06
_RANDOM     0.06
inclusive   0.06
查询        0.06
-standing   0.06
getC        0.06
وظ          0.06
Marlins     0.06
Activations Density: 0.005%
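
The source does not define the density figure. One common reading, assumed here, is the fraction of sampled token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of active positions; the source
    does not state how its figure was computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: one active position out of 20,000 sampled tokens.
toy_activations = [0.0] * 19_999 + [3.2]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.005% corresponds to roughly one active token in every 20,000, which would fit a neuron tied to a narrow phrase such as "negatively correlated".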