INDEX
Explanations
This neuron activates on occurrences of the substring “intrusive”, e.g. on the “usive” part of the word.
Negative Logits
起来  -0.06
733  -0.06
monks  -0.06
_closure  -0.06
anc  -0.06
戻  -0.05
そうな  -0.05
cyan  -0.05
орів  -0.05
/////  -0.05
Positive Logits
tex  0.07
ease  0.07
oad  0.07
uper  0.06
イン  0.06
_MAX  0.06
);}↵↵  0.06
σσ  0.06
/ex  0.06
маг  0.06
Activations Density 0.000%