Explanations
The neuron activates on tokens that describe machine-learning model training or usage actions.
Negative Logits
Token       Logit
-service    -0.07
-long       -0.06
Bit         -0.06
erection    -0.06
esktop      -0.06
302         -0.06
277         -0.06
copyright   -0.06
_layout     -0.06
.train      -0.06
Positive Logits
Token       Logit
ورز         0.07
haline      0.07
entfer      0.06
名無し      0.06
_MINOR      0.06
íveis       0.06
Lean        0.06
fulWidget   0.06
why         0.06
+"</        0.06
Activations Density 0.046%
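
The page does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled tokens on which the neuron's activation is above zero (under that reading, 0.046% is roughly 46 tokens per 100,000). The function name `activation_density`, the threshold, and the sample data are hypothetical and only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires at all; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 5 of which activate the neuron.
acts = [0.0] * 10_000
for i in (12, 950, 4021, 7777, 9998):
    acts[i] = 1.5

density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```

A density this low would indicate a sparse neuron that fires on a small, specific set of tokens, which fits the narrow explanation given above.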