Explanations
The neuron activates on negation words (e.g., “not”).
Negative Logits
داو  -0.06
Deprecated  -0.06
smoothly  -0.06
;y  -0.06
?option  -0.06
notifies  -0.06
arı  -0.06
ếu  -0.06
uestos  -0.06
numberOfRows  -0.06
Positive Logits
Nine  0.07
.Master  0.07
Terra  0.07
ejac  0.06
关系  0.06
eclectic  0.06
соп  0.06
哥  0.06
уз  0.06
_fname  0.06
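Dashboards like this one commonly build the positive and negative logit lists by projecting the neuron's output weight vector through the model's unembedding matrix and ranking vocabulary tokens by the resulting score (a "direct logit" effect). This page does not state its method, so the sketch below is an assumption. `logit_effects`, the toy 2-dimensional vectors, and the `tok_*` names are hypothetical. It only shows how a ranked bottom-k/top-k list could be produced, not how this dashboard computes it.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    direction: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive direct logit effects.

    Hypothetical helper: each token's score is the dot product between the
    neuron's output direction and that token's unembedding vector.
    """
    if any(len(vec) != len(direction) for vec in unembed.values()):
        raise ValueError("all unembedding vectors must match the direction's size")
    scores = {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional "model" with four hypothetical tokens.
    direction = [1.0, -0.5]
    unembed = {
        "tok_a": [0.07, 0.0],   # score  0.07
        "tok_b": [0.0, 0.12],   # score -0.06
        "tok_c": [-0.03, 0.0],  # score -0.03
        "tok_d": [0.0, -0.1],   # score  0.05
    }
    negative, positive = logit_effects(direction, unembed, k=2)
    print("negative:", [(t, round(s, 2)) for t, s in negative])
    print("positive:", [(t, round(s, 2)) for t, s in positive])
```

In a real model the vocabulary has tens of thousands of tokens, so the top and bottom lists do not overlap. In this four-token toy they happen not to overlap either, because k=2.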
Activation Density: 0.005%
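A density of 0.005% means the neuron was active on roughly 5 of every 100,000 tokens it was evaluated on, or about 1 in 20,000. The sketch below shows that arithmetic. The helper name `activation_density` is hypothetical. So is the rule that a token counts as "active" when its activation is strictly above a threshold of 0.0, because this page does not state its exact criterion. The input data is synthetic.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a token is "active" when its activation is strictly greater
    than the threshold (0.0 by default); the dashboard's real rule may differ.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


if __name__ == "__main__":
    # Synthetic data: one active token out of 20,000.
    acts = [0.0] * 19_999 + [3.2]
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")  # 0.005%
```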