Explanations
presumed
This neuron activates on words ending in the suffix “-ed.”
Negative Logits
filling  -0.07
MainThread  -0.06
город  -0.06
系  -0.06
(vars  -0.06
eck  -0.06
_activation  -0.06
touches  -0.06
женщин  -0.06
Şehir  -0.06
Positive Logits
uras  0.07
ují  0.06
蔵  0.06
uckle  0.06
backs  0.06
случ  0.06
$#  0.06
뮤  0.06
ood  0.06
ADING  0.06
Activations Density 0.433%
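
A minimal sketch of how a density figure like the one above could be computed, assuming that "density" means the share of sampled tokens on which the neuron's activation is above zero. The page does not define the term, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical; the sample is made up to land at the same magnitude and is not the neuron's real activations.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (number of sampled tokens).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3000 tokens, 13 of which activate the neuron.
sample = [0.0] * 3000
for i in range(13):
    sample[i * 200] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"Activation density: {density:.3%}")
```

Under this reading, a density of 0.433% would mean the neuron fires on roughly 4 of every 1,000 sampled tokens, which fits a feature tied to a narrow pattern such as a single suffix.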