Explanations
This neuron identifies Wikipedia category labels that denote an athlete’s medal achievements (e.g. “Olympic silver medalists for X”).
Negative Logits
Token        Logit
igne         -0.07
чим          -0.06
专业         -0.06
.serial      -0.06
itol         -0.06
φό           -0.06
ськими       -0.06
cano         -0.06
Mary         -0.06
Ronald       -0.06
Positive Logits
Token        Logit
avere        0.07
delete       0.06
envis        0.06
ORIZATION    0.06
for          0.06
Sugar        0.06
MOTOR        0.06
AREA         0.06
apply        0.06
hỗ           0.06
Activations Density 0.001%