INDEX
Explanations
This neuron activates on repeated occurrences of the word “same.”
Negative Logits
moduleId       -0.07
Shipping       -0.06
втра           -0.06
Hire           -0.06
Such           -0.06
KR             -0.06
-testid        -0.06
creen          -0.06
useDispatch    -0.06
уляр           -0.06
Positive Logits
axle           0.07
cp             0.06
uč             0.06
ग              0.06
θρώ            0.06
ován           0.06
простран       0.06
بم             0.06
islav          0.06
_defaults      0.06
Activations Density 0.006%