INDEX
Explanations
This neuron detects occurrences of the abbreviation “U.S.” (the United States).
Negative Logits
Token     Logit
Lara      -0.06
้ต        -0.06
onomy     -0.06
řad       -0.06
quiero    -0.06
�         -0.06
каж       -0.06
Que       -0.06
area      -0.06
where     -0.06
Positive Logits
Token     Logit
.S        0.08
.Addr     0.08
медицин   0.07
SEC       0.07
5         0.07
emiz      0.06
trăm      0.06
ERICAN    0.06
[D        0.06
'S        0.06
Activations Density: 0.009%