INDEX
Explanations
arguments and debate
This neuron detects the words “straw” and “man,” i.e. occurrences of the phrase “straw man.”
Negative Logits
查看
-0.07
screenshots
-0.07
//================================================
-0.06
자동
-0.06
flotation
-0.06
.Threading
-0.06
Identity
-0.06
چشم
-0.06
ritional
-0.06
('../
-0.06
Positive Logits
uem
0.07
�
0.06
ges
0.06
ável
0.06
burglary
0.06
sigu
0.06
constitu
0.06
paren
0.06
ATORY
0.06
rb
0.06
Activations Density 0.056%