Explanations
pork and porcine
This neuron activates on references to pigs and pig-related terms (e.g., “pork,” “porcine,” “swine,” “pig”).
Negative Logits
�  -0.06
"])↵↵  -0.06
""" ↵  -0.06
trustees  -0.06
=============↵  -0.06
الصن  -0.06
传  -0.06
genie  -0.06
εμπ  -0.05
[[]  -0.05
Positive Logits
pork  0.08
SDS  0.08
Pork  0.07
defer  0.07
Democr  0.07
Yani  0.07
shaled  0.07
ạc  0.07
ンズ  0.07
Penguins  0.07
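The logit lists above rank vocabulary tokens by how strongly this neuron's output pushes their logits up (positive) or down (negative). The sketch below shows one common way such lists can be produced. It assumes, without confirmation from this page, that each value is the dot product of the neuron's output weight vector with one column of the model's unembedding matrix, ignoring any final LayerNorm. The helper names `logit_effects` and `top_and_bottom` and all toy weights are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers and toy data; not the dashboard's actual code.
# Assumption: each listed value is w_out · w_u[:, j], where w_out is the
# neuron's output weight vector (length d_model) and w_u is the
# unembedding matrix (d_model x vocab_size). LayerNorm is ignored.


def logit_effects(
    w_out: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs with effect = w_out · w_u[:, j]."""
    return [
        (token, sum(w_out[i] * w_u[i][j] for i in range(len(w_out))))
        for j, token in enumerate(vocab)
    ]


def top_and_bottom(
    effects: Sequence[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative effects."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of four tokens.
vocab = ["pork", "swine", "genie", "trustees"]
w_out = [1.0, 0.5]
w_u = [
    [0.05, 0.04, -0.03, -0.02],
    [0.06, 0.02, -0.06, -0.06],
]
positive, negative = top_and_bottom(logit_effects(w_out, w_u, vocab), k=2)
print("positive:", [token for token, _ in positive])  # ['pork', 'swine']
print("negative:", [token for token, _ in negative])  # ['genie', 'trustees']
```

In this toy setup "pork" gets an effect of about 0.08 and "genie" about -0.06. Under the stated assumption, the values only say which tokens the neuron's output direction favors, so unrelated tokens (like "SDS" or "Penguins" above) can appear alongside on-topic ones.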
Activation Density: 0.006%
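An activation density of 0.006% means the neuron fires on roughly 6 of every 100,000 tokens, so it is highly selective. The sketch below computes density under one assumed definition: the fraction of token positions where the activation is strictly above a threshold of 0.0. The dashboard's exact definition is not stated here, and the helper name `activation_density` and the toy activations are hypothetical.

```python
from typing import Sequence

# Hypothetical helper; assumes "density" is the fraction of token
# positions whose activation is strictly above `threshold` (0.0 here),
# which may differ from the dashboard's exact definition.


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 6 firing positions out of 100,000 tokens gives a density of 0.006%.
acts = [0.0] * 99_994 + [1.3] * 6
print(f"{activation_density(acts) * 100:.3f}%")  # 0.006%
```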