Explanations
This neuron detects mentions of “sources”, i.e., references to sources of information.
Negative Logits
Taliban     -0.08
Sale        -0.07
(tab        -0.07
Pamela      -0.07
uye         -0.07
polygons    -0.06
७           -0.06
ruba        -0.06
Parkway     -0.06
설          -0.06
Positive Logits
sources     0.10
spies       0.08
source      0.08
Sources     0.06
oleh        0.06
outlets     0.06
officials   0.06
PLATFORM    0.06
ears        0.06
resources   0.06
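The two logit lists are the tokens whose output logits this neuron most increases and most decreases. The following is a minimal sketch of how such ranked lists can be read off, assuming you already have one logit-effect value per vocabulary token (for example, from projecting the neuron's output weights through the model's unembedding; that step is an assumption and is not shown here). The dictionary holds a subset of the values listed on this page, and `top_and_bottom` is a hypothetical helper.

```python
import heapq

# Subset of the logit effects listed above (token -> effect on that token's logit).
logit_effects = {
    "sources": 0.10, "spies": 0.08, "source": 0.08, "Sources": 0.06,
    "outlets": 0.06, "officials": 0.06,
    "Taliban": -0.08, "Sale": -0.07, "Pamela": -0.07, "polygons": -0.06,
}

def top_and_bottom(effects, k):
    """Hypothetical helper: return the k most positive and k most negative
    (token, effect) pairs, each list ordered from most extreme inward."""
    top = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return top, bottom

top, bottom = top_and_bottom(logit_effects, 3)
print("positive:", top)
print("negative:", bottom)
```

`heapq.nlargest`/`nsmallest` avoid fully sorting the vocabulary, which matters when the real table has tens of thousands of entries rather than the ten shown here.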
Activation Density: 0.013%
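A density of 0.013% means the neuron is active on roughly 13 of every 100,000 tokens. The sketch below computes that ratio under an assumed definition: a token counts as "active" when the neuron's activation is strictly above a threshold (0.0 here). The page does not state its exact threshold, and `activation_density` is a hypothetical name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.
    The threshold-based definition is an assumption, not taken from the page."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# 13 active tokens out of 100,000 gives 0.00013, i.e. 0.013%.
acts = [1.0] * 13 + [0.0] * 99_987
density = activation_density(acts)
print(f"{density:.3%}")
```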