Explanations
sciences
The neuron consistently activates on names of academic disciplines or subject‐area titles.
Negative Logits
Token    Logit
vens     -0.07
(gen     -0.06
}\\      -0.06
Dust     -0.06
Camp     -0.06
sailors  -0.06
.matmul  -0.06
ellipt   -0.06
["       -0.06
Svg      -0.06
Positive Logits
Token       Logit
sciences    0.11
Arts        0.11
arts        0.11
Sciences    0.08
senses      0.07
contrary    0.07
Humanities  0.07
Actions     0.07
Highlands   0.07
SCI         0.07
Activations Density: 0.019%