INDEX
Explanations
references to locations or places in the text
Neuron Alignment
Index   Value   % of L₁
376     +0.17   1.0%
348     +0.13   0.7%
245     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
181     +0.17      0.01
243     +0.13      0.01
245     +0.12      0.01
Negative Logits
adults         -1.55
#{             -1.47
respectively   -1.46
possible       -1.46
cross          -1.43
?_             -1.42
moderation     -1.42
straight       -1.42
independence   -1.39
?"             -1.38
Positive Logits
holder   2.27
aho      1.98
bel      1.94
bum      1.92
vist     1.87
plates   1.86
veolar   1.83
pit      1.83
endor    1.80
ente     1.79
Activations Density 0.048%