INDEX
Explanations
descriptions of historical events, locations, and fatalities
Neuron Alignment
Index    Value    % of L₁
674      +0.10    0.3%
459      +0.08    0.2%
1314     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1820     +0.10       0.03
7        +0.08       0.03
3        +0.08       0.03
Negative Logits
Token       Logit
Simult      -0.99
seiz        -0.90
Hano        -0.88
Manufact    -0.88
reluct      -0.86
unan        -0.86
Voi         -0.86
Arro        -0.83
Langu       -0.83
Epif        -0.83
Positive Logits
Token              Logit
<bos>              0.99
invokingState      0.56
trajets            0.55
around             0.53
or                 0.53
USTAIN             0.53
Trọng              0.53
diğini             0.52
ModelExpression    0.51
ValueStyle         0.50
Activations Density: 0.102%