INDEX
Explanations
references to specific numerical values or quantities
Neuron Alignment
Index    Value    % of L₁
436      +0.12    0.7%
247      +0.11    0.7%
214      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
436      +0.12       0.02
214      +0.11       0.02
499      +0.11       0.02
Negative Logits
usterity           -1.84
elian              -1.71
uppose             -1.63
asting             -1.54
yours              -1.53
anning             -1.50
rely               -1.46
gonna              -1.43
MERCHANTABILITY    -1.42
asts               -1.41
Positive Logits
ÅĽÄĩ      1.89
naire     1.87
sky       1.79
Minn      1.77
ģ         1.72
ÅĽci      1.69
ska       1.69
®         1.62
ström     1.60
face      1.59
Activations Density 0.052%