Explanations
elements related to judgments or assessments of quality (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
464     +0.11   0.6%
427     +0.11   0.6%
421     +0.11   0.6%
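The sketch below shows one way to read this table. It assumes the viewer's usual convention: the feature has a decoder vector with one weight per MLP neuron, "Value" is a neuron's weight, and "% of L₁" is that weight's magnitude divided by the L₁ norm of the whole decoder vector. The helper name `neuron_alignment` and the toy vector are hypothetical. Ranking by signed weight is also an assumption; ranking by absolute weight would be an equally plausible reading.

```python
def neuron_alignment(decoder, k=3):
    """Return the k neurons with the largest decoder weights.

    Hypothetical helper. `decoder` is assumed to be the feature's decoder
    vector, one weight per MLP neuron. Each returned row is
    (neuron index, weight, percent of the vector's L1 norm).
    Ranking by signed weight (not absolute value) is an assumption.
    """
    l1 = sum(abs(w) for w in decoder)
    # Sort neuron indices by weight, largest first; Python's sort is stable,
    # so tied weights keep their original (lower index first) order.
    ranked = sorted(range(len(decoder)), key=lambda i: decoder[i], reverse=True)
    return [(i, decoder[i], 100.0 * abs(decoder[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy 4-neuron decoder vector whose L1 norm is exactly 1.0.
    print(neuron_alignment([0.125, -0.25, 0.5, 0.125], k=2))
```

With a 512-neuron layer, a weight of +0.11 accounting for only 0.6% of the L₁ norm would suggest that the feature is spread across many neurons rather than aligned with a single one.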
Correlated Neurons
Index   P. Corr.   Cos Sim.
203     +0.11      0.36
23      +0.11      0.31
494     +0.11      0.25
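A minimal sketch of the two column statistics follows. It assumes "P. Corr." is the Pearson correlation between the feature's activation and a neuron's activation over the same tokens. It also assumes "Cos Sim." is the cosine similarity between two vectors, for instance the feature's decoder direction and the neuron's direction. That pairing is an assumption, not something this page states. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation series.

    Assumed usage: xs = feature activations, ys = one neuron's
    activations, both recorded on the same tokens.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (assumed: two directions
    in the same space, e.g. a decoder vector and a neuron's direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))
```

The two statistics can disagree. A neuron can fire on the same tokens as the feature (high correlation) while pointing in a fairly different direction (lower cosine similarity), which is why a viewer would list both.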
Negative Logits
accustomed   -1.70
ij           -1.57
atan         -1.56
ols          -1.52
ĸ            -1.49
otry         -1.48
ķ            -1.43
Pal          -1.37
forma        -1.32
cour         -1.32
Positive Logits
dered        1.60
dern         1.54
itely        1.50
onian        1.45
TRODUCTION   1.43
orkshire     1.43
ively        1.42
izumab       1.42
uto          1.39
rd           1.39
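Both logit lists can be read as the feature's direct effect on next-token logits. The sketch below is minimal and rests on assumptions. It treats `direction` as the feature's contribution to the residual stream and `unembed` as a hypothetical mapping from each token to its unembedding vector. Each token's score is the dot product of the two, and the lowest and highest scores give lists like the two above. It ignores any final layer norm a real model may apply before unembedding.

```python
def logit_effects(direction, unembed, k=10):
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding vector.

    Hypothetical inputs: `direction` is the feature's output direction in
    the residual stream; `unembed` maps token string -> unembedding vector
    of the same length. Any final layer norm is ignored (an assumption).
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    scores = {
        tok: sum(d * w for d, w in zip(direction, vec))
        for tok, vec in unembed.items()
    }
    ordered = sorted(scores.items(), key=lambda kv: kv[1])  # ascending score
    most_negative = ordered[:k]
    most_positive = ordered[::-1][:k]
    return most_negative, most_positive


if __name__ == "__main__":
    toy_unembed = {"a": [2.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(logit_effects([1.0, -1.0], toy_unembed, k=1))
```

Many of the tokens above are word fragments ("dered", "itely", "orkshire"). That is expected under this reading, since the unembedding scores every vocabulary entry, including subword pieces.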
Activation Density: 3.076%
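Activation density is presumably the share of tokens in the evaluation set on which the feature fires. Under that reading, 3.076% means the feature is active on roughly 3 of every 100 tokens. The sketch below assumes a list of per-token activations and counts a token as active when its activation exceeds a threshold. The helper name and the default threshold of 0.0 are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper: assumes `activations` holds one activation value
    per token; the default threshold of 0.0 treats any positive activation
    as "active".
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    print(activation_density([0.0, 0.5, 0.0, 1.2]))
```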