INDEX
Explanations
New Auto-Interp: references to a specific rating system or score
Neuron Alignment
Index   Value   % of L₁
376     +0.16   0.9%
369     +0.13   0.8%
59      +0.13   0.7%
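The Neuron Alignment table can be approximated as follows. This is a minimal sketch. It assumes three things: the feature's decoder (dictionary) vector is expressed in the MLP neuron basis; "Value" is the raw decoder weight on each neuron; and "% of L₁" is that weight's absolute value as a share of the vector's L1 norm. Ranking by absolute weight is also an assumption. The function name `neuron_alignment` is hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (neuron index, weight, share of L1 norm) for the k neurons whose
    decoder weights have the largest absolute value.

    Assumes decoder_vec is the feature's dictionary vector in the neuron basis.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by |weight|, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector (made-up numbers, not this feature's weights).
for idx, value, share in neuron_alignment([0.125, -0.5, 0.25, 0.125], k=2):
    print(f"{idx:<6}{value:+.2f}  {share:.1%}")
```

The function returns the share as a fraction, and the print call formats it as a percentage to match the table.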
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
25      +0.16           0.01
49      +0.13           0.01
213     +0.13           0.01
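The two Correlated Neurons columns might be computed as in the sketch below. It assumes the correlation column is the Pearson correlation between the feature's activations and each neuron's activations over the same tokens. It also assumes the cosine-similarity column compares the feature's decoder vector with the neuron's unit basis vector, which reduces to one decoder weight divided by the vector's L2 norm. Both readings, and all function names here, are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_correlated(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature, highest first.

    neuron_acts[j] holds neuron j's activation on each token in feature_acts.
    """
    scored = [(j, pearson(feature_acts, acts)) for j, acts in enumerate(neuron_acts)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]


def cos_sim_with_neuron(decoder_vec, j):
    """Cosine similarity between the decoder vector and neuron j's unit basis
    vector, i.e. decoder_vec[j] / ||decoder_vec||_2."""
    norm = math.sqrt(sum(w * w for w in decoder_vec))
    return decoder_vec[j] / norm


# Toy activations over four tokens (made-up numbers).
feature = [0.0, 1.0, 0.0, 2.0]
neurons = [[0.0, 1.0, 0.0, 2.0], [2.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
print(top_correlated(feature, neurons, k=2))
print(cos_sim_with_neuron([3.0, 4.0], 1))
```

The two columns are separate measures. A neuron can track the feature's activations statistically (high correlation) even when the feature's decoder vector barely points along that neuron (low cosine similarity).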
Negative Logits
eing            -1.70
nature          -1.66
cknowled        -1.65
ance            -1.49
logarithm       -1.46
effectiveness   -1.46
sudden          -1.45
logar           -1.45
_________       -1.42
ferences        -1.40
Positive Logits
ios       2.36
aku       1.80
cliffe    1.80
iga       1.69
chet      1.69
fire      1.68
keeper    1.67
ég        1.64
ató       1.62
nat       1.61
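Both logit lists can be read off a single vector of per-token weights. This is a minimal sketch. It assumes the listed values are the feature's direct effect on the output logits: the decoder vector is mapped through the MLP output weights into the residual stream, then through the unembedding, and any final layer norm is ignored. The matrix names and shapes (`w_out`, `w_u`) and both function names are hypothetical. Plain nested lists keep the sketch dependency-free.

```python
def logit_weights(decoder_vec, w_out, w_u):
    """Map a feature's decoder vector to one weight per vocabulary token.

    Hypothetical shapes: decoder_vec has n_neurons entries, w_out is
    n_neurons x d_model (MLP output weights), w_u is d_model x vocab_size
    (unembedding). Any final layer norm is ignored.
    """
    d_model, vocab_size = len(w_u), len(w_u[0])
    # Residual-stream direction written by the feature.
    resid = [sum(decoder_vec[i] * w_out[i][m] for i in range(len(decoder_vec)))
             for m in range(d_model)]
    # Direct contribution of that direction to each token's logit.
    return [sum(resid[m] * w_u[m][t] for m in range(d_model)) for t in range(vocab_size)]


def top_and_bottom(logits, vocab, k=10):
    """Return the k most positive and the k most negative (token, weight) pairs."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])  # ascending
    positive = [(vocab[t], logits[t]) for t in order[::-1][:k]]
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    return positive, negative


# Toy 2-neuron, 2-dim, 3-token model (made-up weights).
logits = logit_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                       [[2.0, -1.0, 0.5], [0.0, 0.0, 0.0]])
print(top_and_bottom(logits, ["a", "b", "c"], k=1))
```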
Activation Density: 0.125%
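Activation density is read here as the fraction of tokens on which the feature fires, so 0.125% corresponds to roughly one token in 800. This is a minimal sketch. It assumes "fires" means an activation strictly above zero. The threshold and the function name are assumptions.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds `threshold`."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy data: one firing token out of 800 (made-up activations).
acts = [0.0] * 799 + [2.3]
print(f"Activation density: {activation_density(acts):.3%}")
```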