INDEX
Explanations
terms related to real-world scenarios and skills
Neuron Alignment
Index   Value   % of L₁
1589    +0.11   0.3%
437     +0.10   0.3%
1013    +0.08   0.2%
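
The % of L₁ column in the Neuron Alignment listing above is not defined on the page. A minimal sketch, assuming it is each entry's absolute value divided by the L₁ norm (sum of absolute values) of the feature's direction vector, and that the listing shows the largest-magnitude entries. The helper `top_alignment` and the toy vector are hypothetical and are not this feature's actual weights.

```python
def top_alignment(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| / sum(|all values|).
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    # Rank indices by absolute value, largest first.
    order = sorted(range(len(direction)),
                   key=lambda i: abs(direction[i]),
                   reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in order[:k]]


# Toy direction vector (made-up values for illustration only).
toy = [0.5, -0.25, 0.125, 0.125]
rows = top_alignment(toy, k=2)

# Print in the same Index / Value / % of L1 layout as the listing.
for index, value, share in rows:
    print(f"{index:<6}{value:+.2f}   {share:.1%}")
```

Under this reading, small shares such as 0.2% to 0.3% for the top entries would indicate that the direction is spread across many neurons rather than concentrated in a few.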
Correlated Neurons
Index   P. Corr.   Cos Sim.
1438    +0.11      0.03
437     +0.10      0.02
58      +0.08      0.02
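
The page does not say which vectors the P. Corr. and Cos Sim. columns compare. The sketch below only illustrates how the two metrics differ, assuming "P. Corr." stands for Pearson correlation: Pearson centers both vectors on their means before comparing them, while cosine similarity does not, so the two values can disagree for the same pair. The function names and toy inputs are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x],
                             [b - mean_y for b in y])


# Toy vectors (made-up values): perfectly anti-correlated, yet all entries
# are positive, so the cosine similarity is still positive.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
p = pearson_correlation(x, y)
c = cosine_similarity(x, y)
print(f"Pearson: {p:+.2f}, cosine: {c:.2f}")
```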
Negative Logits
Token      Logit
?...       -1.13
shenan     -1.01
emphat     -1.00
milf       -0.98
alre       -0.95
madonna    -0.94
benevol    -0.94
accla      -0.92
encomp     -0.91
fuf        -0.90
Positive Logits
Token      Logit
real       0.93
Real       0.91
Real       0.88
real       0.87
REAL       0.86
getReal    0.83
Actual     0.82
actual     0.80
REAL       0.79
actual     0.79
Activations Density: 0.322%