INDEX
Explanations
words related to technology, computing, and financial terms
Neuron Alignment
Index    Value    % of L₁
1467     +0.09    0.2%
1870     +0.07    0.2%
298      +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
471      +0.09       0.02
1282     +0.07       0.02
912      +0.07       0.03
Negative Logits
mef      -1.22
aen      -1.19
inev     -1.17
inder    -1.17
wien     -1.14
fuf      -1.14
Juf      -1.13
fte      -1.11
fta      -1.11
mme      -1.09
Positive Logits
easier        0.79
safer         0.70
шибка         0.69
alive         0.67
become        0.64
impossible    0.63
harder        0.62
profitable    0.62
accessible    0.62
make          0.61
Activations Density 0.228%