INDEX
Explanations
contrasts or differences between concepts or ideas
Neuron Alignment
Index   Value   % of L₁
1276    +0.13   0.4%
555     +0.10   0.3%
1506    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1276    +0.13      0.04
555     +0.10      0.02
1677    +0.09      0.03
Negative Logits
lele    -1.06
mef     -0.94
fei     -0.90
fta     -0.90
meis    -0.90
myn     -0.88
afo     -0.87
paff    -0.87
wien    -0.86
lara    -0.86
Positive Logits
merely          0.67
nevertheless    0.60
而是            0.59
simply          0.58
rather          0.57
nonetheless     0.56
instead         0.56
upné            0.53
actually        0.53
yet             0.52
Activations Density 0.115%