INDEX
Explanations
terms related to concepts of quality or categorization in context
Neuron Alignment
Index   Value   % of L₁
50      +0.23   0.8%
1842    +0.11   0.4%
1385    +0.10   0.4%
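The "% of L₁" column appears to express each entry's absolute value as a share of the L1 norm of the vector it is drawn from. That reading is an assumption, and the sketch below is only an illustration of it: the function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(values):
    """Return each entry's absolute value as a fraction of the vector's L1 norm."""
    l1 = sum(abs(v) for v in values)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(v) / l1 for v in values]


# Hypothetical toy vector, NOT data from the dashboard.
toy = [0.5, -0.25, 0.25]

for value, share in zip(toy, l1_share(toy)):
    # Print the signed value next to its share of the L1 norm.
    print(f"{value:+.2f}  {share:.1%}")
```

Under this reading, the listed rows would imply an L1 norm of very roughly 25 to 29 (for example, 0.23 / 0.008 ≈ 28.8). Because the percentages are rounded to one decimal place, this is only a loose consistency check.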
Correlated Neurons
Index   P. Corr.   Cos Sim.
1385    +0.23      0.10
1870    +0.11      0.04
678     +0.10      0.07
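Assuming "P. Corr." stands for the Pearson correlation between two activation series and "Cos Sim." for the cosine similarity between two vectors (the dashboard does not spell either out), the two quantities can be sketched as follows. The function names and the toy inputs are hypothetical, and this is not necessarily how the dashboard computes its figures.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy inputs, NOT data from the dashboard.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 7.0]))
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

The two measures answer different questions: Pearson correlation subtracts the means first and so compares how two series co-vary, while cosine similarity compares raw directions. That would explain why the two columns can rank the same neurons differently.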
Negative Logits
<bos>             -2.43
consultato        -0.87
+#+#              -0.82
gyhoeddwyd        -0.79
,                 -0.78
ValueGenerated    -0.77
rungsseite        -0.76
DebuggerNonUser   -0.75
rrggbb            -0.73
、                -0.72
Positive Logits
aen       1.99
emphat    1.98
napoli    1.97
Minang    1.91
affor     1.88
fta       1.88
Juf       1.84
accla     1.82
suspic    1.82
inev      1.82
Activations Density: 0.698%