INDEX
Explanations
absolute terms of certainty and knowledge
Neuron Alignment
Index   Value   % of L₁
674     +0.14   0.4%
1482    +0.13   0.4%
554     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1482    +0.14      0.04
991     +0.13      0.03
1512    +0.10      0.03
Negative Logits
indestru    -0.78
accla       -0.75
reluct      -0.75
indescri    -0.74
intrigu     -0.71
nobly       -0.69
inconce     -0.67
gaily       -0.66
strto       -0.66
unspeak     -0.66
Positive Logits
<bos>       0.93
fidanz      0.89
siquiera    0.84
even        0.71
even        0.69
EVEN        0.69
Même        0.67
EVEN        0.61
cammin      0.59
parteci     0.59
Activations Density: 0.118%