Explanations
phrases that express personal feelings or reactions to a situation
Neuron Alignment
Index   Value   % of L₁
50      +0.15   0.5%
1370    +0.09   0.3%
528     +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1261    +0.15      0.05
1142    +0.09      0.04
584     +0.08      0.04
Negative Logits
<bos>    -2.24
ⓧ        -0.71
         -0.70
/***     -0.64
<?       -0.63
/**      -0.56
Проце    -0.55
És       -0.52
})();    -0.52
#        -0.49
Positive Logits
ughter        1.06
sappi         0.97
perpétu       0.93
mezza         0.91
actionTypes   0.90
stockholm     0.89
riva          0.88
déput         0.87
gabri         0.87
lijah         0.87
Activations Density 0.646%