INDEX
Explanations
phrases indicating advice or caution
Neuron Alignment
Index   Value   % of L₁
50      +0.23   1.3%
1678    +0.10   0.6%
1124    +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1124    +0.23      0.07
776     +0.10      0.07
61      +0.10      0.07
Negative Logits
<bos>         -3.18
<?            -0.73
/***          -0.67
USTAIN        -0.61
<tfoot>       -0.60
consolidate   -0.56
bestow        -0.55
/*            -0.54
onStop        -0.54
ⓧ             -0.54
Positive Logits
One        0.96
One        0.94
ONE        0.93
santiago   0.92
gabri      0.90
sergio     0.88
ONE        0.88
lidl       0.85
maroc      0.84
véhic      0.83
Activations Density 0.220%