INDEX
Explanations
phrases related to decision-making or organizational structure
Neuron Alignment
Index   Value   % of L₁
50      +0.21   0.8%
889     +0.10   0.4%
1044    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
889     +0.21      0.03
1300    +0.10      0.03
1245    +0.10      0.03
Negative Logits
Token        Logit
<bos>        -2.91
/**          -0.77
<?           -0.69
(blank)      -0.69
harmonize    -0.66
displace     -0.65
disbur       -0.64
coexist      -0.63
expel        -0.62
cooperated   -0.60
Positive Logits
Token      Logit
jawa       1.08
magis      1.00
venuto     0.95
milano     0.94
gamba      0.89
affez      0.89
napoli     0.89
maroc      0.88
paradiso   0.88
roberto    0.87
Activations Density: 0.199%