Explanations
phrases related to capabilities or abilities
Neuron Alignment
Index   Value   % of L₁
50      +0.15   0.8%
1806    +0.11   0.6%
1047    +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1256    +0.15      0.04
1806    +0.11      0.04
1047    +0.11      0.04
Negative Logits
Token       Logit
<bos>       -3.26
/***        -0.84
/**         -0.83
intersper   -0.80
ⓧ           -0.79
            -0.79
<?          -0.76
harmonize   -0.67
endow       -0.64
banish      -0.62
Positive Logits
Token     Logit
ananas    1.00
thuy      0.99
kafe      0.98
saar      0.98
cannes    0.97
kasa      0.95
seksi     0.95
bandung   0.93
maroc     0.93
jawa      0.92
Activations Density: 0.105%