Explanations
phrases related to instructions and processes
Neuron Alignment
Index   Value   % of L₁
50      +0.17   0.9%
506     +0.11   0.6%
281     +0.11   0.6%
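The "% of L₁" column can be read as each neuron's share of the feature vector's L₁ norm. The source does not define the column, so this is an assumption, and the helper names below are hypothetical. Under that reading, the three rows above imply an L₁ norm of roughly 18 to 19, as this minimal sketch shows:

```python
# Assumption (not stated in the source): "% of L1" = |value| / L1 norm of the
# feature's weight vector, expressed as a percentage.

def l1_share(value: float, l1_norm: float) -> float:
    """Return |value| as a percentage of the given L1 norm."""
    return 100.0 * abs(value) / l1_norm


def implied_l1_norm(value: float, percent: float) -> float:
    """Invert the relation: the L1 norm implied by one (value, percent) row."""
    return abs(value) / (percent / 100.0)


# Rows copied from the Neuron Alignment table: (index, value, % of L1).
rows = [(50, 0.17, 0.9), (506, 0.11, 0.6), (281, 0.11, 0.6)]

for index, value, percent in rows:
    # The displayed figures are rounded, so the implied norms differ slightly.
    print(index, round(implied_l1_norm(value, percent), 1))
```

Because the table values are rounded to two significant digits, the implied norm is only approximate.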
Correlated Neurons
Index   P. Corr.   Cos Sim.
1451    +0.17      0.03
281     +0.11      0.03
966     +0.11      0.03
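For reference, here is a minimal sketch of the two statistics named in the column headers, assuming "P. Corr." is the standard Pearson correlation and "Cos Sim." is the standard cosine similarity. The source does not say which vectors the page compares, so the inputs here are hypothetical placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical activation traces, for illustration only.
feature_acts = [0.0, 0.2, 0.0, 1.1, 0.4]
neuron_acts = [0.1, 0.3, 0.0, 0.9, 0.2]

print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```

The two measures differ only in the mean-centering step, so they can diverge noticeably when the vectors have a large shared offset.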
Negative Logits
Token              Logit
<bos>              -3.61
/***               -0.76
(no token shown)   -0.72
ⓧ                  -0.66
resourceCulture    -0.65
contentLoaded      -0.65
elemField          -0.60
mit                -0.60
виправи            -0.60
близь              -0.59
Positive Logits
Token       Logit
affor       1.72
maneu       1.71
milano      1.71
frankfurt   1.66
stockholm   1.64
napoli      1.57
reluct      1.56
maroc       1.54
lidl        1.54
bandung     1.52
Activations Density: 0.064%
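Activation density is commonly taken to mean the fraction of token positions on which a feature fires. The source does not define it, so the definition, the threshold, and the function name below are assumptions. Under that reading, 0.064% corresponds to about 64 active positions per 100,000 tokens:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 64 active positions out of 100,000 tokens.
sample = [0.0] * 99_936 + [1.5] * 64
print(f"{activation_density(sample):.3f}%")
```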