INDEX
Explanations (New Auto-Interp): references to locations, historical events, or figures, with specific details and dates
Neuron Alignment
Index    Value    % of L₁
479      +0.13    0.6%
517      +0.12    0.6%
1363     +0.12    0.6%
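Below is a minimal sketch of how a Neuron Alignment table like the one above could be computed. It rests on three assumptions that the card does not confirm: the values are entries of the feature's weight vector over MLP neurons, rows are ranked by signed value, and "% of L₁" is each entry's magnitude divided by the vector's L₁ norm. The helper name `neuron_alignment` is hypothetical.

```python
from typing import List, Tuple


def neuron_alignment(weights: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Hypothetical helper: top-k neurons for one feature's weight vector.

    Assumptions: `weights` holds the feature's weight on each MLP neuron,
    rows are ranked by signed value, and the third field is the share of
    the vector's L1 norm (multiply by 100 for a percentage).
    """
    l1 = sum(abs(w) for w in weights)  # L1 norm of the full vector
    # Neuron indices ordered from most positive to most negative weight.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    # Each row: (neuron index, signed weight, fraction of the L1 norm).
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]
```

For example, `neuron_alignment([0.0, 0.5, -0.25, 0.25], k=2)` returns `[(1, 0.5, 0.5), (3, 0.25, 0.25)]`. Under these assumptions, a weight of +0.13 that accounts for only 0.6% of the norm indicates that the feature's weight is spread across many neurons rather than concentrated in a few.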
Correlated Neurons
Index    P. Corr.    Cos Sim.
1363     +0.13       0.06
1741     +0.12       -0.01
478      +0.12       0.02
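The two columns above measure different quantities. This sketch assumes that "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. It also assumes that "Cos Sim." is the cosine similarity between the corresponding weight vectors, though which pair of vectors the card compares is itself an assumption. The helper names are hypothetical.

```python
import math
from typing import Sequence


def pearson_corr(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length activation series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two weight vectors (assumed pairing, see lead-in)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

Because one metric looks at activations and the other at weights, the two can disagree. In the table, neuron 1741 correlates with the feature at +0.12 even though its cosine similarity is close to zero (-0.01).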
Negative Logits
<bos>          -1.16
дописавши      -0.80
dataclass      -0.77
RegistryLite   -0.74
舟             -0.69
ӣ              -0.69
<0xE8>         -0.67
dire           -0.66
四方           -0.66
execSQL        -0.65
Positive Logits
unspeak        1.80
apprehen       1.71
disagre        1.64
tolerably      1.61
ecru           1.58
gaily          1.57
depic          1.57
reluct         1.56
inconce        1.56
increa         1.56
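The sketch below shows one way the Positive and Negative Logits lists could be produced. It assumes that each value is the feature's direct effect on a token's logit, obtained by projecting the feature's output direction through the unembedding matrix `W_U`. The function `logit_extremes` and the toy inputs in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_extremes(
    direction: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: most positive and most negative direct logit effects.

    Assumptions: `direction` is the feature's output direction in the
    residual stream (length d_model) and `W_U` is the unembedding matrix,
    given as d_model rows of len(vocab) entries each.
    """
    # Effect on token t: dot product of the direction with column t of W_U.
    effects = [
        sum(direction[d] * W_U[d][t] for d in range(len(direction)))
        for t in range(len(vocab))
    ]
    top_idx = sorted(range(len(vocab)), key=lambda t: effects[t], reverse=True)[:k]
    bottom_idx = sorted(range(len(vocab)), key=lambda t: effects[t])[:k]
    positive = [(vocab[t], effects[t]) for t in top_idx]
    negative = [(vocab[t], effects[t]) for t in bottom_idx]
    return positive, negative
```

With `direction = [1.0, -1.0]`, `W_U = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5]]`, `vocab = ["a", "b", "c"]` and `k=2`, the positive list is `[("c", 1.5), ("a", 1.0)]` and the negative list is `[("b", -1.0), ("a", 1.0)]`.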
Activations Density: 0.565%
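Activation density is the share of tokens on which the feature fires. The minimal sketch below assumes that "fires" means a strictly positive activation. The helper name and the threshold parameter are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly positive.
    """
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)
```

For example, `activation_density([0.0, 0.3, 0.0, 0.0])` returns `25.0`. At the reported 0.565%, the feature would be active on roughly 1 token in 177.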