Explanations
references to actions or instructions, particularly related to reducing something
Neuron Alignment
Index   Value   % of L₁
50      +0.22   1.3%
544     +0.11   0.6%
1103    +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
544     +0.22      0.03
596     +0.11      0.03
1480    +0.10      0.03
Negative Logits
Token              Logit
<bos>              -3.12
<?                 -0.97
/**                -0.94
ⓧ                  -0.88
/***               -0.83
(no token shown)   -0.81
///**              -0.73
<?                 -0.67
/*                 -0.66
//---              -0.59
Positive Logits
Token     Logit
kasa      1.34
lele      1.34
bandung   1.29
jaya      1.25
saar      1.20
jati      1.20
Minang    1.19
emphat    1.17
hina      1.17
ftu       1.16
Activations Density: 0.177%