Explanations
actions or verbs related to undoing or removing something
Neuron Alignment
Index   Value   % of L₁
50      +0.21   0.8%
1218    +0.09   0.4%
1480    +0.09   0.3%
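The "% of L₁" column is not defined on the page. One plausible reading, offered here only as an assumption, is that each neuron's value is reported as a share of the L₁ norm (sum of absolute values) of the full alignment vector. The sketch below illustrates that reading on a toy vector; `l1_share` and `top_aligned` are hypothetical helper names, not part of any dashboard API.

```python
# Assumption: "% of L1" = |value| / sum(|all values|). The source does not confirm this.

def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, share) for the k weights with largest magnitude."""
    shares = l1_share(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


# Toy vector for illustration only; these are not the feature's real weights.
toy = [0.5, -0.25, 0.125, 0.125]
for index, value, share in top_aligned(toy, k=2):
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, small percentages such as those in the table would mean the feature's weight is spread thinly across many neurons rather than concentrated in a few.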
Correlated Neurons
Index   P. Corr.   Cos Sim.
946     +0.21      0.05
1328    +0.09      0.06
1218    +0.09      0.05
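The table reports two similarity measures per neuron, and they need not agree. As a reminder of how they differ, here is a minimal sketch: Pearson correlation is the cosine similarity of the two vectors after each has had its mean subtracted. Which vectors the dashboard actually compares (activations over tokens, weight vectors, or something else) is not stated in the source, so the inputs below are purely illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation: cosine similarity of the mean-centered sequences."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Illustrative data: v is u shifted by a constant offset.
u = [1.0, 2.0, 3.0]
v = [11.0, 12.0, 13.0]
print(f"pearson={pearson(u, v):.3f} cosine={cosine(u, v):.3f}")
```

Because Pearson ignores constant offsets while cosine does not, the two columns can diverge even when computed on the same pair of vectors.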
Negative Logits
<bos>               -2.80
/***                -0.79
<?                  -0.78
ⓧ                   -0.74
<?                  -0.69
(no visible token)  -0.69
/**                 -0.68
//*/                -0.63
//---               -0.63
incarcer            -0.59
Positive Logits
lele      1.34
bandung   1.23
jawa      1.22
Minang    1.19
jati      1.16
magis     1.10
jaya      1.04
saar      1.00
riva      1.00
kaos      0.99
Activations Density: 0.832%