INDEX
Explanations
locations or events where different activities are happening
Neuron Alignment
Index   Value   % of L₁
50      +0.24   1.2%
866     +0.11   0.5%
1671    +0.10   0.5%
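The dashboard does not say how the Neuron Alignment columns are computed. A common reading, assumed here and not confirmed by the source, is that "Value" is the signed component of the feature's decoder vector along each neuron's basis direction, and "% of L₁" is that component's absolute value divided by the vector's L₁ norm. Under that reading, +0.24 at 1.2% implies an L₁ norm of roughly 20. A minimal sketch (the function name `neuron_alignment` is hypothetical):

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (neuron index, signed weight, fraction of L1 norm) for the top-k neurons.

    Assumption: 'Value' is the signed decoder-vector component per neuron and
    '% of L1' is |value| / sum(|values|). Neurons are ranked by signed value,
    largest first, matching the all-positive ordering in the table.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    order = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in order[:k]]


# Usage with a toy 3-neuron decoder vector, printed in the dashboard's layout.
for idx, val, frac in neuron_alignment([0.5, -0.25, 0.25], k=2):
    print(f"{idx}  {val:+.2f}  {frac:.1%}")
```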
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
25      +0.24           0.03
663     +0.11           0.02
156     +0.10           0.02
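The Correlated Neurons table reports two similarity measures. Assuming the Pearson column compares the feature's activation with each neuron's activation across the same tokens, and the cosine column compares two weight vectors (which two vectors the dashboard uses is not stated here, so the sketch accepts any pair), both can be computed as below. The helper names `pearson`, `cosine`, and `top_correlated` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series.

    Returns 0.0 when either series is constant, where the correlation is undefined.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two weight vectors of the same dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_correlated(feature_acts, neuron_acts, k=3):
    """Rank neurons (one activation series each) by Pearson correlation with the feature."""
    scores = [(i, pearson(feature_acts, acts)) for i, acts in enumerate(neuron_acts)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

The two measures can disagree, as they do in the table: a neuron can co-fire with the feature (nonzero Pearson correlation) while its weight vector points in a nearly orthogonal direction (cosine near zero).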
Negative Logits
Token        Logit
<bos>        -3.08
var          -0.75
public       -0.74
ByVersion    -0.72
rungsseite   -0.71
long         -0.70
if           -0.70
typelib      -0.69
//           -0.68
/**          -0.68
Positive Logits
Token        Logit
disagre      2.23
affor        2.19
milf         2.17
maneu        2.16
ftu          2.15
reluct       2.13
fortn        2.12
strick       2.10
excru        2.06
increa       2.06
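Logit lists like the two above are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the lowest- and highest-scoring tokens. The sketch below assumes exactly that plain projection; a real dashboard may also fold in the final layer norm, which is omitted here. The function name `logit_effects` and the column-per-token layout of `unembed` are hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Score every token by the dot product of the decoder direction with its unembedding.

    Assumption: unembed[j] is the d_model-dimensional unembedding vector for vocab[j].
    Returns (k most negative, k most positive) as lists of (token, score) pairs.
    """
    scores = [sum(d * w for d, w in zip(decoder_vec, col)) for col in unembed]
    ranked = sorted(zip(vocab, scores), key=lambda p: p[1])
    return ranked[:k], ranked[::-1][:k]


# Usage: a toy 2-dimensional model with a 3-token vocabulary.
negative, positive = logit_effects([1.0, 0.0], [[2.0, 0.0], [-1.0, 5.0], [0.5, 0.0]], ["a", "b", "c"], k=1)
print(negative, positive)
```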
Activation Density: 0.178%
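Activation density is usually the fraction of sampled tokens on which the feature fires. Assuming "fires" means an activation strictly above zero (the threshold is not stated in the source), 0.178% corresponds to roughly one token in 560. A minimal sketch with a hypothetical `activation_density` helper:

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds the threshold.

    Assumption: a token counts as active when its activation is strictly greater
    than `threshold` (0.0 by default).
    """
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Usage: report density in the dashboard's percentage format.
density = activation_density([0.0] * 999 + [3.0])
print(f"Activation Density: {density:.3%}")
```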