INDEX
Explanations
occurrences of specific programming syntax and commands
Neuron Alignment
Index   Value   % of L₁
478     +0.22   1.3%
23      +0.17   0.9%
53      +0.14   0.8%
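The "% of L₁" column expresses each weight's magnitude as a share of the L₁ norm of the whole weight vector (0.22 at 1.3% implies an L₁ norm of roughly 17). The sketch below shows that arithmetic. It assumes the input is the feature's weight vector over the neuron basis and that neurons are ranked by absolute weight. The dashboard states neither of these, and the function name top_neuron_alignment is hypothetical.

```python
def top_neuron_alignment(weights, k=3):
    """Return (index, value, pct_of_l1) for the k largest-magnitude weights.

    Assumption: `weights` holds one entry per neuron (the feature's direction
    in the neuron basis) and ranking is by absolute value.
    """
    l1 = sum(abs(w) for w in weights)  # L1 norm of the full vector
    # Sort neuron indices by |weight|, largest first (stable for ties).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1) for i in order[:k]]


# Usage: the signed weight is reported, but its share of L1 uses |weight|.
print(top_neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2))
```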
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
53      +0.22           0.03
278     +0.17           0.02
167     +0.14           0.02
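Pearson correlation and cosine similarity are the same ratio (dot product over the product of norms), except that Pearson first subtracts each series' mean. A neuron can therefore correlate positively with the feature while having near-zero cosine similarity. The sketch below assumes both columns compare the feature's activations with a neuron's activations across the same tokens. That reading is our assumption, not something the dashboard states.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: centre both series, then take the cosine."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(xs, ys):
    """Cosine similarity: the same ratio with no centring."""
    dot = sum(x * y for x, y in zip(xs, ys))
    nx = math.sqrt(sum(x * x for x in xs))
    ny = math.sqrt(sum(y * y for y in ys))
    return dot / (nx * ny)


# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 0.0, 2.0, 0.0]
neuron_acts = [-1.0, 1.0, 3.0, -1.0]
print(pearson(feature_acts, neuron_acts), cosine_similarity(feature_acts, neuron_acts))
```

Shifting the neuron's activations by a constant leaves the Pearson value unchanged but alters the cosine similarity. This is one way the two columns can disagree.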
Negative Logits
ols       -1.93
oltz      -1.69
olid      -1.62
lein      -1.62
olver     -1.57
íf        -1.53
(−        -1.49
bserver   -1.49
ublin     -1.48
antage    -1.47
Positive Logits
hell      1.85
pity      1.72
fuck      1.66
thick     1.65
shit      1.62
heck      1.60
liar      1.55
thicker   1.52
bitch     1.49
...)      1.46
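Both logit lists rank vocabulary tokens by how much the feature's output direction pushes each token's logit up or down. The sketch below computes that direct effect as a dot product between a feature direction and each column of an unembedding matrix, then takes the two ends of the ranking. It assumes a plain linear readout and ignores any final layer normalization, which real tools may fold in or approximate. The names logit_effects, feature_dir and W_U are illustrative.

```python
def logit_effects(feature_dir, W_U, vocab, k=3):
    """Rank tokens by the feature's direct effect on their logits.

    feature_dir: length-d list (assumed: the feature's output direction)
    W_U: d rows of V entries (assumed: unembedding, no final LayerNorm)
    vocab: V token strings
    Returns (most negative k, most positive k) as (token, score) pairs.
    """
    d, V = len(W_U), len(vocab)
    # Score for token t is the dot product of feature_dir with column t of W_U.
    scores = [sum(feature_dir[i] * W_U[i][t] for i in range(d)) for t in range(V)]
    ranked = sorted(zip(vocab, scores), key=lambda p: p[1])  # ascending
    return ranked[:k], ranked[::-1][:k]


neg, pos = logit_effects([1.0, 0.0], [[0.5, -2.0, 1.5], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(neg, pos)
```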
Activation Density: 5.401%
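Activation density is the share of tokens on which the feature is active. At 5.401%, that is roughly one token in every 18.5. The sketch below assumes "active" means an activation strictly above zero; the threshold parameter is an illustrative addition.

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as firing when its activation is > 0.
    """
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# One of four hypothetical tokens fires.
print(activation_density([0.0, 0.3, 0.0, 0.0]))
```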