Explanations
coding-related terms and phrases, particularly related to programming languages and instructions
Neuron Alignment
Index   Value   % of L₁
1870    +0.15   0.6%
169     +0.11   0.4%
1624    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
169     +0.15      0.02
168     +0.11      0.02
1942    +0.11      0.02
Negative Logits
indestru     -0.77
depic        -0.75
shenan       -0.74
contex       -0.74
sophistic    -0.68
viciss       -0.68
disambigu    -0.67
wikihow      -0.67
racon        -0.67
berea        -0.66
Positive Logits
programming    1.40
Programming    1.21
programming    1.18
Programming    1.16
programmer     1.03
programmers    0.98
programmed     0.92
programmed     0.88
program        0.87
PROGRAM        0.87
Activations Density 0.054%