INDEX
Explanations
terms relating to micro and macro concepts in various contexts
Neuron Alignment
Index   Value   % of L₁
156     +0.23   1.3%
203     +0.18   1.0%
391     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
203     +0.23      0.02
393     +0.18      0.02
391     +0.12      0.01
Negative Logits
Token        Logit
msgid        -1.73
willingness  -1.57
rapeut       -1.55
juvant       -1.52
arer         -1.51
ocarcinoma   -1.47
andidates    -1.46
herty        -1.45
REAM         -1.43
UTF          -1.42
Positive Logits
Token        Logit
cles         1.76
ħ            1.52
screens      1.51
Ĵ            1.49
artifacts    1.48
chip         1.45
Ħ            1.45
(\~          1.44
artifact     1.41
chips        1.40
Activations Density 0.061%