INDEX
Explanations
references to pop culture
Neuron Alignment
Index   Value   % of L₁
156     +0.15   0.9%
185     +0.11   0.7%
30      +0.11   0.6%
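
The "% of L₁" column is not defined on the page. A minimal sketch of one plausible reading, assuming it reports each entry's magnitude as a share of the L1 norm (sum of absolute values) of the whole vector. The function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights):
    """Return each weight's magnitude as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The source does not confirm this.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / l1_norm for w in weights]


# Hypothetical toy vector, not dashboard data.
toy_weights = [0.5, -0.25, 0.25]
toy_shares = l1_share(toy_weights)
print([f"{s:.1%}" for s in toy_shares])
```

Under this reading, small percentages such as those in the table would mean that no single neuron dominates the vector.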
Correlated Neurons
Index   P. Corr.   Cos Sim.
185     +0.15      0.01
77      +0.11      0.01
351     +0.11      0.02
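
The two columns above report different similarity measures: Pearson correlation ("P. Corr.") and cosine similarity ("Cos Sim."). The page does not say which vectors are being compared, so the sketch below only illustrates the two measures on hypothetical toy lists. Pearson correlation is the cosine similarity of the mean-centered vectors, which is why the two numbers can differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity after mean-centering."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical toy vectors, not dashboard data.
toy_a = [1.0, 2.0, 3.0]
toy_b = [3.0, 2.0, 1.0]
toy_cos = cosine_similarity(toy_a, toy_b)        # positive: both vectors have positive entries
toy_pearson = pearson_correlation(toy_a, toy_b)  # negative: one rises while the other falls
print(round(toy_cos, 3), round(toy_pearson, 3))
```

In the toy case the cosine similarity is positive while the Pearson correlation is negative, so the two columns should not be expected to track each other.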
Negative Logits
Token       Logit
inib        -1.74
NHS         -1.63
ynes        -1.50
slightest   -1.44
pylori      -1.44
sepsis      -1.35
)}}{\       -1.35
coli        -1.34
ocese       -1.32
itian       -1.32
Positive Logits
Token       Logit
lite        2.50
ups         2.40
ulating     2.18
corn        2.17
ulates      2.14
ulous       2.04
ulated      2.04
ulations    2.02
ipers       1.87
stars       1.77
Activations Density: 0.141%
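
The page does not define the density figure. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation exceeds zero. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold`.

    Assumption: density = (# tokens where the feature fires) / (# tokens sampled).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy activations over four tokens, not dashboard data.
toy_activations = [0.0, 0.0, 1.5, 0.0]
toy_density = activation_density(toy_activations)
print(f"{toy_density:.3%}")
```

Under this reading, a density of 0.141% would correspond to the feature firing on roughly 1 in 700 tokens.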