Explanations
quotes from different individuals in various contexts
Neuron Alignment
Index   Value   % of L₁
1150    +0.21   0.6%
889     +0.10   0.3%
1068    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1150    +0.21      0.04
889     +0.10      0.05
395     +0.08      0.05
Negative Logits
Token       Logit
swarovski   -1.31
murano      -1.20
dispen      -1.16
effe        -1.14
vespa       -1.12
NOO         -1.11
increa      -1.10
desir       -1.10
cabrio      -1.10
erec        -1.10
Positive Logits
Token       Logit
said        0.77
said        0.75
.           0.66
Said        0.61
.”          0.60
says        0.59
Tuesday     0.57
during      0.57
Said        0.57
saying      0.57
Activations Density: 0.127%