INDEX
Explanations
phrases related to decision-making and comparisons
Neuron Alignment
Index   Value   % of L₁
872     +0.11   0.3%
382     +0.11   0.3%
1688    +0.10   0.3%
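One plausible reading of the Neuron Alignment table is assumed here rather than stated by the dashboard. Each row is an entry of the feature's decoder vector expressed in the neuron basis, Value is that entry, and % of L₁ is its magnitude divided by the L₁ norm of the whole vector. The following is a minimal Python sketch under that assumption. The function name `neuron_alignment` and its input `decoder_row` are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return (index, value, pct_of_l1) for the k largest-magnitude entries.

    Assumption (hypothetical): decoder_row is the feature's decoder vector in
    the neuron basis, and pct_of_l1 = |value| / sum(|all values|) * 100.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    l1 = sum(abs(w) for w in decoder_row)
    if l1 == 0:
        raise ValueError("decoder_row has zero L1 norm")
    # Rank neuron indices by absolute weight, largest first (stable for ties).
    top = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)[:k]
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in top]
```

Take the toy vector [0.1, -0.4, 0.3, 0.2]. Its L₁ norm is 1.0, so the top entry (index 1, value -0.4) accounts for 40% of it. Under this reading, a 0.3% share per listed neuron would mean the feature's decoder weight is spread across many neurons rather than concentrated in one.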
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
382     +0.11           0.06
106     +0.11           0.04
1688    +0.10           0.04
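The two numeric columns of Correlated Neurons measure different things. The sketch below makes two assumptions. First, the correlation column is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. Second, Cos Sim. is the cosine similarity between two weight vectors. Which vectors the dashboard actually compares is not stated, so the inputs here are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Assumption (hypothetical): xs are the feature's activations and ys a
    neuron's activations, both recorded on the same tokens.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, omitting a shared 1/n factor that cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (no mean-centering)."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Pearson correlation subtracts each series' mean before normalizing, while cosine similarity does not. For example, [1, 2, 3] and [11, 12, 13] have a Pearson correlation of 1 but a cosine similarity of about 0.95.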
Negative Logits
seiz      -1.08
meis      -1.06
dises     -1.04
abnorm    -1.01
erec      -1.01
aen       -1.00
lele      -0.98
sappi     -0.98
mef       -0.95
teras     -0.95
Positive Logits
Both          0.87
Both          0.85
both          0.79
both          0.78
similarities  0.75
similar       0.70
similarity    0.70
alike         0.67
similarly     0.66
BOTH          0.62
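The negative and positive logit lists read as the tokens whose logits this feature pushes down and up the most. A common way to compute such lists is assumed here. The feature's decoder direction is projected through the unembedding matrix, and the lowest and highest scores are kept. This ignores the final layer norm and anything else between the feature and the output. The following is a minimal sketch using plain lists. The names `logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (positive, negative) lists of (token, score), up to k entries each.

    Assumptions (hypothetical, not taken from the dashboard):
    - decoder_dir has length d_model;
    - unembed is a d_model x len(vocab) matrix stored as a list of rows;
    - a token's score is the dot product of decoder_dir with that token's
      column, i.e. the direct logit effect, ignoring the final layer norm.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    scores = [
        sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive score.
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative
```

The scores cover every vocabulary entry, including subword fragments, so pieces of longer words can appear in either list.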
Activation Density: 0.465%
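Activation density is usually read as the share of tokens on which the feature fires. The following is a minimal sketch. It assumes that "fires" means an activation strictly above zero, since the dashboard does not state its threshold. The helper `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds threshold (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

At 0.465%, the feature would be active on roughly 465 of every 100,000 tokens.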