Explanations
comparisons or confrontations between entities
Neuron Alignment
Index    Value    % of L₁
507      +0.11    0.3%
1441     +0.09    0.3%
1314     +0.09    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1441     +0.11       0.05
507      +0.09       0.04
1409     +0.09       0.02
Negative Logits
nece        -0.65
dichi       -0.65
apparti     -0.60
parteci     -0.58
unwarran    -0.56
adh         -0.56
sopr        -0.55
dovr        -0.55
aen         -0.55
aspetta     -0.54
Positive Logits
rivalry        0.90
vying          0.86
competing      0.80
competition    0.79
rival          0.78
rivals         0.76
showdown       0.75
battle         0.74
duel           0.74
competitors    0.71
Activations Density 0.653%