Explanations
mentions of specific individuals or entities with significant influence
Neuron Alignment
Index   Value   % of L₁
252     +0.13   0.7%
434     +0.12   0.6%
213     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
410     +0.13      0.05
271     +0.12      -0.01
213     +0.11      0.03
Negative Logits
Ĥ¬       -2.04
ĥ½       -1.82
roads    -1.75
ments    -1.68
orses    -1.66
mente    -1.61
oscopy   -1.60
teenth   -1.57
ties     -1.54
ĨĴ       -1.51
Positive Logits
pert       1.93
pering     1.70
omo        1.64
isine      1.64
assium     1.60
goodness   1.60
per        1.57
yler       1.53
tah        1.53
enta       1.52
Activations Density: 0.390%