INDEX
Explanations
phrases related to physical measurements like height, width, and depth
Neuron Alignment
Index    Value    % of L₁
544      +0.14    0.5%
75       +0.14    0.5%
871      +0.12    0.4%
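The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it gives each neuron's absolute weight as a share of the L₁ norm (the sum of absolute values) of the feature's whole weight vector. A minimal sketch under that assumption, with a hypothetical helper `l1_share` and made-up toy weights rather than this feature's real ones:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the L1 norm of weights.

    Assumption: this mirrors what the "% of L1" column reports; the
    page itself does not confirm the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector for illustration only, not taken from the dashboard.
toy_weights = [0.5, -0.25, 0.25, -1.0]
share = l1_share(toy_weights, 0)
```

Shares of only a fraction of a percent per neuron, as in the table, would then mean the feature's weight is spread over many neurons rather than concentrated in a few.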
Correlated Neurons
Index    P. Corr.    Cos Sim.
1218     +0.14       0.04
75       +0.14       0.03
544      +0.12       0.04
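"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity. The page does not say which vectors they are computed over, so the sketch below only shows how the two measures differ, using hypothetical helper names and toy data: Pearson correlation is cosine similarity applied after subtracting each vector's mean.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy data for illustration only: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
cos_ab = cosine_similarity(a, b)
corr_ab = pearson_correlation(a, b)
```

In this toy case the correlation is 1 (up to rounding) while the cosine similarity is below 1, because centering removes the constant offset. The two columns can therefore disagree for the same pair of neurons.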
Negative Logits
indestru      -1.35
reluct        -1.33
accla         -1.32
intersper     -1.29
shenan        -1.25
maneu         -1.25
philanth      -1.24
encomp        -1.22
increa        -1.19
inev          -1.19
Positive Logits
width         0.86
depth         0.84
width         0.79
Width         0.77
depth         0.74
height        0.72
density       0.70
height        0.69
thickness     0.68
дописавши     0.66
Activations Density 0.162%