INDEX
Explanations
adjectives and relational terms like names and roles
Neuron Alignment
Index   Value   % of L₁
1036    +0.09   0.3%
24      +0.09   0.2%
1043    +0.08   0.2%
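The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature direction. That reading is an assumption, not something the dashboard states. A minimal sketch of the arithmetic under that assumption, using a hypothetical five-component direction rather than the real weights:

```python
def l1_share(direction):
    """Return (index, value, share of L1 norm) per component, largest |value| first."""
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        raise ValueError("direction has zero L1 norm")
    rows = [(i, v, abs(v) / l1) for i, v in enumerate(direction)]
    # Python's sort is stable, so ties keep their original index order.
    return sorted(rows, key=lambda r: abs(r[1]), reverse=True)


# Hypothetical direction for illustration only; not taken from the dashboard.
direction = [0.5, -1.0, 0.25, 0.0, 0.25]
top = l1_share(direction)[:3]

for index, value, share in top:
    print(f"{index:<6}{value:+.2f}   {share:.1%}")
```

A share of only 0.2% to 0.3% for the top neuron, as in the table above, would mean under this reading that the direction is spread across many neurons rather than concentrated in a few.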
Correlated Neurons
Index   P. Corr.   Cos Sim.
284     +0.09      0.05
24      +0.09      0.05
1662    +0.08      0.05
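Assuming "P. Corr." is the Pearson correlation and "Cos Sim." is the cosine similarity between the feature's activations and a neuron's activations over the same tokens, the two columns differ only in whether the series are mean-centered first. A minimal sketch with hypothetical activation values:

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical activations for illustration only; not taken from the dashboard.
feature_acts = [1.0, 2.0, 3.0]
neuron_acts = [3.0, 2.0, 1.0]

p_corr = pearson_correlation(feature_acts, neuron_acts)
cos_sim = cosine_similarity(feature_acts, neuron_acts)
```

In this toy case the two measures disagree in sign: the series are perfectly anti-correlated, yet their cosine similarity is positive because both have all-positive entries.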
Negative Logits
increa      -1.51
guarante    -1.50
inev        -1.43
affor       -1.41
secon       -1.35
unden       -1.34
emphat      -1.33
embra       -1.32
strick      -1.31
perfet      -1.31
Positive Logits
accordingly      0.76
changing         0.68
into             0.65
according        0.64
to               0.64
changed          0.64
GraphicsUnit     0.64
PerformLayout    0.63
changing         0.63
RTDA             0.63
Activations Density 0.406%