INDEX
Explanations
steps or instructions in a technical or mechanical process
Neuron Alignment
Index    Value    % of L₁
1385     +0.13    0.4%
906      +0.08    0.2%
1150     +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
736      +0.13       0.05
724      +0.08       0.04
335      +0.07       0.03
Negative Logits
pessi        -1.11
wien         -1.03
strick       -1.02
emphat       -1.02
accla        -1.01
madonna      -0.99
ardu         -0.97
bayern       -0.97
stockholm    -0.96
errone       -0.95
Positive Logits
underneath   0.82
underlying   0.75
reveal       0.67
beneath      0.65
behind       0.64
inside       0.63
revealing    0.61
revealed     0.61
underlying   0.58
hidden       0.58
Activations Density 0.364%