INDEX
Explanations
percentages or numerical values in sentences
Neuron Alignment
Index   Value   % of L₁
1008    +0.08   0.2%
1220    +0.08   0.2%
426     +0.07   0.2%
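One way to read the Neuron Alignment rows: each row is a neuron whose weight in the feature's decoder row is among the largest, and "% of L₁" is that weight's share of the row's total absolute weight. The sketch below works under that assumption. Ranking by absolute weight is also an assumption, and `neuron_alignment` is a hypothetical helper, not the dashboard's own code.

```python
def neuron_alignment(decoder_row, k=3):
    """Return (index, weight, share of L1 norm) for the k largest-magnitude weights.

    Assumption: decoder_row holds the feature's decoder weights, one per neuron,
    and "% of L1" is |weight| / sum(|weights|).
    """
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neurons by absolute weight, largest first (ties keep their original order).
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy decoder row whose L1 norm is exactly 1.0.
print(neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2))
# → [(1, -0.5, 0.5), (0, 0.25, 0.25)]
```

A small share such as 0.2% means no single neuron dominates the decoder row, so the feature is spread across many neurons rather than aligned with one.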
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
426     +0.08           0.05
1008    +0.08           0.04
446     +0.07           0.04
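The two Correlated Neurons columns can be illustrated by assuming that both compare the feature's activations with a neuron's activations over the same tokens. Under that assumption, Pearson correlation centers both vectors first, while cosine similarity does not. The sketch uses hypothetical helpers (`pearson`, `cosine`, `top_correlated`) and toy data; it is not the dashboard's implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance of the centered vectors over their norms."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / denom if denom else 0.0  # a constant vector gets 0.0


def cosine(x, y):
    """Cosine similarity: the same ratio computed without centering."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / denom if denom else 0.0


def top_correlated(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature, keeping cosine alongside."""
    scored = [(i, pearson(feature_acts, acts), cosine(feature_acts, acts))
              for i, acts in enumerate(neuron_acts)]
    scored.sort(key=lambda row: row[1], reverse=True)
    return scored[:k]


feature = [0.0, 1.0, 0.0, 2.0]
neurons = [[0.0, 2.0, 0.0, 4.0], [2.0, 1.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
for index, corr, cos_sim in top_correlated(feature, neurons, k=2):
    print(index, round(corr, 2), round(cos_sim, 2))
```

Neuron 2 here is constant, so its Pearson correlation is 0 while its cosine similarity is about 0.67. This shows how the two columns can disagree when activations have a nonzero mean.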
Negative Logits
disgra        -0.81
pessi         -0.77
unil          -0.76
tph           -0.75
aton          -0.74
pamph         -0.74
vagab         -0.73
emphat        -0.73
racon         -0.72
contex        -0.71
Positive Logits
percent       0.61
percent       0.56
față          0.52
chance        0.52
reduction     0.51
percentage    0.51
/%            0.50
toată         0.50
setAlignment  0.50
}%            0.49
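The Negative and Positive Logits lists can be understood as the tokens whose output logits the feature's decoder direction pushes down or up the most. The minimal sketch below assumes that the score for each token is the dot product of the decoder vector with that token's column of the unembedding matrix, and it ignores any normalization applied before the unembedding. `logit_effects` and the toy vocabulary are hypothetical.

```python
def logit_effects(decoder_vec, W_U, vocab, k=3):
    """Return the k most positive and k most negative direct logit effects.

    Assumptions: decoder_vec has length d_model, W_U is a d_model x vocab matrix
    given as nested lists, and pre-unembedding normalization is ignored.
    """
    d_model = len(decoder_vec)
    # Score for token t: dot product of the decoder vector with column t of W_U.
    scores = [sum(decoder_vec[j] * W_U[j][t] for j in range(d_model))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending by score
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


pos, neg = logit_effects([1.0, 0.0], [[0.5, -1.0, 0.25], [9.0, 9.0, 9.0]],
                         ["tok_a", "tok_b", "tok_c"], k=1)
print(pos, neg)  # → [('tok_a', 0.5)] [('tok_b', -1.0)]
```

Entries such as "disgra" or "pamph" are subword tokens rather than whole words. This explains why the lists contain word fragments.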
Activations Density 0.235%
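Activation density is most naturally read as the fraction of tokens on which the feature fires, so 0.235% is about 235 tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above zero; `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds threshold.

    Assumption: density counts any activation strictly above the threshold (0 by default).
    """
    if not acts:
        return 0.0
    return sum(1 for a in acts if a > threshold) / len(acts)


# 235 firing tokens out of 100,000.
acts = [1.0] * 235 + [0.0] * 99_765
print(f"{activation_density(acts):.3%}")  # → 0.235%
```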