INDEX
Explanations
elements related to formatting in HTML code
Neuron Alignment
Index   Value   % of L₁
906     +0.16   0.5%
1343    +0.15   0.5%
964     +0.15   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1780    +0.16      0.02
876     +0.15      0.00
453     +0.15      0.03
Negative Logits
Token          Logit
unspeak        -1.52
philanth       -1.48
apprehen       -1.36
pamph          -1.36
volunte        -1.34
indescri       -1.34
endeavouring   -1.30
disagre        -1.29
practition     -1.29
reluct         -1.29
Positive Logits
Token        Logit
giu          0.83
edì          0.79
piacere      0.73
ù            0.73
riso         0.73
artistico    0.72
dì           0.72
Traduction   0.72
morire       0.71
uteurs       0.71
Activation Density: 0.092%