INDEX
Explanations
words related to education and self-awareness
Neuron Alignment
Index    Value    % of L₁
2034     +0.14    0.4%
690      +0.13    0.4%
1042     +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
490      +0.14       0.03
1265     +0.13       0.04
870      +0.11       0.04
Negative Logits
dises      -1.16
sopr       -1.15
sappi      -1.14
robus      -1.07
lancia     -1.06
pessi      -1.06
igno       -1.05
squa       -1.05
Kategor    -1.03
umo        -1.02
Positive Logits
there      0.81
it         0.80
they       0.78
then       0.77
you        0.72
we         0.69
chances    0.67
or         0.65
there      0.64
she        0.63
Activations Density 0.208%