Explanations
phrases related to learning lessons and experiences
Neuron Alignment
Index   Value   % of L₁
30      +0.09   0.3%
892     +0.09   0.2%
401     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
401     +0.09      0.03
30      +0.09      0.03
1030    +0.08      0.02
Negative Logits
casio          -0.51
unwarran       -0.49
Intere         -0.46
Etat           -0.46
afp            -0.45
cytoplas       -0.45
explication    -0.45
Réalisation    -0.44
logitech       -0.44
apprehen       -0.43
Positive Logits
lessons     0.92
learned     0.91
learn       0.91
lesson      0.87
learnt      0.84
learning    0.83
Learned     0.82
LEARN       0.82
learn       0.81
learns      0.81
Activations Density 0.170%
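
The density figure above is commonly read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only; it assumes "active" means an activation strictly above a threshold of zero, and both the function name `activation_density` and the sample data are hypothetical, not taken from this dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over 1000 token positions (illustrative only).
sample = [0.0] * 998 + [0.4, 1.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.170% would mean the feature is active on roughly 17 of every 10,000 tokens.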