INDEX
Explanations
personal pronouns and verbs related to actions and behaviors
Neuron Alignment
Index    Value    % of L₁
2019     +0.15    0.5%
50       +0.15    0.5%
1415     +0.13    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1415     +0.15       0.07
1919     +0.15       0.09
1510     +0.13       0.06
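The "P. Corr." and "Cos Sim." columns report two different similarity measures, which is why their values differ for the same neuron. The sketch below shows the standard definitions of Pearson correlation and cosine similarity on plain Python lists. It is illustrative only: the function names and sample vectors are hypothetical, and this page does not state which vectors the dashboard actually compares.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors.
    Assumes equal-length inputs with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def cosine_sim(x, y):
    """Cosine similarity: dot product divided by the product of L2 norms.
    Assumes equal-length, non-zero inputs."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical vectors: y is x shifted by a constant offset.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]
corr = pearson_corr(x, y)   # unaffected by the offset
cos = cosine_sim(x, y)      # changed by the offset
```

Pearson correlation ignores a constant offset because it centers both vectors first, while cosine similarity does not, so the two measures need not agree.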
Negative Logits
tremb         -1.59
Mlle          -1.53
Keny          -1.43
Juf           -1.43
bordeaux      -1.41
Abbé          -1.41
napoli        -1.40
ibiza         -1.40
cannes        -1.38
tranquillo    -1.37
Positive Logits
seek       0.85
strive     0.84
utilize    0.84
create     0.84
rely       0.83
provide    0.82
do         0.81
ensure     0.78
apply      0.77
give       0.76
Activations Density 0.507%