INDEX
Explanations
phrases related to saving or preservation
Neuron Alignment
Index   Value   % of L₁
1515    +0.13   0.5%
596     +0.13   0.5%
1527    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
596     +0.13      0.03
1562    +0.13      0.04
1515    +0.11      0.03
Negative Logits
Villar     -0.46
MARIA      -0.45
Phila      -0.44
Ҳ          -0.42
polski     -0.42
psycopg    -0.42
Kond       -0.42
Wod        -0.42
trouvera   -0.42
bbero      -0.41
Positive Logits
save      1.14
saves     1.10
saving    1.08
saved     1.07
SAVE      1.05
Save      1.04
Saving    1.03
savings   1.02
Save      1.00
save      1.00
Activations Density 0.078%