INDEX
Explanations
links and references related to a specific platform
Neuron Alignment
Index    Value    % of L₁
421      +0.14    0.6%
976      +0.14    0.6%
101      +0.13    0.6%
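One plausible reading of the "% of L₁" column in the Neuron Alignment table is each weight's share of the L₁ norm (the sum of absolute values) of the feature's weight vector. The sketch below works under that assumption. The function name `neuron_alignment` and the toy weights are hypothetical and do not come from the dashboard.

```python
def neuron_alignment(weights, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude weights.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The dashboard does not
    state its formula, so treat this as an illustration only.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        return []
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1_norm) for i in ranked[:k]]


# Toy weight vector (hypothetical values, chosen so the L1 norm is exactly 1).
toy_weights = [0.5, -0.25, 0.125, 0.125]
rows = neuron_alignment(toy_weights, k=2)
for index, value, share in rows:
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading, a value of +0.14 at 0.6% would imply an L₁ norm of roughly 23 for the full vector, which is consistent with the weight being spread thinly over many neurons.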
Correlated Neurons
Index    P. Corr.    Cos Sim.
976      +0.14       0.02
101      +0.14       0.02
421      +0.13       0.02
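The Correlated Neurons table reports a Pearson correlation ("P. Corr.") and a cosine similarity ("Cos Sim.") per neuron. The page does not say which vectors each measure is computed over, so the following is only a sketch of the two standard definitions on toy data. It shows that Pearson correlation is the cosine similarity of mean-centered vectors, which is one reason the two columns can differ. All names and values here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Toy vectors (hypothetical): b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
p_corr = pearson_correlation(a, b)  # the offset is removed by centering
cos_sim = cosine_similarity(a, b)   # the offset changes the angle
print(f"P. Corr. {p_corr:+.2f}  Cos Sim. {cos_sim:.2f}")
```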
Negative Logits
vī             -0.62
kef            -0.55
mī             -0.50
dismant        -0.49
Dory           -0.49
vorrei         -0.47
scoffed        -0.46
neutralized    -0.46
Mercure        -0.45
Paradiso       -0.45
Positive Logits
Steam          1.45
Steam          1.41
steam          1.41
steam          1.39
STEAM          1.19
steamed        0.93
steaming       0.86
steamer        0.85
vapeur         0.82
étu            0.73
Activations Density 0.072%
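Activation density is commonly read as the fraction of sampled tokens on which a feature fires. A minimal sketch under that assumption follows. The function name, the threshold of zero, and the sample size are hypothetical, and the toy data is constructed only to reproduce the displayed figure, not to reflect the dashboard's actual sample.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: a token counts as "active" when its activation is > 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample (hypothetical): 72 active tokens out of 100,000.
toy_activations = [1.0] * 72 + [0.0] * (100_000 - 72)
density = activation_density(toy_activations)
print(f"Activations Density {density:.3%}")
```

At a density this low the feature is silent on nearly all tokens, which fits a feature tied to one specific word family.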