Explanations
mentions of academic institutions and educational programs
Neuron Alignment
Index    Value    % of L₁
1870     +0.12    0.3%
1356     +0.10    0.3%
198      +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1793     +0.12       0.05
1627     +0.10       0.05
1870     +0.09       0.03
Negative Logits
relenting    -0.70
confé        -0.69
تضيفلها      -0.65
vainqueur    -0.63
vôtre        -0.61
mistak       -0.61
déput        -0.60
marié        -0.60
malade       -0.58
goutte       -0.57
Positive Logits
[no token shown]    0.60
tldr                0.59
MOQ                 0.58
Throwaway           0.58
°;                  0.57
ragion              0.57
<?                  0.56
Lma                 0.56
ºC                  0.55
Ferner              0.55
Activations Density 0.489%