INDEX
Explanations
repetitive phrases or constructs starting with "on"
Neuron Alignment
Index    Value    % of L₁
478      +0.18    1.0%
69       +0.18    1.0%
47       +0.14    0.8%
Correlated Neurons
Index    P. Corr.    Cos Sim.
69       +0.18       0.11
328      +0.18       0.07
243      +0.14       0.06
Negative Logits
hell       -1.67
wise       -1.63
pockets    -1.58
ishly      -1.53
esh        -1.49
betting    -1.41
oud        -1.40
logo       -1.37
Pradesh    -1.37
urally     -1.35
Positive Logits
ĻĤ    4.98
Ļª    4.48
«     4.22
ĸ´    4.21
¿½    4.17
ĨĴ    4.14
ı     4.00
Ĭ     3.95
Īĺ    3.90
ĥ     3.86
Activations Density 0.284%