INDEX
Explanations
sections of text related to software licensing and legal disclaimers
Neuron Alignment
Index   Value   % of L₁
263     +0.11   0.7%
427     +0.10   0.7%
157     +0.10   0.6%
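The "% of L₁" column is not defined on this page. One plausible reading, offered only as an assumption, is that it reports the absolute value of a single entry as a share of the whole vector's L₁ norm (the sum of absolute values). The sketch below illustrates that reading; the function name `l1_share` and the example vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: this mirrors one possible meaning of the "% of L1" column;
    the dashboard itself does not state how the column is computed.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical 4-entry vector (not dashboard data); its L1 norm is 1.0.
example = [0.5, -0.25, 0.125, 0.125]
share = l1_share(example, 0)  # 0.5 / 1.0 as a percentage
print(f"{share:.1f}%")
```

Under this reading, a value of +0.11 at 0.7% would imply an L₁ norm of roughly 16, which is at least consistent with the other two rows.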
Correlated Neurons
Index   P. Corr.   Cos Sim.
427     +0.11      0.01
230     +0.10      0.01
499     +0.10      0.01
Negative Logits
Ĺ       -3.64
Ŀ       -3.52
·¸      -3.45
ķ       -3.43
¯       -3.42
ĻĤ      -3.42
¨       -3.39
¬       -3.29
Ļª      -3.29
Ĭ       -3.26
Positive Logits
framework   1.59
Agreement   1.57
waiver      1.57
ierno       1.55
roup        1.53
UMENT       1.52
umab        1.47
Statement   1.42
ár          1.41
ISM         1.40
Activations Density 0.029%
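The page does not say how the activation density is computed. A common convention, assumed here, is the percentage of sampled tokens on which the feature's activation exceeds zero. The sketch below shows that calculation; `activation_density`, its `threshold` parameter, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above `threshold`.

    Assumption: density means "share of tokens where the feature fires";
    the dashboard does not confirm this definition.
    """
    if not activations:
        raise ValueError("no activations supplied")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample of 10 token activations (not dashboard data).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0]
density = activation_density(sample)  # 2 of 10 tokens are active
print(f"{density:.1f}%")
```

On this definition, a density of 0.029% would correspond to the feature firing on roughly 3 tokens in every 10,000.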