INDEX
Explanations
expressions of significant emphasis or urgency
Neuron Alignment
Index    Value    % of L₁
23       +0.26    1.5%
478      +0.20    1.2%
203      +0.12    0.7%
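The "% of L₁" column appears to report each neuron's alignment value as a share of the L₁ norm of the full alignment vector. The sketch below shows that arithmetic under this assumption. The function name `percent_of_l1` and the norm `ASSUMED_L1_NORM = 17.0` are hypothetical: the dashboard does not state the norm, and 17.0 is only one value consistent with the rounded figures in the table.

```python
def percent_of_l1(value: float, l1_norm: float) -> float:
    """Return |value| as a percentage of the given L1 norm."""
    if l1_norm <= 0:
        raise ValueError("l1_norm must be positive")
    return 100.0 * abs(value) / l1_norm


# Values copied from the Neuron Alignment table, keyed by neuron index.
alignment = {23: 0.26, 478: 0.20, 203: 0.12}

# ASSUMPTION: the true L1 norm is not given in the source. 17.0 is a
# hypothetical value chosen only because it is consistent with the
# rounded percentages shown in the table.
ASSUMED_L1_NORM = 17.0

# Percentage share per neuron, rounded to one decimal like the table.
shares = {
    idx: round(percent_of_l1(value, ASSUMED_L1_NORM), 1)
    for idx, value in alignment.items()
}

for idx, share in shares.items():
    print(f"neuron {idx}: {share}% of L1")
```

If the column is defined differently (for example, relative to a norm taken over a different set of weights), only the denominator changes.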
Correlated Neurons
Index    P. Corr.    Cos Sim.
294      +0.26       0.02
307      +0.20       0.02
278      +0.12       0.02
Negative Logits
ista     -1.85
alike    -1.67
mare     -1.62
iast     -1.61
icio     -1.59
mates    -1.59
,        -1.58
acia     -1.57
iazep    -1.54
iste     -1.53
Positive Logits
ĥ½       3.26
ı        2.76
Ļª       2.73
į        2.61
Į        2.52
ĩ        2.48
º        2.46
ĻĤ       2.42
Ĩ        2.39
´        2.39
Activations Density 0.702%