INDEX
Explanations
the word "or" in various contexts
Neuron Alignment
Index   Value   % of L₁
50      +0.41   1.7%
1472    +0.12   0.5%
1842    +0.11   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1806    +0.41      0.09
1472    +0.12      0.09
1127    +0.11      0.08
Negative Logits
<bos>             -1.85
SEDS              -0.82
FunctionFlags     -0.70
//////////////    -0.64
UTF               -0.63
apnews            -0.62
pola              -0.61
***!              -0.61
////////////      -0.61
Denomin           -0.60
Positive Logits
disagre           1.74
shenan            1.64
gaily             1.64
milf              1.63
apprehen          1.63
maneu             1.62
accla             1.55
impra             1.54
increa            1.52
reluct            1.51
Activations Density 0.303%