INDEX
Explanations
official terms or designations
Neuron Alignment
Index   Value   % of L₁
50      +0.10   0.4%
678     +0.04   0.2%
1506    +0.04   0.2%
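The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. That reading is an assumption; the page itself does not define the column. A minimal sketch under that assumption follows. The function name `top_aligned` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def top_aligned(weights, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |w_i| divided by the sum of |w_j| over the vector.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Hypothetical toy vector whose absolute values sum to 1.0.
toy = [0.5, -0.25, 0.0, 0.125, 0.125]
rows = top_aligned(toy, k=2)
for index, value, share in rows:
    print(f"{index:<6} {value:+.2f}   {share:.1%}")
```

Under the same assumption, a value of +0.10 at 0.4% of L₁ would imply a total L₁ norm of roughly 25 for this feature's vector, though the rounded figures in the table make that only an order-of-magnitude estimate.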
Correlated Neurons
Index   P. Corr.   Cos Sim.
413     +0.10      0.04
185     +0.04      0.04
1792    +0.04      0.04
Negative Logits
Token                 Logit
<bos>                 -1.50
(no visible token)    -0.80
ⓧ                     -0.77
/***                  -0.77
/*++                  -0.69
/*                    -0.69
/**                   -0.69
///**                 -0.66
<?                    -0.66
}{||                  -0.66
Positive Logits
Token        Logit
official     1.96
Official     1.93
Official     1.82
OFFICIAL     1.78
official     1.74
affor        1.60
maneu        1.56
Officially   1.55
OFFICIAL     1.51
Juf          1.47
Activations Density: 0.128%