Explanations
mentions of trends in various contexts
Neuron Alignment
Index   Value   % of L₁
50      +0.17   1.0%
410     +0.11   0.6%
1870    +0.11   0.6%
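One plausible reading of the "% of L₁" column is each neuron's absolute value divided by the L₁ norm (sum of absolute values) of the full vector. This is an assumption, not a definition given by the dashboard. A minimal Python sketch on a made-up vector follows; the names `l1_share` and `weights` are hypothetical.

```python
def l1_share(weights):
    """Return each entry's share of the vector's L1 norm, as a fraction.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The dashboard itself
    does not state its definition.
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / total for w in weights]


# Toy vector for illustration only; these are not dashboard values.
weights = [0.5, -0.25, 0.25]
shares = l1_share(weights)
print([f"{s:.1%}" for s in shares])  # ['50.0%', '25.0%', '25.0%']
```

Under this reading, a top share of only about 1% would mean the weight is spread across many neurons rather than concentrated in a few.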
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
214     +0.17           0.03
30      +0.11           0.03
410     +0.11           0.02
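The two similarity columns can disagree because Pearson correlation centers each series on its mean before comparing, while cosine similarity compares the raw vectors. The source does not say which quantities the dashboard compares (for example activations or weight vectors), so the sketch below is an assumption-laden illustration on toy lists. The function names are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as cosine similarity of centered data."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x],
                             [b - mean_y for b in y])


# Toy data, not dashboard values: perfectly anti-correlated,
# yet the raw vectors still point in broadly the same direction.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
print(round(pearson_correlation(x, y), 3))  # -1.0
print(round(cosine_similarity(x, y), 3))    # 0.714
```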
Negative Logits
Token       Logit
<bos>       -3.44
ⓧ           -1.11
public      -0.79
/***        -0.77
/*          -0.76
(blank)     -0.76
<?          -0.72
protected   -0.69
via         -0.66
put         -0.66
Positive Logits
Token       Logit
affor       1.84
bandung     1.80
maneu       1.78
Minang      1.76
increa      1.74
strick      1.66
jaya        1.66
Khart       1.64
guarante    1.62
reluct      1.61
Activations Density: 0.090%