INDEX
Explanations
phrases related to the quantity or degree of something
Neuron Alignment
Index   Value   % of L₁
1935    +0.09   0.3%
1992    +0.07   0.2%
416     +0.07   0.2%
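The "% of L₁" column is not defined on this page. A minimal sketch, assuming it denotes the share of a vector's L1 norm (the sum of absolute values) carried by a single neuron. The function name `l1_share` and the toy vector are hypothetical and are not taken from the data above.

```python
def l1_share(vector, index):
    """Return the fraction of the vector's L1 norm contributed by one entry."""
    # L1 norm: sum of absolute values of all entries.
    l1_norm = sum(abs(v) for v in vector)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(vector[index]) / l1_norm


# Toy vector for illustration only (assumption, not values from this page).
toy = [0.5, -0.25, 0.25]
share = l1_share(toy, 0)
print(f"{share:.1%}")
```

Under this reading, small percentages such as those in the table would indicate that no single neuron dominates the vector.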
Correlated Neurons
Index   P. Corr.   Cos Sim.
1935    +0.09      0.03
969     +0.07      0.04
1795    +0.07      0.03
Negative Logits
shenan       -1.69
reluct       -1.65
depic        -1.63
emphat       -1.55
affor        -1.51
increa       -1.50
unspeak      -1.48
disagre      -1.48
inconce      -1.46
intersper    -1.46
Positive Logits
too            0.85
too            0.78
Too            0.77
Too            0.76
TOO            0.73
demasiado      0.66
TOO            0.66
quá            0.65
calendriers    0.65
<bos>          0.59
Activations Density 0.304%