INDEX
Explanations
phrases related to comparison or evaluation
Neuron Alignment
Index    Value    % of L₁
1253     +0.09    0.2%
581      +0.09    0.2%
223      +0.09    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
581      +0.09       0.04
223      +0.09       0.03
584      +0.09       0.03
Negative Logits
dispen     -1.32
fta        -1.31
wien       -1.31
fuf        -1.30
fte        -1.29
accla      -1.29
wherea     -1.28
increa     -1.27
secon      -1.27
volunte    -1.26
Positive Logits
<bos>              0.67
what               0.58
jemals             0.56
nor                0.56
how                0.56
Trichloroethane    0.54
GOTREF             0.53
<tfoot>            0.53
انيف               0.52
setToolTip         0.52
Activations Density 0.372%