Explanations
descriptive adjectives that characterize intensity or size
Neuron Alignment
Index    Value    % of L₁
50       +0.24    1.0%
156      +0.12    0.5%
605      +0.09    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
605      +0.24       0.04
642      +0.12       0.05
393      +0.09       0.04
Negative Logits
Token        Logit
<bos>        -2.33
/***         -0.96
///**        -0.90
defray       -0.78
ratify       -0.76
avrebbero    -0.76
endow        -0.74
intersper    -0.73
ⓧ            -0.71
<?           -0.70
Positive Logits
Token        Logit
asado        0.80
hematical    0.80
ados         0.76
vinci        0.74
maroc        0.73
GRAPHS       0.71
quoc         0.71
lindo        0.70
cuit         0.70
mistak       0.70
Activations Density: 0.336%