Explanations
Phrases indicating a cause-and-effect relationship
Neuron Alignment
Index   Value   % of L₁
1757    +0.11   0.3%
161     +0.08   0.2%
1526    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1363    +0.11      0.03
283     +0.08      0.01
1664    +0.07      0.02
Negative Logits
<bos>     -0.98
/**       -0.59
ⓧ         -0.58
push      -0.52
寄        -0.52
          -0.50
увиде     -0.50
/*!       -0.49
leşti     -0.49
pushed    -0.48
Positive Logits
uhr       1.44
Minang    1.40
saar      1.36
thereby   1.35
maksi     1.30
seksi     1.27
Meksi     1.26
keramik   1.25
lemp      1.25
Strukt    1.24
Activations Density 0.270%