INDEX
Explanations
specific names, including proper nouns and abbreviations
Neuron Alignment
Index   Value   % of L₁
1023    +0.13   0.7%
1271    +0.12   0.6%
1971    +0.12   0.6%
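The "% of L₁" column appears to give each entry's share of the L₁ norm of the feature's weight vector in the neuron basis. For example, 0.13 at 0.7% implies an L₁ norm of roughly 18 to 20. Below is a minimal Python sketch under that assumption. Three details are assumptions rather than anything the card states: `neuron_alignment` is a hypothetical helper, the vector is assumed to be the feature's decoder direction, and ranking by absolute value is assumed.

```python
def neuron_alignment(weights, k=3):
    """Hypothetical helper: the k largest-magnitude entries of a feature's
    weight vector (assumed: its decoder direction), each with its fraction
    of the vector's L1 norm."""
    l1 = sum(abs(w) for w in weights)
    # Rank neuron indices by |weight|, largest first (ties keep index order).
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


print(neuron_alignment([0.5, -2.0, 1.0, 0.5]))
# [(1, -2.0, 0.5), (2, 1.0, 0.25), (0, 0.5, 0.125)]
```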
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
227     +0.13           0.05
1343    +0.12           0.05
1741    +0.12           0.00
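This sketch assumes both columns compare the feature's per-token activations with each neuron's per-token activations over the same tokens. Under that assumption, Pearson correlation is the cosine similarity of the mean-centered sequences, while the Cos. Sim. column is uncentered. The sketch below is plain Python with hypothetical helper names, and it shows how the two measures can disagree.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(xs, ys):
    """Uncentered cosine similarity between two activation sequences."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


feature = [0.0, 0.0, 3.0, 0.0]   # toy feature activations per token
neuron = [1.0, 1.0, 2.0, 1.0]    # toy neuron activations per token
print(round(pearson(feature, neuron), 3), round(cosine(feature, neuron), 3))
# 1.0 0.756
```

In this example the neuron is an affine function of the feature (neuron = 1 + feature / 3). Pearson correlation is therefore 1.0, but the neuron's constant offset pulls the uncentered cosine similarity down to about 0.76.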
Negative Logits
<bos>        -1.37
intersper    -1.10
forbear      -0.76
impelled     -0.76
vainly       -0.75
overcrow     -0.74
/**          -0.73
equila       -0.72
             -0.71
disbur       -0.70
Positive Logits
utop         0.81
cioc         0.74
Tow          0.64
Toxicol      0.63
tke          0.62
ToTensor     0.61
TO           0.60
gmbh         0.60
africain     0.60
télévis      0.59
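The two logit lists appear to come from projecting the feature's direction through the unembedding and then sorting the resulting per-token scores. A sketch under that assumption follows. The names are hypothetical: `w_dec` stands for the feature's decoder row and `W_U` for a d_model × d_vocab unembedding matrix. Any layer-norm scaling the dashboard might apply is ignored.

```python
def top_logits(w_dec, W_U, vocab, k=10):
    """Hypothetical helper: project a feature direction through the unembedding
    and return the k most negative and k most positive (token, logit) pairs."""
    d_model, d_vocab = len(w_dec), len(vocab)
    # logit[t] = sum over d of w_dec[d] * W_U[d][t]
    logits = [sum(w_dec[d] * W_U[d][t] for d in range(d_model)) for t in range(d_vocab)]
    order = sorted(range(d_vocab), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive


neg, pos = top_logits([1.0, -1.0], [[2.0, 0.0, 1.0], [0.0, 1.0, 1.0]], ["a", "b", "c"], k=1)
print(neg, pos)
# [('b', -1.0)] [('a', 2.0)]
```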
Activations Density 0.277%
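Activation density is presumably the fraction of sampled tokens on which the feature's activation is positive. At 0.277%, that works out to roughly 2.8 tokens per 1,000. A minimal sketch follows; the helper name and the zero threshold are both assumptions.

```python
def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    return sum(1 for a in acts if a > threshold) / len(acts)


print(activation_density([0.0, 1.2, 0.0, 0.0]))
# 0.25
```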