Explanations
instances of the verb "write" in various forms
Neuron Alignment
Index   Value   % of L₁
50      +0.34   1.5%
755     +0.15   0.7%
1350    +0.11   0.5%
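The page does not define the "% of L₁" column. One reading, assumed here, is that each value is divided by the L₁ norm (the sum of absolute values) of the full weight vector it comes from. The three rows are at least consistent with that reading: 0.34 / 0.015 ≈ 22.7, 0.15 / 0.007 ≈ 21.4 and 0.11 / 0.005 = 22, so a single norm of roughly 22 would explain all of them within rounding. A minimal Python sketch of this reading follows; the helper `top_alignment` and the weight vector are hypothetical and not taken from the page.

```python
def top_alignment(weights, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| / sum(|w_i|) over the whole vector.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank neuron indices by the magnitude of their weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Made-up weight vector for illustration only (not data from the dashboard).
example = [0.5, -0.25, 0.125, 0.125]
rows = top_alignment(example, k=2)
for index, value, share in rows:
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Ranking by absolute value keeps strongly negative weights in the list, which is why the Value column carries an explicit sign.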
Correlated Neurons
Index   P. Corr.   Cos Sim.
755     +0.34      0.07
220     +0.15      0.06
1350    +0.11      0.05
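The page does not say which vectors the "P. Corr." and "Cos Sim." columns compare. Assuming both are computed over the same pair of vectors (for example, a feature's activations and a neuron's activations across tokens), the only difference between the two measures is that Pearson correlation subtracts each vector's mean first. The sketch below shows that relationship on made-up numbers; the function names and data are illustrative, not the dashboard's implementation.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (no mean subtraction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson_correlation(x, y):
    """Pearson correlation, written as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Made-up activations: the neuron has a constant offset on top of the feature.
feature = [0.0, 0.0, 1.0, 0.0]
neuron = [1.0, 1.0, 2.0, 1.0]

pearson = pearson_correlation(feature, neuron)
cosine = cosine_similarity(feature, neuron)
print(f"Pearson: {pearson:+.2f}  cosine: {cosine:.2f}")
```

Because of the centering step, the two columns can differ substantially for the same neuron, so they are best read as separate measures rather than as approximations of each other.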
Negative Logits
Token          Logit
<bos>          -2.01
<?             -0.66
ⓧ              -0.64
dom            -0.61
nad            -0.60
stamo          -0.59
ev             -0.58
Traducción     -0.57
ProgressHUD    -0.56
ട              -0.55
Positive Logits
Token          Logit
maneu          1.75
Minang         1.63
impra          1.61
unspeak        1.56
increa         1.56
disagre        1.54
disreg         1.53
reluct         1.52
apprehen       1.51
disgra         1.50
Activations Density: 0.156%
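Activation density is usually read as the fraction of tokens on which the feature fires at all; under that assumption, 0.156% corresponds to roughly 1.56 active tokens per 1,000. The sketch below computes such a fraction for a made-up list of activations. The function name, the threshold parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: a token counts as active when its activation is > threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up per-token activations for illustration only.
sample = [0.0, 0.0, 0.7, 0.0, 1.3, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"Activations Density: {density:.3%}")
```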