INDEX
Explanations
punctuation marks, particularly commas
Neuron Alignment
Index   Value   % of L₁
50      +0.28   1.5%
1757    +0.14   0.7%
1741    +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.28      0.07
1757    +0.14      0.05
382     +0.12      0.05
Negative Logits
<bos>                -2.76
ⓧ                    -1.38
(no visible token)   -1.12
<?                   -1.05
<?                   -0.96
/***                 -0.84
/**                  -0.82
springfox            -0.74
},[])                -0.70
//});                -0.69
Positive Logits
unspeak    0.68
maneu      0.61
impra      0.60
iirc       0.57
indescri   0.57
Abuse      0.56
unexplo    0.54
pleins     0.53
véhic      0.53
beverly    0.52
Activations Density 0.296%