Explanations
phrases expressing collective actions or sentiments
Neuron Alignment
Index    Value    % of L₁
50       +0.17    0.7%
1535     +0.08    0.3%
1870     +0.07    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
478      +0.17       0.04
444      +0.08       0.04
872      +0.07       0.04
Negative Logits
Token              Logit
<bos>              -1.81
ⓧ                  -1.05
<?                 -0.99
<?                 -0.97
(no visible token) -0.91
/***               -0.86
/**                -0.75
///**              -0.67
disbur             -0.66
springfox          -0.66
Positive Logits
Token         Logit
véhic         0.80
cartier       0.73
soulign       0.68
nastics       0.66
marea         0.66
monté         0.64
plong         0.60
expériment    0.59
ados          0.59
nécess        0.59
Activations Density: 0.229%