INDEX
Explanations
mentions of the word "plus" along with numerical values or symbols
Neuron Alignment
Index   Value   % of L₁
1637    +0.16   0.6%
1387    +0.14   0.6%
281     +0.14   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1637    +0.16      0.03
1387    +0.14      0.02
281     +0.14      0.03
Negative Logits
agliari    -0.58
acido      -0.55
barri      -0.54
iman       -0.54
]='\       -0.54
kön        -0.54
fides      -0.53
mariana    -0.53
vernac     -0.52
محفوظة     -0.51
Positive Logits
Plus       0.98
Plus       0.96
plus       0.96
plus       0.93
PLUS       0.88
PLUS       0.84
affez      0.69
minus      0.65
statunit   0.63
prépare    0.62
Activations Density: 0.059%