INDEX
Explanations
sentences about giving or asking for money
Neuron Alignment
Index    Value    % of L₁
674      +0.11    0.3%
1332     +0.09    0.2%
380      +0.07    0.2%
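
The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's direction vector. That reading is an assumption, not something the page states, although the listed values are consistent with it. A minimal sketch under that assumption follows; the function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(direction, top_k=3):
    """Return (index, value, share_of_l1) for the top_k largest-magnitude entries.

    Assumption: "% of L1" means |value| divided by the sum of |values|.
    """
    l1 = sum(abs(v) for v in direction)
    # Rank neuron indices by the magnitude of their entry in the direction.
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:top_k]]


# Toy direction vector (hypothetical values, not taken from the dashboard).
toy = [0.5, -0.25, 0.125, 0.125]
rows = neuron_alignment(toy, top_k=2)
for index, value, share in rows:
    print(f"{index:<8}{value:+.2f}    {share:.1%}")
```

Under this reading, a top value of +0.11 at 0.3% implies an L₁ norm of roughly 37, which would mean the feature's weight is spread thinly across many neurons rather than concentrated in a few.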
Correlated Neurons
Index    P. Corr.    Cos Sim.
1786     +0.11       0.01
1285     +0.09       0.04
380      +0.07       0.00
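
The two columns above use different similarity measures, which is why a neuron can have a nonzero Pearson correlation yet a cosine similarity near zero. The sketch below shows the standard formulas for both. Which vectors the dashboard actually compares in each column is not stated on the page, so the inputs here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a],
                             [y - mean_b for y in b])


# Hypothetical inputs: perfectly anti-correlated, yet with positive cosine.
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
corr = pearson_correlation(a, b)
cos = cosine_similarity(a, b)
```

Because Pearson correlation subtracts the mean first, the two measures agree only when both vectors are already centered at zero.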
Negative Logits
strick        -0.90
depic         -0.89
impractica    -0.86
fta           -0.85
guarante      -0.85
increa        -0.84
aen           -0.82
greate        -0.82
disagre       -0.82
fup           -0.82
Positive Logits
enderror        0.54
drit            0.48
actionMode      0.48
IsMutable       0.48
apunov          0.47
feed            0.47
sabato          0.46
SneakyThrows    0.46
RSSSF           0.45
RICIA           0.45
Activations Density 0.436%
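
Activation density is usually taken to be the fraction of sampled tokens on which the feature fires. The sketch below assumes that definition and treats any strictly positive activation as firing. The function name `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations):
    """Fraction of tokens on which the feature has a strictly positive activation."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Hypothetical per-token activations: the feature fires on 2 of 8 tokens.
acts = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.2]
density = activation_density(acts)
print(f"Activations Density {density:.3%}")
```

Under this definition, a density of 0.436% would mean the feature fires on roughly 1 in every 230 tokens.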