INDEX
Explanations
numbers in a specific format: a decimal point followed by one or two digits
Neuron Alignment
Index   Value   % of L₁
1343    +0.20   0.6%
674     +0.15   0.5%
382     +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.20      0.04
1607    +0.15      0.03
1429    +0.13      0.03
Negative Logits
shenan      -1.30
unspeak     -1.24
intersper   -1.20
reluct      -1.16
maneu       -1.16
impra       -1.13
indescri    -1.08
disreg      -1.08
horrend     -1.08
unavoid     -1.06
Positive Logits
habet       0.96
marte       0.94
alkoh       0.91
potest      0.89
utop        0.88
ananas      0.85
solidar     0.85
manuten     0.83
pól         0.83
bambu       0.82
Activations Density 0.052%