INDEX
Explanations
mentions of physical injuries or scars
Neuron Alignment
Index   Value   % of L₁
468     +0.10   0.3%
426     +0.09   0.3%
1041    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
736     +0.10      0.06
468     +0.09      0.03
426     +0.09      0.03
Negative Logits
utop      -0.98
robus     -0.93
siff      -0.92
solidar   -0.89
bascul    -0.89
incess    -0.88
meras     -0.87
parati    -0.87
geolog    -0.85
spion     -0.84
Positive Logits
scars        0.71
marks        0.61
inflicted    0.60
loopt        0.58
caused       0.57
szüks        0.57
visible      0.54
scarred      0.53
registrada   0.52
formed       0.52
Activations Density: 0.272%