INDEX
Explanations
mentions of fictional battles and confrontations
Neuron Alignment
Index   Value   % of L₁
764     +0.23   0.7%
604     +0.14   0.4%
906     +0.11   0.3%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
764     +0.23           0.05
736     +0.14           0.05
509     +0.11           0.04
Negative Logits
<bos>      -1.19
tenda      -0.78
Попис      -0.76
ados       -0.74
fides      -0.72
roba       -0.71
poliuret   -0.70
lampa      -0.67
sement     -0.66
roj        -0.65
Positive Logits
McLaugh    1.06
McInt      0.99
disreg     0.99
Vaugh      0.95
reluct     0.94
Rine       0.94
unspeak    0.92
Gorb       0.90
indestru   0.90
impra      0.90
Activations Density: 0.244%