INDEX
Explanations
mentions of espionage-related terms and laws
Neuron Alignment
Index   Value   % of L₁
597     +0.18   1.0%
1805    +0.16   0.9%
871     +0.14   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1805    +0.18      0.03
597     +0.16      0.03
871     +0.14      0.02
Negative Logits
<bos>         -1.52
/**           -0.82
(blank)       -0.79
<?            -0.78
endeavoured   -0.74
ⓧ             -0.68
endow         -0.68
chanced       -0.67
harmonize     -0.67
endeavored    -0.67
Positive Logits
Esp   1.14
Esp   1.06
Sp    0.94
Sp    0.93
ESP   0.92
Spe   0.91
ESP   0.89
esp   0.89
SP    0.84
esp   0.83
Activations Density 0.087%