INDEX
Explanations
security-related terms and parameters in technical documents
Neuron Alignment
Index    Value    % of L₁
356      +0.13    0.7%
485      +0.13    0.7%
58       +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
36       +0.13       0.11
209      +0.13       0.09
495      +0.12       0.08
Negative Logits
uncher    -1.80
rypt      -1.63
aser      -1.61
ikk       -1.59
ikt       -1.54
gom       -1.52
åύ        -1.52
enario    -1.41
inen      -1.40
ismus     -1.38
Positive Logits
leading         1.56
jurisdiction    1.51
fully           1.50
eful            1.49
Respondent      1.44
leads           1.43
Category        1.40
dles            1.36
elves           1.35
asc             1.34
Activations Density: 0.107%