INDEX
Explanations
mentions or descriptions related to computer security software or hardware
Neuron Alignment
Index   Value   % of L₁
50      +0.19   1.2%
2019    +0.06   0.4%
1842    +0.04   0.2%
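The "% of L₁" column is not defined on the page. One plausible reading is that it gives each neuron's absolute value as a share of the feature direction's L₁ norm. The sketch below illustrates that reading only; the function name `l1_share` and the example vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means this ratio; the page does not confirm it.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical 4-component direction; values are illustrative only.
direction = [0.5, -0.25, 0.125, 0.125]
share = l1_share(direction, 0)  # multiply by 100 for a percentage
```

Under this reading, small percentages such as those in the table would mean that no single neuron dominates the feature direction.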
Correlated Neurons
Index   P. Corr.   Cos Sim.
453     +0.19      0.15
161     +0.06      0.11
1261    +0.04      0.10
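Assuming "P. Corr." stands for Pearson correlation and "Cos Sim." for cosine similarity (the page does not expand either abbreviation, nor say which vectors are compared), the two measures differ only in mean-centering. A minimal sketch with hypothetical helper names:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)
```

Because of the centering step, the two columns can disagree for the same pair of vectors whenever either vector has a nonzero mean.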
Negative Logits
<bos>    -2.53
         -0.95
ⓧ        -0.95
/**      -0.95
<?       -0.91
/*       -0.86
<?       -0.79
public   -0.77
/*++     -0.70
#![      -0.68
Positive Logits
maneu       1.92
affor       1.77
accla       1.69
impra       1.67
increa      1.66
véhic       1.65
embodi      1.57
reluct      1.57
stockholm   1.57
Juf         1.52
Activations Density 2.165%