INDEX
Explanations
references to child exploitation and abuse
Neuron Alignment
Index    Value    % of L₁
87       +0.13    0.7%
359      +0.12    0.7%
132      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
18       +0.13       0.02
13       +0.12       0.01
339      +0.12       0.01
Negative Logits
hin            -1.57
HECK           -1.53
entirety       -1.47
UK             -1.43
Compat         -1.39
ogether        -1.39
forthcoming    -1.39
../../         -1.37
PHP            -1.36
heed           -1.35
Positive Logits
ytes           1.75
liography      1.68
s              1.65
dle            1.65
lic            1.60
th             1.59
ilic           1.59
dling          1.51
ortune         1.51
obic           1.50
Activations Density: 0.087%