Explanations
restrictions or prohibitions related to governmental or societal policies
Neuron Alignment
Index   Value   % of L₁
460     +0.08   0.2%
1977    +0.08   0.2%
74      +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
919     +0.08      0.03
1499    +0.08      0.06
74      +0.07      0.04
Negative Logits
pessi    -1.08
fluo     -1.03
dand     -0.99
casio    -0.98
?...     -0.98
uniqu    -0.96
desir    -0.96
alre     -0.95
overla   -0.95
embra    -0.95
Positive Logits
imposed        0.96
restrictions   0.83
imposed        0.77
enforced       0.72
restriction    0.70
restricting    0.67
enforcement    0.63
restrict       0.62
restrictive    0.62
restrictions   0.61
Activations Density: 0.436%