INDEX
Explanations
mentions of legislative actions or decision-making processes
Neuron Alignment
Index    Value    % of L₁
1343     +0.19    0.6%
227      +0.16    0.5%
1842     +0.13    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
981      +0.19       0.08
227      +0.16       0.07
1343     +0.13       0.06
Negative Logits
ContentAlignment    -0.66
ISupport            -0.66
UnsafeEnabled       -0.65
TagMode             -0.63
esteld              -0.62
InputDecoration     -0.60
tres                -0.60
CloseOperation      -0.60
setOnAction         -0.59
UIControlState      -0.59
Positive Logits
ftu        2.12
disagre    2.05
fuf        2.04
milf       1.99
fta        1.96
vns        1.94
ftre       1.94
depic      1.89
desir      1.89
hentai     1.89
Activations Density 0.294%