INDEX
Explanations
political and policy-related terms and concepts
Neuron Alignment
Index   Value   % of L₁
1842    +0.21   0.6%
604     +0.21   0.6%
184     +0.14   0.4%
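The dashboard does not say how the Neuron Alignment columns are computed. The sketch below is minimal and rests on two assumptions. First, Value is the feature's raw decoder weight onto each MLP neuron. Second, % of L₁ is that weight's absolute value divided by the L₁ norm of the whole decoder row. The function name `neuron_alignment` and the toy row are hypothetical.

```python
from typing import List, Tuple


def neuron_alignment(decoder_row: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Return (neuron index, weight, share of L1 norm) for the k largest weights.

    Assumption (not confirmed by the dashboard): 'Value' is the raw decoder
    weight and '% of L1' is |weight| / sum(|weights|) over the whole row,
    returned here as a fraction rather than a percentage.
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neurons by signed weight, largest first, and keep the top k.
    top = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)[:k]
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm) for i in top]


# Toy decoder row over four hypothetical neurons; its L1 norm is 10.0.
print(neuron_alignment([1.0, -4.0, 3.0, 2.0], k=2))  # -> [(2, 3.0, 0.3), (3, 2.0, 0.2)]
```

The sketch ranks by signed weight rather than absolute weight, which fits the fact that every Value shown is positive. A neuron with a large negative weight would need a separate ranking.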
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1369    +0.21           0.02
1499    +0.21           0.06
1519    +0.14           0.04
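The Correlated Neurons table reports a P. Corr. (Pearson correlation) column and a Cos Sim. (cosine similarity) column without defining their inputs. The minimal sketch below assumes that both columns compare the feature's per-token activations with the neuron's per-token activations over the same tokens. Under that assumption, cosine similarity is simply the uncentred version of the Pearson correlation. The function names and toy activations are hypothetical.

```python
import math
from typing import Sequence


def pearson_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: the cosine similarity of the mean-centred sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def cosine_sim(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity of the raw (uncentred) sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical per-token activations of one feature and one neuron.
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron_acts = [0.1, -0.2, 0.9, 0.3, 0.4]
print(round(pearson_corr(feature_acts, neuron_acts), 3))
print(round(cosine_sim(feature_acts, neuron_acts), 3))
```

Both functions divide by a norm. As a result, they fail on a constant sequence (for Pearson) or an all-zero sequence (for cosine), so a real pipeline would need to skip or special-case such inputs.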
Negative Logits
reluct       -1.14
embodi       -1.14
yoda         -1.14
swarovski    -1.13
hentai       -1.12
milf         -1.07
wikihow      -1.06
perfet       -1.06
hasbro       -1.05
cvt          -1.05
Positive Logits
RunAsync           0.61
TableField         0.60
NameValuePair      0.60
ListItemIcon       0.58
JsonFormat         0.58
FailureListener    0.58
JdbcTemplate       0.57
Begriffsklä        0.57
ileg               0.55
RatingBar          0.55
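The negative and positive logit lists pair tokens with scores, but the dashboard does not show how those scores are derived. The minimal sketch below assumes each score is the feature's direct effect on the output. That effect is computed as the dot product of the feature's decoder direction with the token's column of the unembedding matrix, ignoring layer norm and any later layers. The names `logit_effects`, `decoder_dir` and `W_U`, along with the toy vocabulary, are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: the logit for token t is decoder_dir . W_U[:, t], i.e. the
    feature's direct effect on the unembedding, ignoring layer norm and any
    later layers. W_U is stored as d_model rows of vocab_size entries.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort token ids from most negative to most positive score.
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    # Reverse first, then slice, so k = 0 correctly returns an empty list.
    positive = [(vocab[t], scores[t]) for t in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2 and a four-token hypothetical vocabulary.
W_U = [[0.5, -1.0, 2.0, 0.0],
       [9.0, 9.0, 9.0, 9.0]]
neg, pos = logit_effects([1.0, 0.0], W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # -> [('b', -1.0), ('d', 0.0)]
print(pos)  # -> [('c', 2.0), ('a', 0.5)]
```

With the default k = 10, each list has the same length as the ones shown. Any scaling the real dashboard applies (for example, folding in a final layer norm) is unknown here.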
Activation Density: 0.578%
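The density figure can be read as the percentage of tokens on which the feature fires. That reading is an assumption, because the dashboard gives no definition. Under it, 0.578% is roughly one token in 173. The minimal sketch below also assumes that "fires" means an activation strictly above zero over a fixed evaluation set. The function name and the toy activations are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical activations over eight tokens: two are positive.
print(activation_density([0.0, 0.0, 1.5, 0.0, 0.0, 0.2, 0.0, 0.0]))  # -> 25.0
```

The `threshold` parameter is an illustrative addition. A small positive value would ignore near-zero activations that a strict `> 0` test counts as firing.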