INDEX
Explanations
mentions of political figures and events
Neuron Alignment
Index   Value   % of L₁
1842    +0.14   0.4%
163     +0.11   0.3%
394     +0.09   0.3%
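The page does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that it gives each neuron's share of the L₁ norm (sum of absolute values) of the feature's weight vector. A minimal sketch of that calculation follows; the function name `l1_shares` and the toy weights are hypothetical and are not taken from this page.

```python
def l1_shares(weights):
    """Return each weight's share of the vector's L1 norm, as a percentage.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1_norm for w in weights]


# Hypothetical toy weights for illustration only (not values from this page).
toy_weights = [0.5, -0.25, 0.25]
shares = l1_shares(toy_weights)
```

Under this reading, small percentages for the top-ranked neurons would mean the weight is spread over many neurons rather than concentrated in a few.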
Correlated Neurons
Index   P. Corr.   Cos Sim.
163     +0.14      0.03
770     +0.11      0.05
1135    +0.09      0.04
Negative Logits
reluct       -1.64
encomp       -1.63
increa       -1.63
hairc        -1.58
disagre      -1.58
intersper    -1.57
affor        -1.56
perfet       -1.56
shenan       -1.56
inev         -1.56
Positive Logits
AfterEach          0.69
helped             0.68
makedirs           0.67
helping            0.67
RemoteException    0.63
ComponentScan      0.62
himself            0.61
abspath            0.61
successfully       0.60
worked             0.60
Activations Density 1.080%