INDEX
Explanations
mentions of specific names, especially related to political scandals
Neuron Alignment
Index   Value   % of L₁
101     +0.17   0.8%
1339    +0.15   0.7%
1778    +0.14   0.7%
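The "% of L₁" column appears to be each neuron's value divided by the L₁ norm of the whole vector (a value of 0.17 at 0.8% implies an L₁ norm of roughly 21). The page does not state this, so the following is only a minimal sketch under that assumption. The function name `l1_shares` and the toy weights are hypothetical, not data from this page.

```python
def l1_shares(weights, top_k=3):
    """Return (index, value, share_of_l1) for the top_k largest-magnitude entries.

    Assumption: "share" means |value| divided by the L1 norm of the vector.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    # Rank positions by absolute value, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1_norm) for i in ranked[:top_k]]


# Toy vector with hypothetical values, printed in the same three-column layout.
toy = [3.0, -1.0, 0.5, 0.5]
for index, value, share in l1_shares(toy, top_k=2):
    print(f"{index:<7} {value:+.2f}   {share:.1%}")
```

Ranking by absolute value is a design choice in this sketch; the table on this page lists only positive values, so it cannot confirm whether the source ranks by magnitude or by signed value.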
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.17      0.05
1516    +0.15      0.03
227     +0.14      0.05
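"P. Corr." and "Cos Sim." presumably abbreviate Pearson correlation and cosine similarity. The page does not say which vectors each column compares, so the sketch below only illustrates the two standard definitions and why they can disagree: Pearson correlation is cosine similarity applied after subtracting each vector's mean. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Toy vectors (hypothetical): b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(f"Pearson: {pearson_correlation(a, b):+.2f}")
print(f"Cosine:  {cosine_similarity(a, b):.2f}")
```

A constant offset leaves the Pearson correlation unchanged but lowers the cosine similarity, which is one reason the two columns need not track each other.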
Negative Logits
Ké          -0.53
lift        -0.52
(blank)     -0.49
uatu        -0.49
Ralph       -0.47
Cos         -0.47
Ralph       -0.47
Sheffield   -0.43
Marvel      -0.43
Cos         -0.42
Positive Logits
steven      0.96
STEVEN      0.88
Nixon       0.88
steven      0.81
Nixon       0.78
Steven      0.74
Steven      0.73
Whence      0.70
Stevenson   0.66
pavillon    0.66
Activations Density 0.275%