INDEX
Explanations
references to legal cases, political statistics, and controversial government actions
Neuron Alignment
Index   Value   % of L₁
764     +0.23   0.7%
1108    +0.15   0.5%
1177    +0.14   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
764     +0.23      0.02
227     +0.15      0.03
964     +0.14      0.02
Negative Logits
Février   -0.66
Noice     -0.65
Hahah     -0.64
etui      -0.61
FTFY      -0.61
Tbh       -0.61
Wtf       -0.60
Rgds      -0.60
LMAO      -0.59
Ottobre   -0.58
Positive Logits
Shakspeare      0.57
lii             0.55
cance           0.52
rospy           0.51
thut            0.49
AFFIRMED        0.46
isContained     0.45
enterOuterAlt   0.45
fath            0.43
mbda            0.43
Activations Density: 0.095%