INDEX
Explanations
references to communities or communism
Neuron Alignment
Index   Value   % of L₁
503     +0.18   1.0%
410     +0.15   0.9%
343     +0.14   0.8%
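
The "% of L₁" column is not defined on this page. One plausible reading, taken here as an assumption, is that each value is the neuron's share of the L₁ norm of the feature's direction over all neurons. The three rows are at least consistent with that reading, since each implies an L₁ norm of roughly 17 to 18. A minimal Python sketch under that assumption (the function name `top_alignment` and any input vector are hypothetical, not taken from the dashboard):

```python
import numpy as np

def top_alignment(direction, k=3):
    """Return (index, value, percent of L1 norm) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| / sum(|all values|) * 100.
    """
    direction = np.asarray(direction, dtype=float)
    l1_norm = np.abs(direction).sum()
    # Sort indices by descending absolute value and keep the first k.
    order = np.argsort(-np.abs(direction))[:k]
    return [
        (int(i), float(direction[i]), float(100.0 * abs(direction[i]) / l1_norm))
        for i in order
    ]
```

Calling `top_alignment(vec, k=3)` on a feature's per-neuron weights would yield rows shaped like the table above.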
Correlated Neurons
Index   P. Corr.   Cos Sim.
111     +0.18      0.07
271     +0.15      0.02
156     +0.14      0.04
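
The page does not say what the two columns are computed over. Assuming "P. Corr." is a Pearson correlation and "Cos Sim." is a cosine similarity, the sketch below shows how the two measures differ: Pearson centers both vectors before comparing them, cosine does not. The function names and inputs are hypothetical.

```python
import numpy as np

def pearson_corr(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float(a_c @ b_c / (np.linalg.norm(a_c) * np.linalg.norm(b_c)))

def cosine_sim(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Because of the centering step, a constant offset in one vector leaves the Pearson value unchanged but changes the cosine value, which is one reason the two columns need not agree.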
Negative Logits
ľ                -2.38
č↵č↵             -2.28
(blank)          -2.28
↵                -2.28
<|outofrange|>   -2.28
č↵č↵             -2.28
(blank)          -2.28
↵ ↵              -2.28
↵                -2.28
↵                -2.28
Positive Logits
icable      1.84
ITED        1.84
shot        1.73
imental     1.72
ICT         1.66
ications    1.63
ication     1.60
ICES        1.59
ision       1.52
icator      1.51
Activations Density 0.245%