Explanations
instances of the word "over."
Neuron Alignment
Index    Value    % of L₁
378      +0.12    0.7%
88       +0.12    0.7%
174      +0.12    0.6%
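One plausible reading of the "% of L₁" column is the share of a vector's L1 norm (the sum of absolute values) carried by a single entry. The page does not define the column, so this interpretation, the function name `l1_share`, and the toy weights below are all assumptions made for illustration. A minimal sketch:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumed definition: the page does not state how "% of L1" is computed.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("a zero vector has no L1 norm to take a share of")
    return 100.0 * abs(weights[index]) / l1


# Toy weights (hypothetical, not taken from the page).
weights = [0.5, -0.25, 0.25]
shares = [l1_share(weights, i) for i in range(len(weights))]
print(shares)
```

Under this reading, small percentages such as those in the table would mean that no single neuron dominates the vector.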
Correlated Neurons
Index    P. Corr.    Cos Sim.
88       +0.12       0.03
403      +0.12       0.03
378      +0.12       0.01
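The two columns above can differ because Pearson correlation subtracts each vector's mean before comparing, while cosine similarity does not. The page does not say which vectors the two measures are computed over, so the sketch below only illustrates the standard definitions on toy data; the function names and the sample values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a],
                             [y - mean_b for y in b])


# Toy data (hypothetical): the second vector is 2 * first + 1.
feature_acts = [0.0, 1.0, 0.0, 2.0]
neuron_acts = [1.0, 3.0, 1.0, 5.0]

p_corr = pearson_correlation(feature_acts, neuron_acts)
cos_sim = cosine_similarity(feature_acts, neuron_acts)
print(p_corr, cos_sim)
```

In this toy case the Pearson correlation is 1 up to rounding because the relationship is exactly linear, while the cosine similarity is slightly lower because of the constant offset.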
Negative Logits
punk      -1.54
Patri     -1.49
esis      -1.44
femin     -1.43
reb       -1.37
Rev       -1.35
negot     -1.33
rendum    -1.31
dyst      -1.29
pro       -1.28
Positive Logits
»¿        4.15
±         3.48
½         3.44
´         3.36
↵         3.31
          3.31
↵↵        3.31
          3.31
↵         3.31
          3.31
Activations Density 0.213%
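Activation density is commonly read as the fraction of sampled tokens on which a feature fires. The page does not state how its figure is computed, so the threshold of zero, the function name `activation_density`, and the toy activations below are assumptions. A minimal sketch:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold`.

    Assumed definition: the page does not say how density is measured.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample (hypothetical): the feature fires on 1 of 10 tokens.
acts = [0.0] * 9 + [1.5]
density = activation_density(acts)
print(f"{100 * density:.3f}%")
```

Under this reading, a density of 0.213% would correspond to the feature firing on roughly 2 of every 1,000 sampled tokens.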