INDEX
Explanations
numerical information and statistics
Neuron Alignment
Index    Value    % of L₁
24       +0.07    0.2%
468      +0.07    0.2%
277      +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
345      +0.07       0.02
277      +0.07       0.02
1968     +0.07       0.02
Negative Logits
encomp     -1.21
impra      -1.19
increa     -1.18
attemp     -1.17
erad       -1.16
depic      -1.13
affor      -1.13
maneu      -1.12
shenan     -1.12
inappro    -1.10
Positive Logits
luck             0.87
<bos>            0.78
happen           0.76
lucky            0.71
happened         0.70
circumstances    0.70
happens          0.68
happening        0.67
unexpected       0.66
fate             0.63
Activations Density: 0.625%