Explanations
phrases related to impactful actions or events
Neuron Alignment
Index   Value   % of L₁
1264    +0.10   0.3%
1416    +0.10   0.3%
674     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1264    +0.10      0.03
1492    +0.10      0.03
1416    +0.10      0.03
Negative Logits
Token        Logit
himo         -0.49
tev          -0.47
mikrofon     -0.47
亘           -0.45
WAL          -0.44
encodeWith   -0.43
RUS          -0.43
DOUT         -0.43
Craw         -0.42
Craw         -0.42
Positive Logits
Token      Logit
strike     1.29
struck     1.17
strikes    1.16
strike     1.10
Strike     1.03
Strikes    1.03
striking   1.02
Strike     1.00
Striking   0.90
struck     0.83
Activations Density: 0.071%