INDEX
Explanations
mentions of locations, settings, and directions
Neuron Alignment
Index    Value    % of L₁
62       +0.10    0.3%
190      +0.09    0.2%
1319     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
284      +0.10       0.06
2044     +0.09       0.06
510      +0.08       0.05
Negative Logits
dises      -1.20
accla      -1.16
sappi      -1.14
Keny       -1.11
peculi     -1.08
volunte    -1.08
emphat     -1.07
Cik        -1.06
parteci    -1.06
inev       -1.06
Positive Logits
<bos>       0.93
we          0.80
please      0.76
wohin       0.72
,           0.71
you         0.68
zumal       0.66
weshalb     0.65
的话        0.65
consider    0.64
Activations Density 0.385%