INDEX
Explanations
information related to rewards, achievements, and benefits
Neuron Alignment
Index   Value   % of L₁
687     +0.17   0.6%
1984    +0.14   0.5%
752     +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
687     +0.17      0.06
1984    +0.14      0.05
331     +0.13      0.04
Negative Logits
rlrl      -0.67
<bos>     -0.67
làm       -0.58
mặt       -0.58
bạn       -0.58
ukunft    -0.57
Xuất      -0.55
oltán     -0.55
ondag     -0.53
Làm       -0.53
Positive Logits
WITH        0.64
excès       0.63
gnition     0.61
uests       0.60
enuine      0.55
antity      0.54
AVEC        0.54
With        0.54
CALIFORN    0.53
WITH        0.52
Activations Density 0.252%