INDEX
Explanations
text indicating the importance of something
Neuron Alignment
Index   Value   % of L₁
168     +0.13   0.4%
196     +0.12   0.4%
1527    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
168     +0.13      0.05
1372    +0.12      0.05
240     +0.11      0.04
Negative Logits
tenté          -0.67
excès          -0.65
oeil           -0.63
écart          -0.59
rendono        -0.58
saad           -0.58
endeavouring   -0.58
alip           -0.57
suna           -0.56
maksi          -0.56
Positive Logits
important      0.84
importance     0.83
Important      0.81
important      0.79
Important      0.71
Importance     0.70
importance     0.70
importants     0.68
Importance     0.66
IMPORTANT      0.65
Activations Density: 0.140%