INDEX
Explanations
phrases related to challenging situations or ideas
Neuron Alignment
Index   Value   % of L₁
411     +0.15   0.6%
120     +0.13   0.5%
1392    +0.13   0.5%
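The page does not define the "% of L₁" column. A minimal sketch, assuming it is a neuron's absolute weight divided by the sum of absolute weights over the whole feature vector; the function name and the toy vector are hypothetical and are not taken from this feature:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means a neuron's share of the total absolute weight.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector (made up for illustration, not this feature's real weights).
toy_weights = [0.5, -0.25, 0.25]
print(l1_share(toy_weights, 0))  # 50.0
```

Under that assumption, a value of +0.15 at 0.6% would imply an L₁ norm of roughly 25 for this feature, up to rounding in the displayed figures.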
Correlated Neurons
Index   P. Corr.   Cos Sim.
411     +0.15      0.04
1077    +0.13      0.04
1052    +0.13      0.03
Negative Logits
jouant      -0.64
évit        -0.62
ricardo     -0.60
récomp      -0.59
montrant    -0.58
javier      -0.58
sappi       -0.58
Satt        -0.57
Voilà       -0.57
Winf        -0.56
Positive Logits
challenge     1.35
challenge     1.25
challenges    1.24
Challenge     1.22
Challenge     1.11
challenges    1.10
challenged    1.08
CHALLENGE     1.08
Challenges    1.07
challenged    1.06
Activations Density 0.099%
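The density figure is given without a definition. A minimal sketch, assuming it is the percentage of sampled tokens on which the feature's activation is above zero; the function name, the threshold parameter, and the sample values are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density counts a token as active when its activation > 0.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations (made up): the feature fires on one token out of four.
toy_activations = [0.0, 0.0, 1.5, 0.0]
print(activation_density(toy_activations))  # 25.0
```

Read this way, 0.099% would mean the feature fires on about one token in a thousand.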