INDEX
Explanations
phrases related to facing difficulties or challenges
Neuron Alignment
Index   Value   % of L₁
25      +0.14   0.5%
11      +0.11   0.4%
1325    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
11      +0.14      0.03
25      +0.11      0.03
630     +0.10      0.03
Negative Logits
revan   -0.79
utop    -0.69
OGND    -0.64
priva   -0.64
impon   -0.64
Winf    -0.63
meras   -0.63
sä      -0.63
hoj     -0.62
pessi   -0.61
Positive Logits
struggle     1.14
struggles    1.06
Struggle     1.01
struggling   0.97
struggled    0.94
uggles       0.79
stru         0.73
uggling      0.68
uggle        0.64
bandeau      0.63
Activations Density 0.074%