INDEX
Explanations
various forms of the phrase "figure out."
Neuron Alignment
Index    Value    % of L₁
159      +0.13    0.7%
252      +0.12    0.7%
293      +0.12    0.7%
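
One plausible reading of the "% of L₁" column, offered here as an assumption rather than the dashboard's documented definition, is each neuron's absolute value divided by the L₁ norm (sum of absolute values) of the whole vector. The sketch below illustrates that reading with a hypothetical helper, `l1_share`, and a made-up toy vector; none of it is data from this feature.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means one entry's share of the sum of
    absolute values across the whole vector.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy vector, not taken from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
print(l1_share(toy_weights, 0))  # 50.0
```

Under this reading, three neurons at 0.7% each would mean the vector's mass is spread thinly across many neurons rather than concentrated in a few.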
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
367      +0.13            0.03
287      +0.12            0.04
252      +0.12            0.03
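
The two columns above report different measures, which is why their values need not agree. As a minimal sketch of the standard definitions (not the dashboard's own code, and with no claim about which vectors it compares), Pearson correlation is cosine similarity applied after subtracting each vector's mean. The function names and toy inputs are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy inputs: perfectly anti-correlated, yet the raw vectors point
# in broadly the same direction because all entries are positive.
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
print(pearson_correlation(a, b))
print(cosine_similarity(a, b))
```

In this toy case the Pearson correlation is negative while the cosine similarity is positive, which shows how centering alone can change the picture.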
Negative Logits
ĻĤ       -3.84
ľ        -3.44
Ķ        -3.28
§        -3.28
´        -3.10
ħ        -3.09
ŀ        -3.09
Īĺ       -3.08
ľĵ       -3.05
¨        -3.02
Positive Logits
how         2.01
why         1.99
what        1.49
whats       1.48
manually    1.48
lay         1.42
qué         1.41
cry         1.39
haviour     1.38
wrong       1.37
Activations Density: 0.228%
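
A common way to read an activation density figure, assumed here rather than taken from the dashboard's documentation, is the percentage of sampled tokens on which the feature's activation is above zero. The sketch below uses a hypothetical helper, `activation_density`, and made-up activations.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of entries whose activation exceeds `threshold`.

    Assumption: density counts tokens with activation strictly above
    zero; the dashboard may use a different cutoff or sample.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy activations, one per token.
toy_activations = [0.0, 0.0, 1.5, 0.0]
print(activation_density(toy_activations))  # 25.0
```

Under that reading, a density of 0.228% would mean the feature fires on roughly 2 tokens in every 1,000.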