INDEX
Explanations
The term "lex" in various contexts, indicating a focus on lexical or language-related concepts.
Neuron Alignment
Index   Value   % of L₁
364     +0.14   0.8%
320     +0.14   0.8%
369     +0.13   0.7%
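To make the Neuron Alignment columns concrete, here is a minimal Python sketch. It assumes (this card does not confirm it) that "Value" is the feature's decoder weight on a given neuron and that "% of L₁" is the absolute value of that weight divided by the L₁ norm of the whole decoder vector. The function name `neuron_alignment`, the ranking by magnitude, and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (index, value, fraction of L1) for the k largest-magnitude weights.

    Assumption: "% of L1" = |weight| / sum(|weights|) over the decoder vector.
    """
    l1 = sum(abs(w) for w in decoder_vec)  # L1 norm of the decoder vector
    # Rank neuron indices by absolute weight, largest first (sort is stable on ties)
    top = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)[:k]
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in top]


if __name__ == "__main__":
    # Toy 4-neuron decoder vector with L1 norm 1.0
    for idx, val, frac in neuron_alignment([0.5, -0.25, 0.25, 0.0], k=2):
        print(f"{idx:<6}{val:+.2f}  {frac:.1%}")
```

Under this reading, a weight of +0.14 covering 0.8% of the L₁ norm simply means the feature's decoder vector is spread across many neurons rather than concentrated on one.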
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
105     +0.14           0.01
320     +0.14           0.01
364     +0.13           0.01
Negative Logits
else      -2.21
inner     -1.65
res       -1.59
"?"       -1.47
illed     -1.46
taste     -1.46
odot      -1.45
)){       -1.42
suppose   -1.42
;&        -1.41
Positive Logits
enos      1.80
igent     1.73
ibase     1.71
imab      1.70
cores     1.68
ig        1.67
plicit    1.65
ball      1.62
volt      1.57
ibo       1.56
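The Negative and Positive Logits lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. A common way to build such lists, assumed here rather than confirmed by this card, is to project the feature's decoder direction through the unembedding matrix W_U (ignoring any final LayerNorm) and keep the most negative and most positive entries. The function `logit_effects` and the toy matrix below are hypothetical.

```python
def logit_effects(decoder_vec, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    decoder_vec: feature decoder direction, length d_model.
    W_U: unembedding as a d_model x n_vocab nested list (assumed layout).
    """
    d_model, n_vocab = len(decoder_vec), len(vocab)
    # effect[v] = sum_j decoder_vec[j] * W_U[j][v], i.e. decoder_vec @ W_U
    effects = [sum(decoder_vec[j] * W_U[j][v] for j in range(d_model)) for v in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda v: effects[v])  # ascending
    negative = [(vocab[v], effects[v]) for v in order[:k]]
    positive = [(vocab[v], effects[v]) for v in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of 3 tokens
    W_U = [[2.0, 0.0, 1.0],
           [0.0, 3.0, 1.0]]
    neg, pos = logit_effects([1.0, -1.0], W_U, ["a", "b", "c"], k=1)
    print(neg, pos)
```

Because the effect is a single matrix-vector product, the lists describe the feature's direct contribution to the output only; indirect effects through later layers are not captured.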
Activation Density: 0.013%
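Activation density is presumably the percentage of tokens on which the feature fires, i.e. has an activation above zero; that reading is an assumption, and the helper `activation_density` and its `threshold` parameter are hypothetical. At 0.013%, the feature would fire on about 13 of every 100,000 tokens, or roughly once per 7,700 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)  # count firing tokens
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 13 firing tokens out of 100,000 reproduces a density of about 0.013%
    acts = [0.0] * 99_987 + [1.0] * 13
    print(f"{activation_density(acts):.3f}%")
```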