INDEX
Explanations
text related to linguistic theory and grammatical principles
Neuron Alignment
Index   Value   % of L₁
1034    +0.18   1.0%
966     +0.16   0.9%
1437    +0.14   0.8%
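The page does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that it gives each neuron's absolute value as a share of the L₁ norm (the sum of absolute values) of the full vector. On that reading, a value of +0.18 at 1.0% would imply an L₁ norm of roughly 18. A minimal Python sketch, using a hypothetical helper `l1_shares` and toy numbers rather than this feature's real weights:

```python
def l1_shares(weights):
    """Return each weight's absolute value as a percentage of the L1 norm."""
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1_norm for w in weights]


# Hypothetical toy vector (assumption: not taken from the dashboard).
toy_weights = [0.5, -0.25, 0.25]
shares = l1_shares(toy_weights)
print([f"{s:.1f}%" for s in shares])
```

The shares always sum to 100%, so in a wide vector even the largest entries can account for only about 1% each, as in the table above.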
Correlated Neurons
Index   P. Corr.   Cos Sim.
1870    +0.18      0.01
1363    +0.16      0.03
966     +0.14      0.02
Negative Logits
<bos>        -1.07
ⓧ            -0.92
iddhar       -0.81
Mlle         -0.78
quitted      -0.77
/**          -0.75
hentai       -0.75
disambigu    -0.73
shenan       -0.71
gild         -0.67
Positive Logits
parameter    1.22
parameter    1.20
Parameter    1.16
Parameter    1.14
param        1.11
parameters   1.11
Param        1.11
Param        1.09
parameters   1.08
PARAM        1.04
Activations Density 0.411%