Explanations
instances of the word "comprises."
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
376     +0.21   1.2%
23      +0.18   1.0%
115     +0.13   0.8%
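One plausible reading of the Neuron Alignment table is that it lists the MLP neurons with the largest weights in the feature's decoder vector, where "% of L₁" is each weight's share of that vector's L₁ norm. The sketch below works under that assumption. `neuron_alignment` and `decoder_vec` are hypothetical names. Ranking by absolute weight is also an assumption, since the card may rank by signed value instead.

```python
import numpy as np


def neuron_alignment(decoder_vec, top_k=3):
    """Return (neuron index, weight, share of L1 norm) for the top_k largest-magnitude weights.

    Hypothetical helper: assumes decoder_vec is the feature's decoder vector in MLP-neuron space.
    """
    d = np.asarray(decoder_vec, dtype=float)
    l1 = np.abs(d).sum()                     # L1 norm of the whole decoder vector
    order = np.argsort(-np.abs(d))[:top_k]   # largest |weight| first
    return [(int(i), float(d[i]), float(abs(d[i]) / l1)) for i in order]


if __name__ == "__main__":
    # Toy 3-neuron decoder vector; prints one row per neuron in the card's layout.
    for idx, value, share in neuron_alignment([0.1, -0.5, 0.4], top_k=2):
        print(f"{idx:>5}  {value:+.2f}  {share:.1%}")
```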
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
37      +0.21           0.01
316     +0.18           0.01
266     +0.13           0.01
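The two numeric columns of the Correlated Neurons table could be computed as follows. This assumes the correlation column is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. It also assumes the cosine-similarity column compares the feature's decoder vector with that neuron's basis direction. Both readings are assumptions, and the function names are hypothetical.

```python
import numpy as np


def pearson_corr(x, y):
    """Pearson correlation between two equal-length activation series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()      # center both series
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def cos_sim_with_neuron(decoder_vec, neuron_idx):
    """Cosine similarity between decoder_vec and the one-hot basis vector of neuron_idx."""
    d = np.asarray(decoder_vec, dtype=float)
    # A dot product with a one-hot unit vector picks out a single component.
    return float(d[neuron_idx] / np.linalg.norm(d))


if __name__ == "__main__":
    feature_acts = [0.0, 0.0, 2.0, 0.0, 1.5]  # toy per-token activations
    neuron_acts = [0.1, -0.2, 1.0, 0.0, 0.9]
    print(f"{pearson_corr(feature_acts, neuron_acts):+.2f}")
    print(f"{cos_sim_with_neuron([3.0, 4.0], 1):.2f}")  # 0.80
```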
Negative Logits
ść        -1.58
íst       -1.53
istics    -1.48
served    -1.46
amento    -1.42
edo       -1.41
ści       -1.41
's        -1.41
books     -1.40
cited     -1.39
Positive Logits
imal        1.63
veins       1.53
inib        1.50
>=          1.41
pity        1.41
imet        1.39
inability   1.39
kill        1.38
omit        1.38
dimethyl    1.37
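The logit lists rank output tokens by how strongly the feature's direction pushes their logits down or up. The sketch below assumes a one-layer model in which the feature's decoder vector lives in MLP-neuron space. Its direct effect on the logits is then obtained by projecting through the MLP output weights and the unembedding, ignoring any final layer normalization. `W_out`, `W_U`, `vocab`, and `logit_effects` are hypothetical placeholders, not names taken from the source.

```python
import numpy as np


def logit_effects(decoder_vec, W_out, W_U, vocab, k=3):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumed shapes: decoder_vec (d_mlp,), W_out (d_mlp, d_model), W_U (d_model, n_vocab).
    """
    effects = np.asarray(decoder_vec) @ np.asarray(W_out) @ np.asarray(W_U)
    order = np.argsort(effects)              # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_mlp, d_model = 4, 3
    vocab = ["a", "b", "c", "d", "e"]        # toy vocabulary
    neg, pos = logit_effects(rng.normal(size=d_mlp),
                             rng.normal(size=(d_mlp, d_model)),
                             rng.normal(size=(d_model, len(vocab))),
                             vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```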
Activation density: 0.030%
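Activation density is read here as the fraction of tokens on which the feature is active. Under that reading, 0.030% corresponds to roughly 3 tokens in every 10,000. The sketch below assumes "active" means an activation strictly above a threshold of zero. Both `activation_density` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold."""
    a = np.asarray(acts, dtype=float)
    return float((a > threshold).mean())


if __name__ == "__main__":
    acts = np.zeros(10_000)
    acts[[12, 480, 9_031]] = [0.7, 1.9, 0.4]  # three active tokens out of 10,000
    print(f"Activation density: {activation_density(acts):.3%}")  # 0.030%
```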