INDEX
Explanations
instances of the word "presents."
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
376     +0.22   1.2%
148     +0.15   0.8%
115     +0.14   0.8%
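One way to read the Neuron Alignment table is as the largest entries of the feature's decoder vector, with "% of L₁" taken as each entry's magnitude divided by the vector's L1 norm. That reading is an assumption, since the dashboard does not state its formula here. The helper name neuron_alignment and the toy vector below are hypothetical, and this is a minimal sketch rather than the dashboard's code.

```python
from typing import List, Tuple


def neuron_alignment(decoder_vec: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Top-k neurons by |weight| in a feature's decoder vector (hypothetical helper).

    Returns (neuron index, signed weight, |weight| / L1 norm of the whole vector).
    Assumes "% of L1" on the dashboard is this ratio.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by the magnitude of their weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector (made-up numbers, not the feature above).
for idx, value, share in neuron_alignment([0.1, -0.6, 0.3, 0.0]):
    print(f"{idx:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading, a value of +0.22 at 1.2% implies an L1 norm of roughly 18, so no single neuron carries much of the feature's direction.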
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
338     +0.22           0.01
372     +0.15           0.01
74      +0.14           0.01
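The two Correlated Neurons columns are standard statistics. The sketch below assumes "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens; which pair of vectors the dashboard feeds into its cosine similarity is not stated here, so cosine_sim is shown generically. The function names and toy activation series are hypothetical.

```python
import math
from typing import Sequence


def pearson_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (e.g. feature vs. neuron activations)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Undefined (raises ZeroDivisionError) if either series is constant.
    return cov / (sx * sy)


def cosine_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Made-up activations on six tokens.
feature_acts = [0.0, 0.0, 2.1, 0.0, 1.4, 0.0]
neuron_acts = [0.3, -0.1, 0.9, 0.2, 0.5, 0.0]
print(f"P. Corr. {pearson_corr(feature_acts, neuron_acts):+.2f}")
```

A weak correlation such as +0.22, paired with a near-zero cosine similarity, is consistent with the feature not being carried by any single neuron.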
Negative Logits
<%=          -1.53
JavaScript   -1.49
ived         -1.49
liche        -1.46
s            -1.44
ober         -1.43
wik          -1.41
reputation   -1.40
romes        -1.39
↵³³          -1.37
Positive Logits
cents        1.76
chool        1.75
ystems       1.69
omal         1.69
cent         1.64
omerase      1.59
ystem        1.58
ground       1.50
ational      1.49
addle        1.46
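The logit lists rank output tokens by how much the feature pushes their logits down or up. A minimal sketch, assuming each value is the dot product of the feature's decoder direction with that token's unembedding vector (final layer norm and biases ignored, which is an assumption): decoder_vec, unembed, and logit_effects are hypothetical toy names, not the dashboard's code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens by direct logit effect."""
    # Direct effect on each token's logit: decoder direction dotted with its unembedding row.
    scores = {tok: sum(d * w for d, w in zip(decoder_vec, row)) for tok, row in unembed.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by effect
    return ranked[:k], ranked[::-1][:k]


# Toy 2-dimensional model with a three-token vocabulary.
neg, pos = logit_effects([1.0, -1.0], {"a": [2, 0], "b": [0, 2], "c": [1, 1]}, k=1)
print("negative:", neg, "positive:", pos)
```

Because only the dot product is taken, this measures the feature's direct path to the output; effects routed through later layers are not included.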
Activation Density 0.889%
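Activation density is usually read as the fraction of tokens on which the feature is active. A minimal sketch, assuming "active" means an activation above zero (the threshold is an assumption); activation_density and the toy activations are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Made-up activations over ten tokens: the feature fires on two of them.
d = activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0])
print(f"Activation Density {100 * d:.3f}%")
```

Under this reading, 0.889% means the feature fires on fewer than 1 token in 100, which fits a narrow feature such as one tracking a single word.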