INDEX
Explanations
expressions indicating a state of being or presence
Neuron Alignment
Index   Value   % of L₁
466     +0.15   0.9%
144     +0.15   0.8%
386     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
228     +0.15      0.02
144     +0.15      0.02
466     +0.12      0.02
Negative Logits
©       -2.38
ĥ½      -2.02
¦       -2.02
¨       -2.01
£       -1.88
ij      -1.82
ľĵ      -1.70
¤       -1.69
Į       -1.68
hered   -1.60
Positive Logits
hundred   1.68
emon      1.67
cker      1.59
million   1.54
eger      1.50
igos      1.47
mounts    1.47
cki       1.43
yman      1.42
ymoon     1.42
Activations Density: 0.131%