INDEX
Explanations
phrases that indicate occurrences and references to specific events or actions
Neuron Alignment
Index   Value   % of L₁
430     +0.13   0.7%
95      +0.12   0.7%
235     +0.11   0.6%
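One possible way to compute the Neuron Alignment columns is sketched below. It assumes something the dashboard does not state: that "Value" is an entry of the feature's decoder vector in the neuron basis, and that "% of L₁" is that entry's magnitude divided by the vector's L₁ norm. The function name `top_neuron_alignment` and the toy vector are hypothetical. Entries are ranked by signed weight, which matches the all-positive values in the table.

```python
def top_neuron_alignment(decoder_vec, k=3):
    """Return (index, value, fraction_of_l1) for the k most positive entries.

    Assumption (hypothetical): decoder_vec is the feature's decoder
    direction in the neuron basis; the dashboard's definition may differ.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful alignment
    # Rank neuron indices by signed weight, largest first.
    order = sorted(range(len(decoder_vec)),
                   key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in order[:k]]


# Toy 6-neuron decoder vector (made-up numbers, not this feature's weights).
toy = [0.05, -0.20, 0.40, 0.10, -0.05, 0.20]
for idx, val, frac in top_neuron_alignment(toy):
    print(f"{idx:>5}  {val:+.2f}  {frac:.1%}")
```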
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
170     +0.13           0.04
362     +0.12           0.04
160     +0.11           0.05
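The two correlation columns can differ noticeably for the same neuron. A minimal sketch shows why, assuming (not confirmed by the dashboard) that both compare the feature's activations with the neuron's activations over the same tokens: Pearson correlation centres each series on its mean first, and cosine similarity does not. The helper names `pearson` and `cosine` and the toy data are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length sequences (no mean-centring)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    # Guard against an all-zero series, which has no defined direction.
    return dot / (nx * ny) if nx and ny else 0.0


def pearson(x, y):
    """Pearson correlation: the cosine similarity of the mean-centred series."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


# Toy activations over three tokens (made-up numbers).
feature_acts = [1.0, 2.0, 3.0]
neuron_acts = [11.0, 12.0, 13.0]  # same shape, shifted by a constant
print(pearson(feature_acts, neuron_acts))  # centring removes the offset
print(cosine(feature_acts, neuron_acts))   # the offset lowers cosine similarity
```

Because the neuron series is just the feature series plus a constant, Pearson correlation is (up to rounding) 1, while cosine similarity stays below 1.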
Negative Logits
ĨĴ     -2.65
¿½     -2.58
³      -2.57
´      -2.52
ł      -2.46
ĸ´     -2.46
·¸     -2.45
ī      -2.42
Ĭ      -2.39
ı      -2.35
Positive Logits
wives          1.68
(`             1.60
sender         1.56
schemas        1.47
board          1.46
purpose        1.46
purposes       1.41
poser          1.40
medium         1.40
constructor    1.39
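Lists of positive and negative logits like the ones above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and keeping the extremes. The sketch below assumes that procedure and ignores any layer-norm or rescaling step a real dashboard may apply. `logit_effects`, the toy unembedding, and the four-token vocabulary are all hypothetical.

```python
def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Return the k most negative and k most positive direct logit effects.

    decoder_dir: the feature's output direction, length d_model.
    W_U: unembedding matrix as d_model rows of n_vocab entries.
    Assumption: no layer-norm or other rescaling before the unembedding.
    """
    d_model = len(decoder_dir)
    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with token t's unembedding column.
        effect = sum(decoder_dir[j] * W_U[j][t] for j in range(d_model))
        effects.append((token, effect))
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # tokens the feature pushes down most
    positive = ranked[::-1][:k]    # tokens the feature pushes up most
    return negative, positive


# Toy model: d_model = 2, four-token vocabulary (made-up numbers).
vocab = ["a", "b", "c", "d"]
W_U = [[1.0, 0.0, -1.0, 2.0],
       [0.0, 1.0, 1.0, -1.0]]
neg, pos = logit_effects([1.0, 0.5], W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```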
Activation Density: 0.252%
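A density of 0.252% means the feature fires on roughly 1 token in 400. The sketch below assumes density is the fraction of tokens whose activation is strictly above zero. The dashboard does not state its threshold, so the `threshold` parameter and the function name `activation_density` are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: "density" counts activations strictly above zero; the
    dashboard may use a different cutoff.
    """
    if not activations:
        return 0.0  # no tokens observed, so report zero density
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy run: one firing token out of 400 (made-up activations).
acts = [0.0] * 399 + [3.1]
print(f"{activation_density(acts):.3%}")  # prints 0.250%
```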