INDEX
Explanations
the word "is" used in various contexts
Neuron Alignment
Index   Value   % of L₁
23      +0.40   2.4%
478     +0.17   1.0%
337     +0.15   0.9%
Correlated Neurons
Index   P. Corr.   Cos Sim.
337     +0.40      0.24
17      +0.17      0.16
233     +0.15      0.17
Negative Logits
č↵               -4.02
↵                -4.02
↵↵               -4.02
(blank)          -4.02
↵                -4.02
↵↵               -4.02
č↵               -4.02
(blank)          -4.02
(blank)          -4.02
<|outofrange|>   -4.02
Positive Logits
rael         1.89
expected     1.82
opropyl      1.81
increasing   1.71
indicated    1.68
decreasing   1.67
omerase      1.64
caspase      1.62
unfair       1.61
ocrates      1.59
Activations Density 6.977%