INDEX
Explanations
phrases related to instructions or directives starting with verbs like "to hide", "to be", "to go"
Neuron Alignment
Index   Value   % of L₁
674     +0.09   0.3%
314     +0.08   0.2%
869     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
595     +0.09      0.04
1667    +0.08      0.03
1996    +0.08      0.02
Negative Logits
PYX            -0.58
Duisburg       -0.58
siven          -0.51
Donau          -0.51
proteine       -0.50
allclose       -0.50
Bielefeld      -0.49
Schloß         -0.49
rativo         -0.48
TypedDataSet   -0.48
Positive Logits
unce           0.75
indor          0.71
soggior        0.71
hoeft          0.68
impractica     0.68
GYPT           0.67
suscep         0.66
aarr           0.66
embodi         0.65
oodoo          0.65
Activations Density: 0.188%