INDEX
Explanations
phrases related to strategic plans or actions
Neuron Alignment
Index   Value   % of L₁
50      +0.19   1.2%
874     +0.14   0.9%
169     +0.10   0.6%
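The "% of L₁" column is not defined on this page. A minimal sketch, assuming it means each neuron's absolute value as a share of the L₁ norm of the whole vector (that reading is an assumption, as is the helper name `implied_l1_norm`), backs out the norm that each row above would imply:

```python
# Rows copied from the Neuron Alignment table: (neuron index, value, % of L1).
rows = [(50, 0.19, 1.2), (874, 0.14, 0.9), (169, 0.10, 0.6)]


def implied_l1_norm(value: float, pct_of_l1: float) -> float:
    """Back out the L1 norm, ASSUMING pct_of_l1 = 100 * |value| / L1 norm."""
    return abs(value) / (pct_of_l1 / 100.0)


# One estimate per row; they should roughly agree if the assumption holds.
estimates = {idx: implied_l1_norm(value, pct) for idx, value, pct in rows}

for idx, est in estimates.items():
    print(f"neuron {idx}: implied L1 norm ~ {est:.1f}")
```

Under that assumption the three rows give estimates between roughly 15.6 and 16.7. The spread is within what the rounding of the displayed values would produce, so the rows are consistent with this reading, but they do not confirm it.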
Correlated Neurons
Index   P. Corr.   Cos Sim.
874     +0.19      0.05
169     +0.14      0.04
1133    +0.10      0.03
Negative Logits
<bos>               -3.24
ⓧ                   -0.85
<?                  -0.79
/***                -0.79
/**                 -0.76
[token not shown]   -0.74
<?                  -0.73
//---               -0.73
///**               -0.69
/*                  -0.61
Positive Logits
affor      1.34
Juf        1.20
unwarran   1.20
maneu      1.19
increa     1.19
suscep     1.16
disagre    1.16
impra      1.15
scrat      1.15
Plans      1.15
Activations Density 0.085%