INDEX
Explanations
phrases related to military experiences and medical procedures
Neuron Alignment
Index   Value   % of L₁
394     +0.22   0.7%
1177    +0.12   0.4%
50      +0.10   0.3%
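The "% of L₁" column is not defined on this page. One plausible reading, assumed here rather than confirmed by the source, is that it gives each neuron's share of the L₁ norm of the feature's direction vector, i.e. |value| divided by the sum of absolute values over all neurons. The sketch below illustrates that reading on a toy vector; the function name `l1_share` and the numbers in `toy_direction` are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of `weights`.

    Assumption: "% of L1" means one component's share of the vector's
    L1 norm. The dashboard itself does not state this definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy direction vector (not data from this page).
toy_direction = [0.5, -0.25, 0.25, 0.0]

# Share of the L1 norm carried by each component.
shares = [l1_share(toy_direction, i) for i in range(len(toy_direction))]
```

Under this reading, a value of +0.22 at 0.7% would imply a total L₁ norm of roughly 0.22 / 0.007 ≈ 31, and the other two rows (0.12 / 0.004 = 30 and 0.10 / 0.003 ≈ 33) agree with that to within the rounding of the displayed percentages.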
Correlated Neurons
Index   P. Corr.   Cos Sim.
284     +0.22      0.11
1013    +0.12      0.12
394     +0.10      0.08
Negative Logits
ló      -0.84
fú      -0.79
tenda   -0.75
rú      -0.74
meras   -0.69
ortop   -0.69
fono    -0.68
quí     -0.68
trá     -0.67
graus   -0.67
Positive Logits
unwarran     1.21
unspeak      1.18
Shakspeare   1.15
disagre      1.13
milf         1.12
tolerably    1.10
apprehen     1.10
snoopy       1.09
shenan       1.08
hentai       1.06
Activations Density 1.485%