INDEX
Explanations
phrases related to physical restraint and confinement
Neuron Alignment
Index    Value    % of L₁
1385     +0.12    0.4%
416      +0.11    0.3%
683      +0.09    0.3%
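The "% of L₁" column can be read as each neuron's share of the total absolute weight in the feature direction. The sketch below illustrates that reading only. It assumes the column is the absolute value divided by the L₁ norm of the full vector, a formula the page does not state. `l1_share` and `toy_direction` are hypothetical names, and the numbers are made up rather than taken from this feature.

```python
def l1_share(values, index):
    """Return |values[index]| as a fraction of the L1 norm of `values`.

    Assumption: this mirrors the "% of L1" column; the page itself
    does not give the formula.
    """
    l1_norm = sum(abs(v) for v in values)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero, share is undefined")
    return abs(values[index]) / l1_norm


# Hypothetical toy direction, not data from this page.
toy_direction = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_direction, 0):.1%}")  # prints 50.0%
```

Under this reading, a value of +0.12 at 0.4% would imply an L₁ norm of roughly 30, which is consistent with the other two rows after rounding.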
Correlated Neurons
Index    P. Corr.    Cos Sim.
416      +0.12       0.05
736      +0.11       0.05
1352     +0.09       0.04
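The two columns above need not agree, because a correlation coefficient centers each series on its mean while cosine similarity does not. The sketch below shows that difference on made-up inputs. It assumes "P. Corr." means Pearson correlation, which the page does not confirm, and it does not reproduce how the dashboard chooses what to compare. `pearson` and `cosine` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance of mean-centered series over their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def cosine(xs, ys):
    """Cosine similarity: dot product over the product of Euclidean norms (no centering)."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)


# Hypothetical toy series: same shape, different offsets.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(round(pearson(a, b), 3), round(cosine(a, b), 3))
```

On these toy series the correlation is 1 up to floating-point error, while the cosine similarity is below 1 because the constant offset in `b` changes its direction.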
Negative Logits
Token    Logit
palab    -0.87
lele     -0.86
kram     -0.85
hina     -0.84
alkoh    -0.83
lemp     -0.83
haup     -0.83
vito     -0.83
gius     -0.82
doman    -0.82
Positive Logits
Token         Logit
tight         0.71
binding       0.69
binds         0.67
rope          0.66
tighter       0.65
restraints    0.64
tightened     0.63
straps        0.63
tying         0.63
ropes         0.63
Activations Density: 0.260%