INDEX
Explanations
terms related to deterring actions or consequences
Neuron Alignment
Index   Value   % of L₁
1328    +0.18   0.8%
1604    +0.13   0.6%
757     +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1328    +0.18      0.03
1604    +0.13      0.03
757     +0.12      0.03
Negative Logits
<bos>              -2.02
DataPropertyName   -0.70
/***               -0.63
<?                 -0.61
дописавши          -0.61
PrintStream        -0.61
/*!                -0.60
superintend        -0.60
mAdapter           -0.59
Román              -0.59
Positive Logits
deterrent    0.92
deterred     0.86
tucson       0.84
riva         0.82
borsa        0.80
arture       0.80
deterrence   0.80
relenting    0.80
tph          0.80
leuth        0.79
Activations Density 0.146%