Explanations
references to weapons and physical violence
Neuron Alignment
Index   Value   % of L₁
50      +0.22   0.8%
1385    +0.10   0.4%
1150    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1385    +0.22      0.06
946     +0.10      0.04
394     +0.10      0.05
Negative Logits
<bos>       -2.40
glan        -1.01
Autoritní   -0.92
realis      -0.90
reger       -0.89
gie         -0.85
stoff       -0.84
anse        -0.84
kram        -0.82
hek         -0.82
Positive Logits
impractica    0.94
disreg        0.93
liberality    0.91
ecru          0.90
clayey        0.89
felicity      0.89
friable       0.85
earnestness   0.84
pymysql       0.82
paradiso      0.81
Activations Density 0.350%