INDEX
Explanations
phrases indicating a selection or offer for products or services
Neuron Alignment
Index    Value    % of L₁
50       +0.22    1.4%
241      +0.12    0.7%
893      +0.11    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
893      +0.22       0.09
241      +0.12       0.09
1222     +0.11       0.08
Negative Logits
<bos>        -3.28
ⓧ            -1.14
/***         -0.93
/**          -0.76
дописавши    -0.74
<?           -0.69
#![          -0.69
/*           -0.69
             -0.68
/*!          -0.67
Positive Logits
lele       1.25
maroc      1.15
jawa       1.09
bandung    1.08
fua        1.02
alpes      1.02
sembl      1.00
riva       0.95
mef        0.95
vinci      0.95
Activations Density: 0.241%