INDEX
Explanations
phrases related to internet scams and hoaxes
Neuron Alignment
Index   Value   % of L₁
966     +0.14   0.8%
699     +0.12   0.7%
1101    +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
699     +0.14      0.04
966     +0.12      0.03
144     +0.11      0.03
Negative Logits
<bos>              -1.91
dobbiamo           -0.71
Ngoài              -0.68
stiamo             -0.67
succede            -0.67
ContentAlignment   -0.61
facciamo           -0.61
bibnamefont        -0.60
voglio             -0.60
Chúng              -0.59
Positive Logits
Ho         1.31
maneu      1.26
indestru   1.22
Ho         1.20
héro       1.19
renfer     1.16
prétend    1.12
hcm        1.11
confé      1.11
disreg     1.11
Activations Density: 0.244%