INDEX
Explanations
instances of specific words related to permissions and prohibitions, especially regarding the use of content for personal or commercial purposes
Neuron Alignment
Index    Value    % of L₁
50       +0.21    1.3%
528      +0.15    0.9%
662      +0.11    0.7%
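The "% of L₁" column presumably gives each neuron's share of the L1 norm of the feature's weight vector. The sketch below illustrates that reading only; the function names and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights):
    """Return each entry's absolute value as a fraction of the vector's L1 norm."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero")
    return [abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, share of L1) for the k largest-magnitude entries."""
    shares = l1_share(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


# Hypothetical toy vector (assumption), not the feature's actual weights.
toy = [0.5, -0.25, 0.125, 0.125]
for idx, val, share in top_aligned(toy, k=2):
    print(f"{idx}\t{val:+.2f}\t{share:.1%}")
```

Under this reading, the rows above imply an L1 norm of roughly 16 (for example, 0.21 / 0.013 ≈ 16.2).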
Correlated Neurons
Index    P. Corr.    Cos Sim.
32       +0.21       0.07
528      +0.15       0.07
662      +0.11       0.07
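For reference, the two column metrics can be sketched as follows, assuming "P. Corr." denotes the Pearson correlation and "Cos Sim." the cosine similarity. Which pairs of vectors the dashboard actually compares is not stated here, so the inputs are left generic and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])
```

Because the Pearson correlation centers each vector before comparing, the two metrics can differ substantially for the same pair of vectors.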
Negative Logits
<bos>          -3.77
/***           -0.69
Kontrola       -0.67
/*++           -0.61
nahilalakip    -0.60
بالإنجليزية    -0.59
っこう         -0.57
Vegeu          -0.56
Более          -0.56
springfox      -0.56
Positive Logits
affor        1.68
accla        1.64
eiffel       1.63
increa       1.58
stockholm    1.58
impra        1.56
madonna      1.56
toledo       1.56
disagre      1.54
reluct       1.51
Activations Density 0.199%