Explanations
terms related to laws, regulations, and cultural or religious sensitivities
Neuron Alignment
Index   Value   % of L₁
184     +0.26   0.8%
227     +0.10   0.3%
1510    +0.09   0.3%
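The page does not say how the "% of L₁" column is derived. One plausible reading, offered here only as an assumption, is that each value is divided by the L₁ norm (sum of absolute values) of the feature's weight vector over neurons. A minimal sketch under that assumption follows; the function name top_aligned_neurons and the toy vector are hypothetical and are not the feature's real weights.

```python
def top_aligned_neurons(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| divided by the sum of |entries|
    of the whole vector. The dashboard does not confirm this definition.
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:k]]


# Toy vector for illustration only.
toy_direction = [0.5, -0.25, 0.0, 0.125, 0.125]
for index, value, share in top_aligned_neurons(toy_direction, k=2):
    print(f"{index:<6d} {value:+.2f}  {share:.1%}")
```

Under this reading, the small shares in the table (0.8% and below) would indicate that the feature's weight is spread across many neurons rather than concentrated in a few.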
Correlated Neurons
Index   P. Corr.   Cos Sim.
184     +0.26      0.02
1984    +0.10      0.03
227     +0.09      0.03
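The two columns above report different similarity measures, and the page does not state which vectors each one is computed over. As a generic sketch (not the dashboard's actual pipeline), Pearson correlation is cosine similarity applied after subtracting each vector's mean, which is one reason the two numbers can differ for the same pair. The function names and toy data below are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x],
                             [b - mean_y for b in y])


# Toy series for illustration only: y is x shifted by a constant.
x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 4.0]
print(pearson_correlation(x, y), cosine_similarity(x, y))
```

In the toy data the series move together perfectly, so the Pearson correlation is essentially 1, while the cosine similarity is slightly lower because the constant offset changes the angle between the raw vectors.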
Negative Logits
embra       -1.77
effe        -1.74
dispen      -1.70
fte         -1.69
nece        -1.68
desir       -1.68
secon       -1.66
unce        -1.63
bordeaux    -1.63
unden       -1.62
Positive Logits
'           0.62
’           0.61
itself      0.59
=""/>       0.57
stock       0.55
برابوك      0.54
(":");      0.54
]!='        0.54
مكتبه       0.53
מעט         0.53
Activations Density 0.110%