INDEX
Explanations
phrases related to falsehoods or inaccuracies
Neuron Alignment
Index    Value    % of L₁
596      +0.11    0.4%
680      +0.11    0.4%
2004     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
680      +0.11       0.02
36       +0.11       0.02
257      +0.10       0.02
Negative Logits
HttpPut         -0.48
Muffins         -0.46
<0xDF>          -0.46
للاسماء         -0.45
addCriterion    -0.44
راسیون          -0.43
HPP             -0.43
IsUnicode       -0.42
HttpDelete      -0.41
Dif             -0.41
Positive Logits
maksi        0.87
false        0.81
kasa         0.78
jati         0.78
false        0.77
False        0.77
Jasa         0.74
seksi        0.72
Palembang    0.71
lemp         0.71
Activations Density: 0.061%