INDEX
Explanations
phrases related to criticism or negative assessment
Neuron Alignment
Index    Value    % of L₁
1872     +0.12    0.4%
1404     +0.12    0.4%
1385     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
16       +0.12       0.05
1125     +0.12       0.03
1872     +0.12       0.03
Negative Logits
Whence      -0.70
ypeł        -0.66
peppa       -0.65
Şi          -0.63
alnız       -0.62
withal      -0.61
shenan      -0.61
hairc       -0.60
Mémoires    -0.60
madonna     -0.58
Positive Logits
reputa            0.55
grans             0.54
SourceChecksum    0.51
hoj               0.50
dè                0.50
palet             0.49
Ukra              0.49
Gilla             0.49
AsUp              0.48
逅                0.48
Activations Density 0.289%