INDEX
Explanations
positive mentions and support from community-related contexts
Neuron Alignment
Index    Value    % of L₁
764      +0.11    0.3%
478      +0.09    0.3%
227      +0.09    0.3%
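The "% of L₁" column is not defined on the page. A minimal sketch of one plausible reading, assuming it is the absolute value of a single weight divided by the L₁ norm of the whole weight vector; the function name `percent_of_l1` and the toy vector are hypothetical and are not taken from the dashboard:

```python
def percent_of_l1(weights, index):
    """Return the share (in percent) of the vector's L1 norm held by weights[index].

    Assumption: "% of L1" = 100 * |w_i| / sum_j |w_j|. The dashboard does not
    state its formula, so treat this as an illustration only.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy vector, not data from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
print(percent_of_l1(toy_weights, 0))  # 50.0
```

Under this reading, a value of +0.11 at 0.3% would imply an L₁ norm in the mid-30s, which suggests the weight is spread thinly across many neurons.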
Correlated Neurons
Index    P. Corr.    Cos Sim.
981      +0.11       0.10
227      +0.09       0.08
1097     +0.09       0.07
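The two columns above report different similarity measures. A minimal sketch of how the Pearson correlation and the cosine similarity of two vectors are computed, assuming "P. Corr." denotes the Pearson coefficient; which vectors the dashboard actually compares is not stated, so the inputs below are hypothetical:

```python
import math


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def cosine_similarity(x, y):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return numerator / denominator


# Hypothetical vectors, not data from the dashboard.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
print(pearson(u, v))            # perfectly anti-correlated
print(cosine_similarity(u, v))  # still positive, since no centering is applied
```

The only difference is the mean-centering step, which is why the two columns can disagree slightly for the same neuron.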
Negative Logits
algunos    -0.58
Çünkü      -0.55
despotism  -0.54
برابوك     -0.53
repug      -0.52
Przyp      -0.51
žnost      -0.51
Dzięki     -0.51
lenger     -0.50
Tampoco    -0.49
Positive Logits
<eos>             0.78
abbra             0.72
profi             0.70
anonyme           0.67
WebElementEntity  0.62
emi               0.61
mef               0.61
sark              0.61
tutt              0.60
évé               0.58
Activations Density: 0.626%