Explanation: phrases related to political and social issues
Neuron Alignment
Index   Value   % of L₁
478     +0.10   0.3%
776     +0.10   0.3%
919     +0.07   0.2%
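A minimal sketch of how the Neuron Alignment columns could be computed. It assumes the table lists the neurons with the largest weights in the feature's decoder vector, and that "% of L₁" is each weight's absolute value divided by the L₁ norm of that whole vector. The function name and toy weights are hypothetical, not taken from the dashboard's code.

```python
def neuron_alignment(decoder_row, k=3):
    """Top-k neurons by weight, each with its share of the vector's L1 norm.

    Assumption: `decoder_row` is the feature's decoder weight over neurons.
    """
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by raw weight value, largest first
    top = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)[:k]
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in top]


# Toy weights; printed in the same style as the table above
for idx, value, share in neuron_alignment([0.5, -0.25, 1.0, 0.25]):
    print(f"{idx}  {value:+.2f}  {share:.1%}")
# prints:
# 2  +1.00  50.0%
# 0  +0.50  25.0%
# 3  +0.25  12.5%
```

A small "% of L₁" (0.3% here) means the feature's weight is spread across many neurons rather than concentrated in the ones listed.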
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1919    +0.10           0.08
1415    +0.10           0.04
1510    +0.07           0.04
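The Correlated Neurons table can be read as comparing the feature's activations with each neuron's activations over the same tokens. Assuming that is what both columns measure, the sketch below shows why they can differ: Pearson correlation is the cosine similarity of the mean-centred vectors, while plain cosine similarity skips the centring. Function names and inputs are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (undefined if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def pearson_and_cosine(xs, ys):
    """Return (Pearson correlation, cosine similarity) of two activation lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Pearson correlation = cosine similarity after subtracting each mean
    centred_x = [x - mx for x in xs]
    centred_y = [y - my for y in ys]
    return cosine(centred_x, centred_y), cosine(xs, ys)


# Hypothetical per-token activations for one feature and one neuron
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron_acts = [0.1, 0.2, 0.9, 0.1, 0.6]
pearson, cos_sim = pearson_and_cosine(feature_acts, neuron_acts)
```

Both functions divide by a norm, so a constant (zero-variance) activation list would raise `ZeroDivisionError` in the Pearson branch; a real implementation would need to guard against that.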
Negative Logits
swarovski   -1.36
fluo        -1.34
wien        -1.30
bordeaux    -1.26
stockholm   -1.24
olx         -1.22
lyon        -1.22
effe        -1.22
mef         -1.21
eiffel      -1.21
Positive Logits
have    0.98
has     0.97
hasn    0.87
haven   0.86
has     0.76
have    0.72
hath    0.72
đã      0.69
telah   0.67
Have    0.65
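Positive and negative logits are commonly obtained by projecting the feature's decoder direction onto the unembedding matrix and sorting the resulting per-token scores. The sketch below assumes exactly that, ignores any final layer norm the model may apply, and uses hypothetical toy matrices rather than real model weights. Under this reading, the repeated `have` and `has` entries would be distinct vocabulary items that print identically (for example, with and without a leading space).

```python
def top_logits(feature_dir, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of a feature's output direction.

    feature_dir: feature's direction in the residual stream (length d_model).
    unembed:     d_model x vocab_size matrix as nested lists (hypothetical).
    vocab:       token strings, one per unembedding column.
    """
    scores = [
        sum(feature_dir[d] * unembed[d][t] for d in range(len(feature_dir)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative scores first
    positive = ranked[::-1][:k]  # most positive scores first
    return positive, negative


# Toy example: d_model = 2, vocabulary of four tokens
direction = [1.0, -1.0]
W_U = [[2.0, 0.0, 1.0, -1.0],
       [0.0, 1.0, 0.0, 0.5]]
positive, negative = top_logits(direction, W_U, ["have", "wien", "has", "lyon"], k=2)
```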
Activation Density: 0.337%
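Activation density is most likely the share of tokens on which the feature fires; at 0.337% that is roughly 3 to 4 tokens in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the `threshold` parameter is a hypothetical knob, not a documented setting):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations over 1,000 tokens, 3 of which are non-zero
density = activation_density([0.0] * 997 + [1.2, 0.3, 4.0])  # about 0.3 (%)
```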