INDEX
Explanations
politically related phrases and names
Neuron Alignment
Index   Value   % of L₁
50      +0.18   0.6%
752     +0.17   0.5%
1967    +0.11   0.3%
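The "% of L₁" column appears to express each listed neuron's weight as a share of the L₁ norm of the feature's whole direction over neurons. The sketch below is minimal and rests on assumptions the dashboard does not confirm. It assumes the values come from a single feature direction `w` and that the column equals |w[i]| / Σ|w| × 100. `neuron_alignment` is a hypothetical helper, and the toy vector is invented, so it does not reproduce the numbers above.

```python
def neuron_alignment(w, k=3):
    """Top-k neurons of a feature direction `w`, ranked by absolute weight.

    Returns (index, value, percent_of_l1) tuples, where percent_of_l1 is
    |w[i]| / sum(|w|) * 100. Assumption: this is how the "% of L1" column
    is defined; `neuron_alignment` is a hypothetical helper name.
    """
    l1 = sum(abs(x) for x in w)
    # Sort neuron indices by descending |weight| and keep the first k.
    top = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:k]
    return [(i, w[i], abs(w[i]) / l1 * 100) for i in top]


# Toy 6-neuron direction with invented values (not the feature above).
w = [0.05, -0.02, 0.18, 0.0, 0.11, -0.04]
for idx, value, pct in neuron_alignment(w):
    print(f"{idx:>5}  {value:+.2f}  {pct:.1f}%")
```

Under this reading, the real rows imply an L₁ norm of roughly 30 to 37 (for example, 0.18 / 0.006 = 30). Even the top neuron would then carry under 1% of the direction's total weight.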
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
752     +0.18           0.04
16      +0.17           0.04
50      +0.11           0.04
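Both columns of the Correlated Neurons table compare a neuron's activations with the feature's activations. The sketch below is minimal, and the dashboard does not define either column. It assumes the correlation column is the Pearson correlation and Cos Sim. is the uncentered cosine similarity, both computed between the two activation lists over the same tokens. The activation values are invented.

```python
import math


def cosine(x, y):
    """Uncentered cosine similarity between two equal-length activation lists."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: cosine similarity after subtracting each list's mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Invented per-token activations: the neuron equals the feature shifted down by 1.
feature = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron = [-1.0, -1.0, 1.0, -1.0, 0.0]
print(f"Pearson {pearson(feature, neuron):+.2f}  Cos Sim. {cosine(feature, neuron):.2f}")
```

Pearson centers both lists first. A neuron that tracks the feature up to a constant offset therefore scores +1.00 here, while its cosine similarity is only about 0.45. This is one way a correlation value can sit well above the cosine value.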
Negative Logits
Ikr          -0.81
viciss       -0.79
!!</         -0.77
;;)          -0.73
/**          -0.72
jajaja       -0.70
fufficient   -0.69
shewn        -0.68
Hahah        -0.67
Lmfao        -0.66
Positive Logits
Etimo        0.49
...          0.47
İş           0.46
its          0.44
Revenir      0.44
+#+          0.44
Iain         0.43
minente      0.43
leveland     0.43
Çok          0.42
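The Negative Logits and Positive Logits lists presumably rank vocabulary tokens by how strongly the feature pushes their output logits down or up. The sketch below is minimal. It assumes each score is the dot product of the feature's output direction with that token's unembedding column, and it ignores the final layer norm. `logit_effects`, the 2-dimensional direction, and the 4-token vocabulary are all hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=2):
    """Score each token as direction . unembed[:, t]; return bottom-k and top-k.

    Assumptions (hypothetical names): `direction` is a length-d list,
    `unembed` is a d x V nested list (rows = residual dims, columns = tokens),
    and `vocab` holds the V token strings. Final layer norm is ignored.
    """
    d = len(direction)
    scores = [sum(direction[r] * unembed[r][t] for r in range(d)) for t in range(len(vocab))]
    # Ascending order: most negative scores first.
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


direction = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.1, 0.3],
           [0.1, 0.2, -0.6, 0.0]]
vocab = ["a", "b", "c", "d"]
neg, pos = logit_effects(direction, unembed, vocab)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing both ends keeps the two lists consistent with a single set of scores, which matches how the two dashboard lists share one scale.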
Activation Density: 0.178%
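Activation density is the share of tokens on which the feature fires. The sketch below is minimal. It assumes "fires" means an activation strictly above zero, and `activation_density` is a hypothetical helper. At 0.178%, that works out to about 1.78 tokens per thousand, or roughly one token in 560.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than zero; the dashboard does not state its threshold.
    """
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Invented example: 178 active tokens out of 100,000.
acts = [0.7] * 178 + [0.0] * (100_000 - 178)
print(f"Activation Density: {activation_density(acts):.3f}%")
```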